Release Notes: LTS
This document outlines changes introduced to the Intel® software for general-purpose GPU capabilities in LTS releases. As the software includes several different projects, the changes for each release are grouped by project.
To install packages for the preferred LTS release, refer to either the LTS 2523.x or LTS 2350.x installation guide for your distribution. For a list of packages published on repositories.intel.com/gpu for each release and operating system, see Provided Packages.
2025-09-05
Since the Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series entered a stable and mature phase, this release introduces a new Long-Term Support (LTS) release stream, based on the final and fully developed 2523.12 rolling release. It is a new line including the latest features and enhancements, not a continuation of the previous LTS stream. The rolling and 2350-based LTS release streams have been discontinued, and ongoing maintenance and support will now be streamlined through this final, production-ready LTS 2523 stream. This release note lists features and breaking changes introduced since the previous LTS release, along with fixes introduced after the last rolling release.
The 2523.31 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.10, 9.4, 9.6, and 10.0
Ubuntu 22.04 and 24.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, 15 SP6, and 15 SP7
Improvements
Intel® Graphics Compiler for OpenCL™
Fixed an issue causing the RSBench application to fail.
Fixed an accuracy issue with aten::amax.
Breaking changes
General
Deprecated the Intel® Media SDK project, so the intel-mediasdk package is no longer included in this or any future LTS release. For instructions on installing all required packages for this new LTS release, see the Installing Data Center GPU: LTS 2523.x Releases document. The previous LTS installation guide, now renamed to Installing Data Center GPU: LTS 2350.x Releases, does not apply to this release, but you can still use it for earlier LTS releases.
Known Issue
Installation of IEFS version 12.1.0.0.149 may fail on SLES 15 SP6 systems running kernel 6.4.0-150600.23.47 due to a build error in the OFA kernel module. The error occurs because the rv module incorrectly uses the MODULE_IMPORT_NS(DMA_BUF) macro, resulting in a compilation failure.
Resolution: A patch is available to resolve this issue: View patch on GitHub
If you cannot apply the patch in your environment, use MOFED version 24.10-2.1.8.0 instead of 24.10-3.2.5.0-LTS. This MOFED version has been validated successfully with the previous IEFS version.
Features
General
Incorporated the latest security updates to address recent vulnerabilities, enhance protection, and ensure greater system reliability.
Introduced support for new operating systems: Red Hat Enterprise Linux 9.6 and 10.0, Ubuntu 24.04, and SUSE Linux Enterprise 15 SP7.
Intel CM Compiler
Introduced support for arbitrary SIMD in sampler intrinsics.
Added the cm_rsqrt implementation, which maps directly to the SPIR-V OpenCL RSqrt intrinsic.
Started supporting saturation in the rsqrt built-in.
Introduced support for full-width r0 access in the cm_get_r0 intrinsic.
Introduced a compiler option to enable cost analysis information.
Introduced the -vc-use-bindless-buffers and -vc-use-bindless-images options to enable bindless accesses. These options allow VC to generate bindless buffers and images, replacing stateful BTI-based ones.
Disabled global fence on platforms other than Intel Data Center GPU Max Series.
Added 8-bit floating point conversion intrinsics.
Added a helper function to retrieve the global thread ID along its dimension.
Added support for new Battlemage and Panther Lake devices.
Updated the CM specification including the LSC memory interface, cache controls, and CM macro requirements.
Added the stochastic rounding intrinsic declaration.
Included the main <cm/cm.h> header implicitly, enabling caching when compiling from the CM source.
Added intrinsics for 2D block load and store operations.
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
Added support for the BUFFER_SIZE explicit argument.
Added the Level Zero API for querying kernel argument data.
Added support for ZE_EXTERNAL_MEMORY_TYPE_FLAG_OPAQUE_FD to the Level Zero runtime.
Added a debug flag to force Graphics Memory Manager (GMM) system memory resource type.
Introduced initial support for creating media context.
Started skipping kernel internal allocations to command list residency.
Stopped initializing in-order TS nodes and introduced initial support for standalone CB events timestamps allocator.
Added support for HP copy engine context.
Allowed for dynamic counting of HP contexts in a context group.
Added support for the L3-cache reservation.
Allocated First In, First Out (FIFO) for the debugger.
Enhanced the TD Debug Control register to support per-context breakpoints and debug features, improving multi-context debugging and virtualization support.
Implemented new registers for the debugger and added support for V3 state save header.
Added a new SUPPORTED_DEVICES query option to the OpenCL offline compiler to allow generating a YAML file containing information about supported devices.
Added DRM-specific memory banks computation logic.
Added the ability to check the WMTP status of a device using the engine capability flag.
Modified kernel residency to enable kernel mutation capabilities, allowing for mutability of the kernel residency list. This enhancement provides fine control over added or replaced allocations, whether as kernel arguments or internal kernel allocations.
Added the statelessBuiltinsEnabled boolean to the command list.
Enabled Blitter Command Streamer (BCS) for transfers when Compute Command Streamer (CCS) is busy. Additionally, started using staging buffer as a pool for multiple smaller transfers, limited barrier usage in non-USM copies to single transfers, and enabled staging buffer copies.
Added support for the cl_khr_extended_bit_ops extension.
Implemented the zeCommandListImmediateAppendCommandListsExp API to launch the existing command list into an immediate command list.
Added memory tracking for system and per-device allocations, enabling enhanced memory pooling heuristics.
Implemented the product helper specializations for querying device support for 2D block load/store operations.
Introduced external required scratch space and a kernel command view flag at the command list level. Additionally, added a flag to block implicit scaling commands during dispatch, along with an option to enable compute walker command view.
Added regular and HP contexts to groups without a dedicated HP engine.
Added context group support for root device engines.
Added support for data cache limits in peak power.
Introduced SW First In, First Out (FIFO) implementation.
Added support for state save area header v4.
Modified the setErrorDescription API.
Started using the iotclhelper to get the number of media engines.
Upgraded the Core API to version 1.9, enabling the immediate execution of existing command lists by converting them into immediate command lists.
Added support for querying the number of L3 cache banks to the preliminary version of the Xe architecture. This allows software to retrieve information about how many banks of L3 cache are available on the GPU.
Enabled bindless mode that allows shaders to directly access resources like textures, buffers, and other memory objects without the need for explicit binding to specific slots in the traditional way. This provides more flexibility and efficiency, particularly for workloads with large numbers of resources.
Started using the global bindless allocator in the Xe² HPG architecture to manage the assignment of global pointers to resources in the bindless mode.
Added an API that retrieves kernel binaries from kernel.
Updated kernel residency management to save the position of the kernel internal container when allocation can change.
Added an input/output control helper function to get fence address and set external context.
Added implementation of standalone CB events without pool allocation supporting all features for regular events.
Added support for running SIMD16 operations on Execution Units (EUs) contained within each Dual-SubSlice (DSS) on the preliminary version of the Xe architecture.
Implemented the kernel trace functionality to support the metric group type.
Added additional parameters to the Graphics Microcontroller (GuC) to support the region allocation logic.
Implemented setErrorDescription in os_interface/linux for drm_buffer_object, drm_memory_manager, and ioctl_helper_prelim.
Added a debug flag to disable walker splitting for copy operations.
Updated the Level Zero Core version to 1.6.
Added a debug key for setting MaxSubSlicesSupported.
Implemented a debug flag to manage the direct submission semaphore mode.
Implemented a debug flag that allows changing the ULLS Blitter Command Streamer (BCS) timeout.
Added new functions to the dispatch table to support mutation of kernel Instruction Set Architecture (ISA).
Added a class for multiple device metric to calculate the report format.
Added an EnableCompatibilityMode flag to support binary compatibility across multiple hardware targets.
Added support for 2D block load and store extension queries.
Introduced a new forward-compatibility model for zeinfo to emit an error whenever an unknown attribute is encountered.
Added a getter method for accessing device node information from the Direct Rendering Manager (DRM).
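To illustrate the zeinfo forward-compatibility item above, here is a minimal Python sketch of strict attribute checking: a parser that rejects any attribute it does not recognize instead of silently ignoring it. The attribute names, the `parse_section` helper, and the exception type are hypothetical and chosen only for illustration; they are not taken from the actual zeinfo schema or the runtime's implementation.

```python
# Hypothetical attribute names; the real zeinfo schema is not reproduced here.
KNOWN_ATTRIBUTES = {"name", "simd_size", "grf_count"}


class UnknownAttributeError(ValueError):
    """Raised when a section contains an attribute the parser does not know."""


def parse_section(section: dict) -> dict:
    """Return a copy of the section if every key is known, otherwise raise."""
    unknown = sorted(set(section) - KNOWN_ATTRIBUTES)
    if unknown:
        # Fail loudly so that an older consumer never misreads newer metadata.
        raise UnknownAttributeError("unknown attribute(s): " + ", ".join(unknown))
    return dict(section)
```

Under this assumed model, a section carrying a field such as `future_field` is rejected with an error that names the offending attribute, while a section using only known attributes passes through unchanged.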
Added the missing event scope flags for Command Buffer (CB) handling.
Added support for system memory in virtual memory functions.
Added new parameters to the GuC System Information (SysInfo) Blob for enhanced functionality.
Extended zeDeviceGetProperties with additional device properties, such as module_id and server_type tokens.
Enhanced log messages to support setting message severity using environment variables.
Added heapless built-ins with images compilation.
Started using heapless built-ins for images.
Added support for custom compiler backends to support loading different versions of the backend compiler based on underlying device.
Started using the sysInfo helper for detecting the memory type.
Started checking the peak power support using the escape call.
Added support for 3-channel configuration in the image format descriptor.
Updated the General Register File (GRF) register implementation.
Implemented error handling to trigger when the OA buffer overflows.
Enhanced kernel parameter configuration by adding support for passing additional enqueue and zebin parameters. This enables features such as quantum dispatch and quantum size specification directly within zebin for better encapsulation and OpenCL compatibility.
Added support for the custom allocator in work partition allocation.
Added idle Control and Status Register (CSR) detection and improved timeout handling in the ULLS controller.
Implemented VF engine utilization API.
Added input/output control helper functions for mmap and unmap operations, acquiring and releasing the GPU range, allocating a user pointer, and synchronizing the userptr allocation.
Exposed new counter-based events and added the default mode for zexCounterBasedEventCreate2.
Introduced support for physical host memory.
Updated Level Zero metrics to align with v1.11 headers.
Started specifying the cache level when reserving a region.
Added GPU and memory power domain support for getEnergyCounter.
Added support for three channels in Level Zero.
Introduced support for zeInitDrivers that combines driver initialization and retrieval functionality. Updated the GTPIN initialization logic to execute only when pCount is greater than 0 and the driver handle is non-null. Additionally, removed the unused ze_init_flags_t flag from all driverInit functions.
Enabled counter-based allocation peer sharing to support in-order command lists in multi-GPU event scenarios.
Added support for two Xe-eudebug interfaces within a single binary. The new EuDebugInterface class encapsulates eudebug functionality, with CMake flags to control Xe-eudebug and prelim uAPI support.
Added a root device flag check for multi-device scenarios, so that APIs using root device handles can now validate this flag and handle failures gracefully.
Added a new uAPI macro in the engine module to fetch the configuration of the total ticks.
Added Process Maturity Table (PMT) counter offset values for Battlemage.
Started handling page fault events in the Xe debugger.
Added support for the cl_khr_expect_assume OpenCL extension that introduces mechanisms to supply the compiler with information that can enhance the performance of certain kernels.
Implemented the Level Zero zeKernelGetBinaryExp API that allows retrieving kernel binary program data.
Added support for shared system Unified Shared Memory (USM) allocation in appendLaunchKernel.
Implemented enhancements to the Unified Shared Memory (USM) reuse mechanism, including the introduction of a USM reuse cleaner that efficiently manages system and local memory across different reuse strategies, as well as an extension of the USM reuse limit infrastructure.
Improved cache management by supporting whitelisted includes.
Added support for handling new Reliability, Availability, and Serviceability (RAS) errors in Sysman.
Implemented alignment of host Unified Shared Memory (USM) to 2MB on discrete devices when the allocated size exceeds 2MB.
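The alignment rule in the item above can be sketched in a few lines of Python. This is only an illustration of the stated policy (2MB alignment on discrete devices once the allocation exceeds 2MB); the 4096-byte default alignment, the function names, and the `discrete` parameter are assumptions made for the example and do not describe the runtime's actual code.

```python
TWO_MB = 2 * 1024 * 1024


def host_usm_alignment(size: int, discrete: bool = True, default_alignment: int = 4096) -> int:
    """Pick the alignment for a host USM allocation of `size` bytes.

    Assumed policy: 2MB alignment only on discrete devices and only when the
    size exceeds 2MB; otherwise fall back to an assumed default alignment.
    """
    if discrete and size > TWO_MB:
        return TWO_MB
    return default_alignment


def align_up(address: int, alignment: int) -> int:
    """Round `address` up to the next multiple of a power-of-two `alignment`."""
    return (address + alignment - 1) & ~(alignment - 1)
```

For instance, with these assumptions a 3MB allocation on a discrete device would have its base address rounded up with `align_up(address, TWO_MB)`, whereas an allocation of exactly 2MB keeps the default alignment because it does not exceed the threshold.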
Improved command stream performance on Intel Graphics GPU by preallocating internal heap and multiple command buffers, and optimizing GMM for cacheable command buffers.
Added a debug key to apply high power throttling hints on WDDM initialization.
Enabled a new dispatch monitor fence policy in direct submissions and removed redundant TLB flushes.
Added support for OpenCL C support queries to the OpenCL Offline Compiler, enabling automated compilation workflows.
Updated the Digital Display Interface (DDI) to include new introspection APIs for events, command lists, and command queues, with version checks based on Level Zero 1.9.
Introduced heapless mode programming support across Level Zero, SBA, OpenCL, and related tooling.
Introduced support for using a high-priority engine.
Added initial support for patching region parameters and updated related zeinfo arguments.
Enabled implicit conversion to counter-based events and introduced related experimental APIs and features for enhanced event handling.
Added functions to get and set virtual address space handles, returning an error if unsupported.
Added support for memory policy configuration for GEM_CREATE using a new ioctl extension and debug variables.
Added support for the ForceBcsEngineIndex flag.
Improved metric notifications by unifying the notification structure.
Added a NEEDS_RESET property to indicate when event reset is required.
Ported the Group Engine Interface to the Xe architecture.
Added support for RAS clear state experimental functionality.
Introduced a basic structure for tracer support.
Enabled device allocation chunking by default for multi-tile configurations with implicit scaling.
Added support for the activateMetrics feature.
Introduced a caller ID feature in aubstream to help distinguish capture environments.
Created boilerplate support for querying current voltage.
Added a debug variable to configure BO chunking size.
Added Product Helper support to the Performance Module.
Implemented a query for kernel maximum group size using ze_kernel_max_group_size_ext_properties_t.
Added debug flags and instrumentation for waitpkg calls, including wrappers and CPU feature detection.
Implemented an initial query to retrieve kernel register count per device and kernel, as a precursor to formal support.
Added an interface to bind resources as read-only.
Added a debugger functionality to the global stateless feature.
Introduced support for synchronization dispatch token allocation.
Implemented zeCommandListAppendImageCopyToMemoryExt/FromMemoryExt.
Registered a critical metadata section for the Xe debugger.
Introduced a mechanism that assigns a synchronization queue ID in the ImplicitScaling mode.
Introduced initial support for synchronized dispatch.
Implemented a VM_BIND EU debug event for Xe.
Implemented Sideband Access (SBA) and module debug access for the Xe debugger.
Added the override key to change the command list update capability.
Added a debug flag for metrics logs.
Enabled the detection of Xe Direct Rendering Manager (DRM) by default.
Enabled finding the CPU base address from all command buffers in the container.
Added support for the CCS mode configuration via SysFs.
Added logic to iterate over all contexts to check for a GPU page fault.
Enabled the per-IP euStall functionality.
Introduced a functionality that aborts when an unexpected GPU page fault is detected.
Introduced in-order host counter allocation pooling.
Added a flag to allow noop space for command buffer (CB) events from the same in-order pool.
Added a new functionality to patch helpers.
Added output list of commands for counter-based wait events.
Added a debug key to generate the SIP header file.
Introduced state programming during driver initialization for heapless OpenCL (OCL).
Introduced storing the timestamp command buffer (CB) event clear and sync commands.
Implemented metadata create event handling in Xe.
Added a command pointer to store data for the immediate encoder.
Added a feature that returns an error when file handles are exhausted in the Sysman engine.
Implemented a signal for OpenGL (OGL) to indicate the creation and destruction of shared buffers.
Implemented reporting for multi-hop fabric connections.
Added a wait command list argument.
Added a timestamp to the postsync command in the list argument.
Enabled a workaround for dummy blit operations on Ponte Vecchio.
Introduced support for pooling in-order counter allocations.
Implemented metadata attaching for vm_bind in Xe.
Introduced bindless image extension in ImageView.
Implemented thread control and the att event handling for Xe.
Introduced the bindless images extension and supported pitched pointer (ptr) functionality.
Implemented zeCommandListImmediateAppendCommandListsExp.
Introduced a query to get the kernel module register sizes.
Added support for legacy acronyms in ocloc’s fatbinary.
Introduced a bindless image extension and image properties.
Added a debug flag to flush the Translation Lookaside Buffer (TLB) before copying.
Implemented the use of heapless built-in functions in Level Zero when supported.
Added the ZE_experimental_bindless_image extension.
Implemented a thread control for the debugger in Xe.
Implemented a method to update Level Zero helper command lists.
Implemented read/writeGpuMemory for the Xe debugger.
Implemented the utilization of heapless built-in functions in OpenCL (OCL) when supported.
Added keys to override the sync mode for the immediate command list.
Introduced initial support for the local dispatch size query.
Added introspection APIs for events, command lists, and command queues.
Implemented the process entry/exit event with Xe.
Added debug flags to force pat index.
Added support for DRM_XE_EUDEBUG_EVENT_VM event handling.
Added print of the CPU flags and the address size upon detection.
Implemented the Debugger Open Input/Output Control (IOCTL).
Enabled Debug Attach capability for Xe.
Implemented debug information logging for PAT indexes.
Implemented the Xe Execution Unit (EU) Debug Open and Execute Queue events.
Enhanced reporting of the maximum cooperative group count.
Added umonitor and umwait synchronization function.
Allowed waiting for the standalone command buffer (CB) event.
Added stateless heapless built-ins.
Implemented a debug key to toggle a bit in the 57-bit GPU Virtual Address for specific allocations.
Introduced initial support for zexCounterBasedEventCreate.
Introduced support for zexEventGetDeviceAddress.
Implemented patching command buffer (CB) events on non-inOrder regular command lists.
Introduced a new API to create and export counter-based events.
Enabled the HP flag when creating HardwareContextController.
Enhanced post-sync system memory fence programming.
Implemented high-priority Command Streamer (CSR) from secondary contexts.
Introduced support for memory policy in GEM_CREATE.
Introduced initial support and implemented error handling for counter-based event flags.
Enabled support for secondary contexts in group.
Enhanced Linux OpenCL (CL)/OpenGL (GL) sharing support.
Added L3 Fabric Error monitoring in Sysman Directory.
Added support for firmware flash progress API.
Added monitoring for L3 fabric errors.
Enabled support for spill/private size in execution environment.
Implemented reading of the indirect detection version.
Added Sysman product helper in the preliminary and non-preliminary file for memory.
Enabled support for cl_intel_subgroup_2d_block_io to facilitate PyTorch upstreaming.
Enabled support for cl_intel_subgroup_matrix_multiply_accumulate_tf32 and cl_intel_subgroup_buffer_prefetch.
Introduced backward compatibility for Intel’s 12th generation of low-power integrated graphics architecture (Gen12LP).
Added new definitions to support Ahead-Of-Time (AOT) configurations.
Introduced a mechanism for detecting new Virtual Memory (VM) bind flags support in Xe path.
Introduced support for the Blitter Command Stream (BCS) fence programming in the copy offload path.
Introduced profiling support in the copy offload path.
Introduced handling in-order counters in the copy offload path.
Introduced 2-tile device memory chunking independent of the KMD migration.
Added a Display Data Interface (DDI) table entry for the GetPitch function.
Improved the ELF rewriter to preserve strings.
Introduced support for git SHA logging to a log file.
Introduced using copy commands for offload operations.
Added the Executable and Linkable Format (ELF) rewriter utility.
Introduced an API definition that exposes media capabilities.
Created a copy offload queue under a debug flag.
Added support for 3-channel formats with 8, 16, and 32-bit depths.
Introduced StorageInfo for the Drm-specific customization.
Enabled support for cl_intel_subgroup_2d_block_io.
Introduced API stubs for media communication.
Introduced bindless sampled image support.
Introduced passing GraphicsAllocation to fence wait.
Added the Memory Copy/Control Logic (MCL) functions to a dispatch table.
Added update capability flags.
Introduced support for creating multiple metric groups from metrics.
Added tile-to-lmem-region map to MemoryInfo.
Added a registry key to control the AUB/TBX writable for buffer host memory.
Introduced global bindless sampler offsets.
Added a field for reserving extra payload heap space.
Added setupIpVersion for Xe.
Enabled a basic framework for super fast logging.
Added the TBX support for unified shared memory (USM) with USM host pointer.
Introduced implicit synchronization mode dispatch for cooperative kernels.
Introduced an experimental API for suggesting Message Signaled Interrupts for PCI Express (MSIX) allocation.
Implemented zeCommandListImmediateAppendCommandListsExp.
Added a wrapper for accessing the template method and getting the local ID generation.
Introduced support for concurrent groups.
Introduced assigning the external interrupt ID to the event.
Introduced storing the walker command in the CPU memory for appending launch kernel.
Introduced returning experimental synchronization queue extension.
Added mask of tiles to each memory region in Xe.
Adjusted Page Attribute Tables (PAT) for data caches (DC) flush mitigation.
Introduced an experimental API responsible for synchronizing dispatch.
Added a method that toggles whether mid thread preemption is allowed or not.
Added the number of L3 banks to TopologyData.
Added new API stubs related to Message Signaled Interrupts for PCI Express (MSIX).
Introduced marking selected resources as uncacheable memory (UC) when mitigating data cache flush.
Updated program fence during Blitter Command Stream (BCS) tag update.
Added support for a null Address Translation Hub (AUB) mode.
Added a memory copy device to the host fence for host visible events in the Level Zero path.
Added a program device to the host fence in the OpenCL path.
Added a helper to check whether a device for host copy fence is required.
Enabled scratch address patching on regular command lists.
Improved patching of the scratch inline pointer.
Added heapless and global stateless scratch address patching.
Introduced a heapless state initialization in Level Zero.
Introduced a full synchronization dispatch mode initialization path.
Introduced RAS for the netlink interface.
Introduced support of bindless samplers in Level Zero.
Introduced support for explicit memory locking.
Implemented the getElfSize and getElfData methods for debugging and troubleshooting execution units in the Xe architecture.
Added a getter for the walker inline data offset.
Introduced boilerplate support for the spec 1.9 features.
Added support for __INTEL_PER_THREAD_OFF.
Introduced OpenCL support for bindless kernels.
Added debug messages about the zero engine size.
Added an option to read scratch page options during initialization.
Added the Level Zero console logging for kernel buffer arguments.
Added functions for storing extra engines.
Added defaultThreadArbitrationPolicy to the command list.
Started using legacy versions of injectMMIOList and setTbxServerIp.
Added the heaplessStateInitialized flag.
Created a helper for maxPtssIndex.
Improved the createModuleFromFile function.
Improved cache host resources when mitigating data cache flush.
Enabled device Unified Shared Memory (USM) allocation reuse.
Added Unified Shared Memory (USM) host memory pooling.
Enabled indirect allocations as packing in OpenCL.
Started sharing inter-module Instruction Set Architecture (ISA) allocations.
Started iterating over indirect allocations.
Started reusing GPU timestamps instead of the KMD escape.
Started using a stack vector to create packet initialization data.
Optimized counter-based waiting schemes.
Intel® Graphics Compiler
Implemented GenISA predicated load/store intrinsics with promotion pass.
Added the ActiveThreadsOnlyBarrier option for OpenCL shaders.
Improved call site inlining heuristic.
Added a call merger pass that merges mutually exclusive function calls when they are too large to inline.
Added the __builtin_IB_disable_ieee_exception_trap and GenISA_disable_ieee_exception_trap intrinsics.
Introduced additional transpose block 2D SPIR-V APIs.
Added a flag to disable merging allocas of different types, providing better control over the merge alloca pass and disabling aggressive merging by default.
Added 32-bit ELF type support to ZeBin for x86 use cases.
Enhanced MergeAllocas performance by replacing all allocas, generating casts at the point of use, handling select instructions in liveness analysis, avoiding merging allocas across ContinuationHL calls in raytracing, and disabling allocas merging for raytracing.
Added support for recognizing OpenCL/SPIR-V built-ins represented as TargetExtTy to the ProcessFuncAttributes pass.
Enabled SIMD16 drop for Xe3 to minimize register spills.
Added the hasLscStoresWithNonDefaultL1CacheControls flag to zeinfo, enabling 3D clients to detect Load Store Cache (LSC) stores with non-default L1 cache policies for proper UAV coherency flushing.
Started using MergeAllocas for private memory merging, allowing reuse of non-overlapping private memory allocations to reduce overall memory usage.
Added support for SPIR-V MulExtended instructions to the Vector Compiler (VC).
Set the default General Register File (GRF) size to 128.
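The idea behind the private memory merging item above (reusing storage for allocations whose lifetimes do not overlap) can be sketched as a greedy slot assignment over live ranges. This Python sketch is an illustration of the general technique only, not the MergeAllocas implementation; the tuple layout, the `merge_allocas` name, and the use of integer instruction indices for live ranges are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (name, size_in_bytes, first_use, last_use); live ranges are inclusive.
Alloca = Tuple[str, int, int, int]


def merge_allocas(allocas: List[Alloca]) -> Tuple[Dict[str, int], int]:
    """Greedily place allocas into shared slots when live ranges do not overlap.

    Returns a mapping of alloca name to slot index, plus the total number of
    bytes needed for all slots after merging.
    """
    slots: List[dict] = []  # each slot tracks its size and when it becomes free
    assignment: Dict[str, int] = {}
    for name, size, start, end in sorted(allocas, key=lambda a: a[2]):
        for index, slot in enumerate(slots):
            if slot["free_after"] < start:  # previous occupant is no longer live
                slot["size"] = max(slot["size"], size)
                slot["free_after"] = end
                assignment[name] = index
                break
        else:
            slots.append({"size": size, "free_after": end})
            assignment[name] = len(slots) - 1
    return assignment, sum(slot["size"] for slot in slots)
```

With three hypothetical allocas of 64, 32, and 16 bytes, where the first two are never live at the same time, the sketch places the first two in one slot and needs 80 bytes in total instead of 112.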
Added VISA support for HF8 conversion instruction and Panther Lake devices.
Enabled SetHasSample for the gather4* instructions.
Added support for the stochastic round bf8 intrinsic in the Vector Compiler (VC).
Added Panther Lake support.
Added support for new Battlemage device IDs.
Introduced support for Floating-Point DIVide (FDIV) instructions inside IGCVectorizer.
Added optional Intel® Graphics Compiler flags to enable multiplication expressions and non-constant integer steps. EnableGEPLSRUnknownConstantStep allows for expressions with steps unknown at compilation time. EnableGEPLSRMulExpr allows for expressions with multiplication. If enabled, it also enables EnableGEPLSRUnknownConstantStep. Both flags are disabled by default.
Introduced a new GenISA intrinsic WaveInterleave that does subgroup reduction on each n-th work item.
Introduced built-ins support for min and max operations in CMCL.
Added the LoopAllocaUpperbound pass pattern to identify single-basic-block (1-BB) loops with a non-constant upper bound and memory accesses to a constant-sized array.
Enhanced Barrier Control Flow optimization by implementing synchronization and flushing directly on the running thread.
Enabled group sort built-ins.
Added support for WaveClustered to the SubGroupReductionPattern pass.
Improved the trimming algorithm.
Implemented the PromoteBools vector type promotion support.
Added a pre-header to the main subroutine’s FuncInfo instance.
Introduced support for the kernel cost model.
Added an Instcombining pass to fix multi-cast instructions.
Added Vector Compiler (VC) internal intrinsics for block 2D load and enabled storing and prefetching operations with the matrix payload passed as an address operand.
Added an option to always force the Load Store Cache (LSC) immediate offset pattern match.
Added an additional step to the compiler build that splits the whole BiF module into smaller sections, so that materializing and inlining require less time.
Enabled more patterns for simdShuffleXor.
Improved logic for deciding between block and scatter spill.
Added a convergent attribute to block read and write intrinsics.
Introduced support for atomic scalar operations in the Vector Compiler (VC) that handles atomic Load Store Cache (LSC) intrinsics with scalar operands.
Introduced support for the SPV_EXT_shader_atomic_float_min_max extension.
Introduced new multiple-accuracy IMF math functions and updated existing implementations.
Added platforms to avoid dst/src overlap and deadlock due to the SBID dependence.
Added a warning message to notify that the requested sub_group_size=32 cannot be compiled due to CallWA being enabled.
Introduced simplification of PHI instructions in ScalarEvolution to extend the supported patterns for creating SCEVs iteratively.
Enabled the Optimization Remark Emitter to emit remarks for passes.
Added a mechanism to pass Virtual Instrument Software Architecture (VISA) finalizer option via API.
Enabled optimization for non-specific API options: -cl-opt-disable and -ze-opt-disable.
Added localId and globalId to the BufferBoundsChecking assert message.
Enabled the D64 Load Block message for the cl_intel_subgroups_long extension.
Improved the insertion logic for proceed-based dynamic ray management.
Introduced a new GenISA intrinsic WaveClusteredInterleave that combines two wave reductions: WaveClustered and WaveInterleave. The subgroup is split into clusters and then each cluster does interleaved reduction. The change includes a pattern match for reduction implemented with subgroup shuffles.
Enabled loads rescheduling in CodeLoopSinking.
Enabled loop sinking of 2D block loads and shuffle patterns.
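The WaveInterleave and WaveClusteredInterleave items above describe reductions over every n-th work item, optionally within clusters of the subgroup. The following Python sketch models one plausible reading of those semantics on a plain list of per-lane values. The function names, the list-based subgroup model, and the choice to broadcast each result back to all participating lanes are assumptions for illustration; the notes do not specify the intrinsics' exact behavior.

```python
from functools import reduce
from operator import add
from typing import Callable, List


def wave_interleave(values: List[int], step: int,
                    op: Callable[[int, int], int] = add) -> List[int]:
    """Reduce lanes i, i+step, i+2*step, ... together for each starting lane i.

    Every lane receives the reduced value of the group it belongs to.
    """
    result = list(values)
    for first in range(min(step, len(values))):
        lanes = list(range(first, len(values), step))
        total = reduce(op, (values[lane] for lane in lanes))
        for lane in lanes:
            result[lane] = total
    return result


def wave_clustered_interleave(values: List[int], cluster_size: int, step: int,
                              op: Callable[[int, int], int] = add) -> List[int]:
    """Split the subgroup into contiguous clusters, then interleave-reduce each one."""
    result: List[int] = []
    for base in range(0, len(values), cluster_size):
        result.extend(wave_interleave(values[base:base + cluster_size], step, op))
    return result
```

In this model, a subgroup of eight lanes with `step=2` is reduced as two interleaved groups (even and odd lanes), and with `cluster_size=4` the same reduction is applied independently to lanes 0 to 3 and lanes 4 to 7.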
Added an SLMSizeInKb field to GTSystemInfo, with a new version of the GTSystemInfo interface.
Enabled SIMD16/32 for a built-in with 32-bit support.
Added an additional type in getDefaultAccessType.
Added CMake configuration required to build on a riscv64 host.
Added a helper for getting a memory surface operand in the Vector Compiler (VC).
Implemented a single precision fdiv.
Added an OpenCL option to enable profile-guided trimming.
Enabled internal Load Store Cache (LSC) cmask typed intrinsics.
Enabled 2D block intrinsics conversion pass when LICM is not enabled.
Started supporting bindful reads in ConstantCoalescing to handle a new pattern: indirect read accesses from bindful resources.
Added genx invert-intrinsics in BiF.
Enabled internal Load Store Cache (LSC)-typed 2D intrinsics.
Extended pattern matching for inverse square root in Vector Compiler (VC) to improve performance and accuracy.
Added a pass that checks whether the address used in an instruction is greater than or equal to the minimum valid address defined by the user.
Added a registry key and a compilation option to disable the ShrinkArrayAlloca pass.
Added function argument resolution support for JointMatrixFuncsResolutionPass.
Added a new WaveInterleave GenISA intrinsic that does subgroup reduction on each n-th work item.
Added optional Intel® Graphics Compiler flags to enable multiplication expressions and non-constant integer steps. The EnableGEPLSRUnknownConstantStep flag allows for expressions with steps unknown at compilation time and is disabled by default. The EnableGEPLSRMulExpr flag allows for expressions with multiplication and is disabled by default. When EnableGEPLSRMulExpr is enabled, it also enables EnableGEPLSRUnknownConstantStep.
Improved memory optimization by running the memcpy optimization pass early and ensuring correct destination alignment for memcpy operations from global constants, aligning results closer to the legacy behavior for better performance and optimization.
Improved the CodeLoopSinking pass to support multi-instruction candidates, vector shuffle patterns, and 2D block load sinking.
Added the ability to set SWSB_ENCODE_MODE via a command line option in the Intel Graphics Assembler (IGA) standalone tool.
Added support for i64 shuffle in the SubGroupReductionPattern pass by implementing it as a shuffle of <2 x i32>.
Added a new flag, RunGEPLSRAfterLICM, to run the GetElementPtr Loop Strength Reduction (LSR) pass after the first LICM pass. By default, the flag is set to false.
Improved Conditional Coverage (CCOV) for the RemoveLoopDependency pass.
Added the Inverse Multiply (INVM) and Reciprocal Square Root Multiply (RSQTM) intrinsics.
Added the ScalarAliasBBSizeThreshold flag to control the maximum size of Basic Block (BB) for which scalar to vector aliasing applies.
Added the CEncoder::CopyWithImplicitConversion function to CEncoder to allow generating MOV instructions for src and dst combinations with different type sizes.
Introduced an additional step in the compiler build process to split the BiF module into smaller sections, reducing the time required for materializing and inlining the module.
Improved parsing of igc_opts to avoid a misleading error message.
Enabled the removeUnusedSLM option.
Extended the cl_intel_subgroup_2d_block_io extension to support the following built-ins: intel_sub_group_2d_block_read_8b_8r16x4c, intel_sub_group_2d_block_prefetch_8b_8r16x4c, and intel_sub_group_2d_block_read_transpose_32b_32r8x1c.
Implemented AtomicPullSWWalkWrapperLoopImplementationPass.
Added a registry key to disable coalescing memory fences and extend control over the SynchronizationObjectCoalescing pass.
Added support for DIStringType in DWARF.
Enabled GenISAIntrinsics on LLVM16.
Enabled denormal number support for systolic operations in the Vector Compiler (VC).
Introduced a new GenISA intrinsic, WaveClusteredInterleave, that combines two wave reductions: WaveClustered and WaveInterleave.
Introduced support for the copy sign intrinsic in the Vector Compiler (VC).
Added fast exits in the pattern match.
Added TGM fence workaround for Xe2.
Added joint matrix support for the 32x32x16 combination for DG2.
Added early constant loading for Dot Product Accumulate Systolic (DPAS) in Vector Compiler (VC). DPAS operations do not support immediate operands, so the compiler should move the constants into registers.
Implemented an inverse square root built-in function.
Added a new version of the GTSystemInfo interface with a new SLMSizeInKb field.
Added a global_barrier implementation using atomic instructions.
Added a GPUVA for ubertilesmap in RayDispatchGlobals.
Introduced Loop Cost Expression (LCE) support.
Introduced support for LIT opaque pointers.
Introduced a flag to disable dynamic RQ management.
Introduced invm and rsqtm math functions to support double-precision inverse square root calculations.
Enhanced the cl_intel_subgroup_2d_block_io extension by adding the following built-in functions: intel_sub_group_2d_block_read_8b_16r16x4c, intel_sub_group_2d_block_read_8b_32r16x4c, and intel_sub_group_2d_block_prefetch_8b_16r16x4c.
Introduced the DisablePHIScalarization option in the ScalarizeFunction pass to enable skipping of PHI node scalarization.
Added an option to disable the ScalarizeFunction pass in the OpenCL pipeline.
Improved IGCVectorizer by adding a new vectorization pattern with the FPTrunc instruction and the ability to merge incompletely scalarized vector paths across multiple basic blocks.
Implemented SPV_INTEL_2d_block_io.
Added descriptive error messages for recoverable errors in the Vector Compiler (VC).
Extended the cl_intel_subgroup_2d_block_io extension with more variants by adding support for the following built-ins: intel_sub_group_2d_block_read_8b_16r16x4c, intel_sub_group_2d_block_read_8b_32r16x4c, and intel_sub_group_2d_block_prefetch_8b_16r16x4c.
Improved Virtual Instruction Set Architecture (VISA) APIs to support general nbarrier.
Added support for specifying the last caller-saved General Register File (GRF) using the VISA -lastCallerSavedGRF option.
Added a third retry stage that drops to SIMD16/SIMD8 when PTSS is exhausted, improving resource efficiency.
Allowed building BiFs without the OPAQUE_ARG argument.
Implemented debug information to support subroutines.
Optimized non-uniform indexed Resource Loops to improve performance.
Removed unnecessary TGM fences between stores in compute shader to improve performance.
Allowed using OpenCL Clang with an older Low-Level Virtual Machine (LLVM) version.
Enhanced Low-Level Virtual Machine (LLVM) interim mode by replacing the environment variable control with a new IGC_OPTION__LLVM_INTERIM option for better integration within CMake files.
Added the ability to set the IGC_LLVM_INTERIM mode, which allows creating the IGC_LLVM_TRUNK_REVISION definition based on the -D option or ENV variable.
Introduced support for opaque pointers in newer Low-Level Virtual Machine (LLVM) versions.
Introduced kernel performance metrics.
Started emitting predefined runtime symbol in the ZEBinary’s symbol table.
Added support for bindless memory access.
Introduced an intrinsic for optimization fence in the Vector Compiler (VC).
Introduced the DisablePHIScalarization option that allows skipping PHI node scalarization in the ScalarizeFunction pass.
Added the ability to disable the ScalarizeFunction pass in the OpenCL pipeline.
Improved compilation time for WIAnalysis.
Implemented a helper SeparateSpillAndScratch function to improve performance.
Implemented bundle conflict reduction for two source instructions for OpenCL to improve performance.
Enabled default support for illegal integer types in GetElementPtr Loop Strength Reduction (LSR), improving handling of SCEV expressions with arbitrary integer widths.
Added the capability to enable the use of automatic immediate offset for 2D block intrinsics.
Introduced a registry key to allow disabling emulation for floating-point 64-to-16-bit conversions.
Added BCR support for 5-source Dot Product Accumulate Systolic (DPAS). Previously, BCR only supported 3-source instructions.
Developed the SPV_INTEL_subgroup_matrix_multiply_accumulate SPIR-V extension to enable support for DPAS operations at a lower abstraction level compared to the joint matrix.
Introduced support for the SPV_INTEL_maximum_registers SPIR-V extension that adds literal-based and ID-based execution modes for specifying the maximum number of registers for an entry point.
Added support for a 3-channel image format to SYCL bindless images to enable access to the bindless texture hardware.
Modified the vectorizer to support vector emission of fmul instructions.
Modified the pass threshold to optimize the i64 multiplication performance.
Improved the vectorizer to support vector emission of ftrunc instructions.
Enabled the IndVarSimplification pass to improve performance.
Added a more aggressive late rescheduling phase to the CodeLoopSinking pass and an option to disable the maximum sinking heuristic in the presence of 2D block reads.
Improved the InlineHelper LLVM utility.
Implemented the MergeAllocas pass and enabled allocation merging prior to the split asynchronous pass.
Enabled the emission of vectorized floating-point addition (FADD) instructions, allowing the VISA emitter to process them efficiently.
Implemented nested 3D resource loop unrolling.
Enabled the debug feature for multiple components.
Improved capture by value for better performance, corrected some related VLoad API usage and improved checks for global volatile clobbering.
Improved GenXBaling and GenXRegionCollapsing by reusing existing utilities and improving internal logic for cleaner handling of vector stores and memory dependencies.
Added support for JointMatrix column-major accumulator loads and stores with transposed loads when possible.
Updated warning message for VLA detection to be clearer, with additional documentation on flag usage.
Added a warning message for non-uniform OpGroupBroadcast to help identify spec-breaking kernels.
Added block2d immediate offset emulation to support new functionality.
Updated Dot Product Accumulate Systolic (DPAS) latency for better instruction scheduling and Software Scoreboard tracking.
Added ARL and LNL functionality with support for related hardware platforms.
Disabled compaction for O0 compilation for debugging purposes and added a DumpASMToConsole key to enable dumping of assembly code.
Added a new internal 2D block read built-in for transpose u64 k4.
Enhanced error reporting in static asserts for implicit arguments structure to aid debugging.
Refined Intel Graphics Assembler (IGA) diagnostic messages for better clarity.
Created a new HW Debug pass to unify multiple diagnostic functions for easier debugging.
Optimized live range shortening using Live Elements analysis to prevent unnecessary extension of live ranges.
Added support for arithmetic operations for the last argument in the mad instruction for broader use cases.
Updated the GED model for XE2 and the IGA encoding model to support new hardware features.
Moved SLM (Shared Local Memory) functionality into a new pass for better memory handling, adding features like NullProtection and offset handling.
Included LSC prefetch and introduced a new API to access 2D block write from OpenCL-C.
Expanded 2D block read transform capability.
Integrated a tile fence into system routines.
Enabled a clone address arithmetic pass during retry and RPE cutoff.
Introduced support for bindless mode for CONST_BASE and GLOBAL_BASE implicit arguments.
Added an option to set the maximum General Register File (GRF) number that vISA can select.
Introduced support for the cl_intel_subgroup_2d_block_io extension to enhance block read operations for efficient matrix multiplication algorithms. This extension adds block I/O operations with cache control, including reading multiple blocks without data rearrangement, transforming and transposing data during reads, prefetching blocks, and writing blocks without rearranging.
Introduced support for the cl_intel_subgroup_buffer_prefetch extension to enable data prefetching from a buffer as a sub-group operation. This extension improves kernel performance by prefetching data into a fast cache, ensuring future reads occur from the cache instead of slower memory. The new block prefetch operations are supported in both the OpenCL C kernel programming language and the SPIR-V intermediate language.
Introduced support for the SPV_INTEL_cache_controls extension for cl_intel_subgroups*/cl_intel_subgroup_buffer_prefetch OpenCL extensions.
Introduced support for SPV_INTEL_cache_controls for OpenCL prefetch.
Introduced support for the cl_intel_subgroup_matrix_multiply_accumulate_tf32 extension that extends the existing cl_intel_subgroup_matrix_multiply_accumulate extension with support for the TensorFloat (tfloat32) data type.
Introduced support for bindless image for media block read/write.
Introduced support for joint matrix load B row_major bfloat16 16x64.
Introduced support for API-specific values for SIMD8/16/32_SpillThreshold. Now, each API can have its own preferred setting for SIMD8/16/32_SpillThreshold.
Introduced support for the CacheControlLoadINTEL extension for OpenCL.
Introduced support for more variants of 2d block reads.
Introduced support for SPV_INTEL_subgroup_buffer_prefetch.
Added a new pass that attempts to reduce the number of times the FP rounding mode is switched by moving and grouping together instructions that use the same rounding mode.
Added the jump-threading-through-two-bbs option to control additional basic blocks construction when threading through two basic blocks.
Added a few convenience functions for generating system values in LVM3DBuilder.
Added the BMG support.
Added the missing sampler op conversions Level of Detail and Load (LOD & LD).
Added support for cache controls in the joint matrix.
Added check for dpas instructions in inline Virtual Instruction Set Architecture (VISA) for OpenCL.
Added the float type support for src2.
Added the scratch parameter to the sub_group sort built-ins.
Added the Vector Compute (VC) support for the Battlemage platform.
Added the 1x64x32 joint matrix support.
Added a non-default cache control field to addrSpace to make address spaces with the same numeric value yet different caching policies distinguishable.
Added DEV_ID_56C2.
Added a new field to RayDispatchGlobalData.
Added the ConvertHandleTo*ImageINTEL built-ins for the SYCL bindless image to support the SPV_INTEL_bindless_images SPIR-V extension.
Added a new simd flag to distinguish between explicit force and implicit hinted force.
Added a field to SBindlessProgram.
Added pseudo kill for address register spill fill code to fix the liveness issue.
Added debug SIP.
Added support for fp64 conversion emulation mode, where Intel® Graphics Compiler emulates only conversions.
Added Floating-Point Compare (fcmp) to supported instructions in the fp64 conversion emulation mode.
Added a warning message for FP64 emulation in cases of providing 2 options for FP64 conversion emulation.
Added additional flags for picking a kernel.
Increased the legal vector size to 64 to support the 2DBlockRead calls.
Introduced initial support for SPV_KHR_cooperative_matrix to support a new matrix format.
Introduced an improvement to roll back CodeLoopSinking when register pressure increases.
Improved the private memory Structure of Arrays (SOA) transpose by allowing array of struct to be transposed.
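The i64 shuffle item above says that a 64-bit subgroup shuffle is implemented as a shuffle of <2 x i32>. The following is a minimal Python sketch of that idea only, not the compiler's implementation: a subgroup is modelled as a list of unsigned 64-bit lane values, and the names shuffle_i32, shuffle_i64_via_i32, xor_pattern, and subgroup_sum_i64 are hypothetical helpers introduced for illustration. It assumes the subgroup width is a power of two.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shuffle_i32(lanes, src_lane_of):
    """Model of a 32-bit shuffle: lane i reads the value held by lane src_lane_of[i]."""
    return [lanes[src] & MASK32 for src in src_lane_of]


def shuffle_i64_via_i32(lanes, src_lane_of):
    """Shuffle 64-bit lane values by shuffling their low and high 32-bit halves."""
    lo = [v & MASK32 for v in lanes]
    hi = [(v >> 32) & MASK32 for v in lanes]
    lo_shuffled = shuffle_i32(lo, src_lane_of)
    hi_shuffled = shuffle_i32(hi, src_lane_of)
    # Recombine the two 32-bit halves into one 64-bit value per lane.
    return [(h << 32) | l for h, l in zip(hi_shuffled, lo_shuffled)]


def xor_pattern(width, mask):
    """Source-lane table for an XOR (butterfly) shuffle."""
    return [i ^ mask for i in range(width)]


def subgroup_sum_i64(lanes):
    """Butterfly sum reduction built only from the <2 x i32> style shuffle."""
    width = len(lanes)  # assumed to be a power of two
    acc = list(lanes)
    step = width // 2
    while step >= 1:
        partner = shuffle_i64_via_i32(acc, xor_pattern(width, step))
        acc = [(a + b) & MASK64 for a, b in zip(acc, partner)]
        step //= 2
    return acc


if __name__ == "__main__":
    values = [(1 << 40) + i for i in range(8)]
    print(subgroup_sum_i64(values)[0])
```

After the reduction every lane holds the same total, which is the property a shuffle-based subgroup reduction relies on.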
Intel® Graphics Driver Backports for Linux* OS (i915)
Introduced an experimental recovery mechanism for handling fatal GPU errors, designed to restore functionality without requiring a full system reboot or GPU reset. This helps reduce downtime and improves system reliability. This feature is disabled by default and can be enabled using the enable_fatal_error_recovery flag: a value of 1 routes fatal errors as Message Signaled Interrupts (MSI) and attempts a Secondary Bus Reset (SBR), while a value of 2 routes errors as MSI without attempting an SBR.
Enabled the dynamic ICS via the opt-in KLV feature.
Updated the Graphics Micro Controller (GuC) to version 70.44.1.
Extended 2M userptr support to 1G.
Enabled backport support for kernel version 6.13.
Added support for the HBM_REPLACE bit to signal High Bandwidth Memory (HBM) health status and its transition to the REPLACE state. This enhancement enables the driver to detect the bit and prevent loading when the state changes to REPLACE, while also reporting the issue and prompting HBM replacement.
Enabled group busyness counters in a VF.
Supported dumping multiple engines for offline debugging.
Supported 4K pages in lmem swapper.
Enhanced HBM training failure reporting.
Added extra debug info for GuC CT errors.
Added PCI ID for new PVC vector-only SKU.
Added write barriers between flat-ppgtt init and usage.
Showed multiCCS status in sysrq-G.
Showed pagefault address in canonical format.
Added jiffies for missing age parameter.
Tuned active defrag and idle buddy allocation.
Supported marking VM_BIND vmas as read-only.
Updated DG2 HuC to version 7.10.14.
Added the survivability lite feature for firmware updates on Flex.
Enhanced the offline installer for RHEL by including pre-built kernel modules for the Intel i915 graphics driver. These modules simplify installation and management of the driver. For further details, see the documentation provided with the offline installer. To access this documentation, run the offline installer with the -s parameter.
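The two documented values of the enable_fatal_error_recovery flag described above can be summarized as a small lookup. This is an illustrative Python sketch, not driver code: the enum and function names are hypothetical, and treating 0 as the "disabled" value is an assumption, since the notes only state that the feature is disabled by default.

```python
from enum import IntEnum


class FatalErrorRecovery(IntEnum):
    """Hypothetical names for the values of enable_fatal_error_recovery."""
    DISABLED = 0       # assumption: the notes only say "disabled by default"
    MSI_WITH_SBR = 1   # route fatal errors as MSI and attempt a Secondary Bus Reset
    MSI_ONLY = 2       # route fatal errors as MSI without attempting an SBR


def describe(value):
    """Return the behavior the release note attributes to a flag value."""
    mode = FatalErrorRecovery(value)  # raises ValueError for undocumented values
    return {
        "route_as_msi": mode in (FatalErrorRecovery.MSI_WITH_SBR,
                                 FatalErrorRecovery.MSI_ONLY),
        "attempt_sbr": mode is FatalErrorRecovery.MSI_WITH_SBR,
    }


if __name__ == "__main__":
    for v in (1, 2):
        print(v, describe(v))
```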
Intel® Graphics Memory Management Library
Improved handling of coherent and compressible resources.
Added support for new Battlemage device IDs.
Introduced the MOCS variable for Xe2.
Enabled GO:L3 for OpenCL usages.
Enabled IsCpuCacheable on Linux to improve performance.
Enabled the R10G10B10_XR_BIAS_A2_UNORM format for display to support 10-bit color and HDR rendering with improved visual quality.
Added the Media Video Processing (VP) performance tags that can help with optimization and debugging.
Intel® Graphics System Controller Firmware Update Library
Added timestamps to logs.
Implemented a library API and a CLI command for reading the firmware status register.
Introduced a new error message to notify users about device iterator failures.
Enabled the logging of data traffic in the trace mode.
Intel® ME TEE Library
Added the GetTRC API.
Added an initialization API that accepts all parameters.
Added an option for setting a custom log callback.
Added a CMake preset for full debug builds.
Added a 32-bit release preset in CMake.
Added getters for maxMsgLen and protocolVer.
Intel® Media Driver for VAAPI
Added upstream Battlemage encoding support.
Added full support for the Lunar Lake platform in the upstream.
Introduced initial support for the Battlemage platform in the upstream.
Introduced upstream encoding support for Battlemage.
Added support for AV1 encoding with ARGB input.
Intel® Metrics Discovery Application Programming Interface
Added support for half-full Observability Architecture (OA) buffer interrupt in i915.
Added CoreFrequencyMHz details and MaxCount global symbols for Xe2.
Added support for GpuCoreClocks symbols in read equations.
Introduced a global symbol that indicates GPU frequency override state.
Added return code handling for the following functions: AddInformationSet, SetSnapshotReportReadEquation, SetOverflowFunction, AddDefaultMetrics, AddStartRegisterSet, CreateMetricsFromPrototypes, and RefreshConfigRegisters.
Intel® oneAPI Level Zero
Added support for registering a TeardownCallback to notify clients upon release of Level Zero resources.
Added support for sorting drivers based on provided devices.
Implemented basic leak checker in the validation layer.
Added zeImageViewCreateExt and zeMemFreeExt support to the leak checker.
Added API call logging to the validation layer.
Added the static Level Zero loader support.
Introduced support for 1.7 specification in the static loader.
Added event deadlock detection within the validation layer.
Started logging the full path of loaded libraries in traces for better debugging.
Added result passing to validation checkers at the epilogue stage.
Upgraded specification to version 1.12.15.
Intel® Video Processing Library
Introduced Intel® Video Processing Library API 2.15 support, including new property-based capability queries interface, extended decoder and encoder capabilities reporting, and definitions for VVC main 10 still picture profile and level 6.3.
Added the explicit INSTALL_EXAMPLES build option to control installation of example source code and content.
Updated the default Ubuntu build to version 24.04.
Introduced support for Intel® VPL API 2.14, which adds new quality and speed settings for AI-powered video frame interpolation. This update also expands algorithm and mode selection options for AI-based super resolution and adds support for High Efficiency Video Coding (HEVC) level 8.5 decoding.
Improved compatibility with Python 3.12 development environments.
Intel® Video Processing Library GPU Runtime
Improved AV1 decoding performance when all decode frame surfaces are in use.
Enabled property-based capability queries.
Enabled full support for the Lunar Lake platform.
Added initial support for the Battlemage platform.
Introduced support for the Y210 format in media copy operations.
Added a check for AV1 decoding bitdepth changes when parsing SPS syntax to prevent decoding issues.
Aligned the default decode frame rate to 30fps.
Added support for MFX_EXTBUFF_VIDEO_SIGNAL_INFO in AV1 decoding to retrieve video signal information.
Enabled dynamic decode frame rate by parsing frame rate data from the AV1 bitstream.
Improved reference frame patterns in pyramid cases.
Enabled block size selection for VP9 encoding segmentation.
Intel® Video Processing Library Tools
Introduced support for Intel® Media Transcode Accelerator.
Added new strings to the vpl-inspect tool to improve output readability.
Added the -props option to the vpl-inspect tool to support querying capabilities based on properties.
Updated the default Ubuntu build to version 24.04.
Integrated screen content coding tools for AV1 into sample_encode.
Added a new GTK renderer option to sample_decode and sample_multi_transcode.
Introduced a new -fullscreen option for GTK in sample_decode and sample_multi_transcode. Users can now toggle full screen using Ctrl+f and exit with Esc.
Enhanced support for Python 3.12 development environments.
Intel® XPU Manager and XPU System Management Interface
Added GPU diagnostics that use the copy engine to measure GPU memory, Peripheral Component Interconnect Express (PCIe), and Xe link throughput.
Incorporated Manageability Engine Interface (MEI) error checking into the GPU diagnostics pre-check command.
Revised help information for the dump command.
Added the GPU performance data in the GPU diagnostics stress command.
Added the GPU temperature checking in GPU diagnostics.
Provided better GPU performance data when running GPU diagnostics on multiple GPUs.
Added the GPU firmware version checking in GPU diagnostics.
Added the Peripheral Component Interconnect Express (PCIe) speed and width checking in GPU diagnostics.
Optimized the GPU diagnostics pre-check execution time.
Introduced support for the Flex AMC firmware update on the Lenovo SD530 V3 server.
Added the ability to display the date in the dump command when using the --date parameter.
Added security consolidation.
Upgraded the vGPU parameters.
Improved GPU diagnostics when Single Root I/O Virtualization (SR-IOV) is enabled.
Improved the GPU diagnostics configuration file.
Added the ability to display the version of the intel-i915-dkms package.
Improved GPU memory throughput reporting.
Improved Peripheral Component Interconnect Express (PCIe) downgrading checking.
2025-05-06
The 2350.150 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5
Ubuntu 22.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
Improvements
Intel® Graphics Driver Backports for Linux* OS (i915) and Intel GPU Firmware
Updated the Graphics Micro Controller (GuC) to version 70.44.1.
2025-04-02
The 2350.145 release supports the following operating systems:
Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5
Ubuntu 22.04
SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6
Features
General
Introduced support for RHEL 9.5.
Changes
General
Updated the signing key for KMD prebuilds to enhance security and ensure continued reliability. This key ensures that only trusted kernel-level software can run during the boot process. The new key, valid for one year, will be used to sign all new KMD module releases. If you use secure boot, you need to download and install a new Distinguished Encoding Rules (DER) certificate to maintain compatibility. If you do not use secure boot, no action is required.
Improvements
Intel® Graphics Driver Backports for Linux* OS (i915)
Updated the Graphics Micro Controller (GuC) to version 70.40.1.
Fixed an issue causing a memory error.
Introduced page fault handling improvements.
Implemented a workaround to address an encoder issue causing errors. The workaround adds support for the G8 power state in ATS-M to reduce idle power consumption.
Skipped the HuC microcontroller authentication register check and marked HuC as available if preloaded.
Intel® Graphics Compiler
Enabled optimizations for the -cl-opt-disable and -ze-opt-disable API options.
Fixed a crash issue that occurred during copy elimination.
Initialized address register to prevent unaligned cross-GRF access.
Replaced %opt with opt in the vector compiler.
Intel GPU Firmware
Updated the Graphics Micro Controller (GuC) to version 70.41.0.
Known issues
When installing drivers from Intel repositories on LTS, the repositories may contain older package versions than those in public repositories. To ensure the correct versions are installed from Intel repositories, we recommend adding priority=98 to the repository configuration. This helps effectively manage package version selection.
For application workloads using HBM on SPR, ensure that the UMD policy chooses I915_GEM_CREATE_MPOL_PREFERRED instead of I915_GEM_CREATE_MPOL_BIND when calling ioctl. This can be achieved by replacing --membind=8 with --preferred=8 in the numactl command.
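The numactl change described in the last known issue can be applied to a launch command programmatically. The following Python sketch is one possible way to do it, under stated assumptions: the function name prefer_instead_of_bind is hypothetical, only the --membind=NODE spelling shown in the note is rewritten, and the node list is assumed to name a single node, because --preferred takes one node.

```python
def prefer_instead_of_bind(argv):
    """Return a copy of a numactl argument list with --membind=N replaced by --preferred=N.

    Only the '--membind=<node>' form is handled; other spellings are left untouched.
    """
    rewritten = []
    for arg in argv:
        if arg.startswith("--membind="):
            node = arg.split("=", 1)[1]
            rewritten.append("--preferred=" + node)
        else:
            rewritten.append(arg)
    return rewritten


if __name__ == "__main__":
    # './my_workload' is a placeholder application name.
    command = ["numactl", "--membind=8", "./my_workload"]
    print(" ".join(prefer_instead_of_bind(command)))
```

The rewritten list can be passed to a process launcher such as subprocess.run in place of the original command.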