Release Notes: LTS

This document outlines changes introduced to the Intel® software for general-purpose GPU capabilities in LTS releases. As the software includes several different projects, the changes for each release are grouped by project.

To install packages for the preferred LTS release, refer to either the LTS 2523.x or LTS 2350.x installation guide for your distribution. For a list of packages published on repositories.intel.com/gpu for each release and operating system, see Provided Packages.

2025-09-05

Because the Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series have entered a stable and mature phase, this release introduces a new Long-Term Support (LTS) release stream based on the final 2523.12 rolling release. It is a new line that includes the latest features and enhancements, not a continuation of the previous LTS stream. The rolling and 2350-based LTS release streams have been discontinued, and ongoing maintenance and support are now provided through this final, production-ready LTS 2523 stream. These release notes list the features and breaking changes introduced since the previous LTS release, along with fixes introduced after the last rolling release.

The 2523.31 release supports the following operating systems:

  • Red Hat Enterprise Linux (RHEL): 8.10, 9.4, 9.6, and 10.0

  • Ubuntu 22.04 and 24.04

  • SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, 15 SP6, and 15 SP7

Improvements

Intel® Graphics Compiler for OpenCL™

  • Fixed an issue causing the RSBench application to fail.

  • Fixed an accuracy issue with aten::amax.

Breaking changes

General

Deprecated the Intel® Media SDK project, so the intel-mediasdk package is no longer included in this or any future LTS release. For instructions on installing all required packages for this new LTS release, see the Installing Data Center GPU: LTS 2523.x Releases document. The previous LTS installation guide, now renamed to Installing Data Center GPU: LTS 2350.x Releases, does not apply to this release, but you can still use it for earlier LTS releases.

Known Issue

Installation of IEFS version 12.1.0.0.149 may fail on SLES 15 SP6 systems running kernel 6.4.0-150600.23.47 due to a build error in the OFA kernel module. The error occurs because the rv module incorrectly uses the MODULE_IMPORT_NS(DMA_BUF) macro, resulting in a compilation failure.

Resolution: A patch is available to resolve this issue: View patch on GitHub

If you cannot apply the patch in your environment, use MOFED version 24.10-2.1.8.0 instead of 24.10-3.2.5.0-LTS. This MOFED version has been validated successfully with the previous IEFS version.
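
As a rough pre-installation check, the kernel release named in this known issue can be detected from the release string. The sketch below is illustrative only: the helper name and the matching rule are assumptions, not part of IEFS, MOFED, or any Intel tooling, and it checks nothing beyond the kernel release quoted above.

```python
import platform

# Kernel release named in the known issue above (SLES 15 SP6).
AFFECTED_KERNEL_PREFIX = "6.4.0-150600.23.47"


def kernel_matches_known_issue(kernel_release: str) -> bool:
    """Return True if the release string refers to the affected kernel.

    Hypothetical helper: matching by prefix is an assumption, chosen so that
    flavour suffixes such as "-default" do not hide a match.
    """
    if not kernel_release.startswith(AFFECTED_KERNEL_PREFIX):
        return False
    rest = kernel_release[len(AFFECTED_KERNEL_PREFIX):]
    # Reject longer version numbers such as "...23.470" that merely share the prefix.
    return rest == "" or not rest[0].isdigit()


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    if kernel_matches_known_issue(release):
        print(f"{release}: apply the patch or use the alternative MOFED version")
    else:
        print(f"{release}: not the kernel release named in the known issue")
```

A match only indicates that the kernel release is the one listed here; whether the build actually fails still depends on the installed IEFS and MOFED versions.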

Features

General

  • Incorporated the latest security updates to address recent vulnerabilities, enhance protection, and ensure greater system reliability.

  • Introduced support for new operating systems: Red Hat Enterprise Linux 9.6 and 10.0, Ubuntu 24.04, and SUSE Linux Enterprise 15 SP7.

Intel CM Compiler

  • Introduced support for arbitrary SIMD in sampler intrinsics.

  • Added the cm_rsqrt implementation, which maps directly to the SPIR-V OpenCL RSqrt intrinsic.

  • Started supporting saturation in the rsqrt built-in.

  • Introduced support for full-width r0 access in the cm_get_r0 intrinsic.

  • Introduced a compiler option to enable cost analysis information.

  • Introduced the -vc-use-bindless-buffers and -vc-use-bindless-images options to enable bindless accesses. These options allow VC to generate bindless buffers and images, replacing stateful BTI-based ones.

  • Disabled global fence on platforms other than Intel Data Center GPU Max Series.

  • Added 8-bit floating point conversion intrinsics.

  • Added a helper function to retrieve the global thread ID along its dimension.

  • Added support for new Battlemage and Panther Lake devices.

  • Updated the CM specification including the LSC memory interface, cache controls, and CM macro requirements.

  • Added the stochastic rounding intrinsic declaration.

  • Included the main <cm/cm.h> header implicitly, enabling caching when compiling from the CM source.

  • Added intrinsics for 2D block load and store operations.
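
To illustrate what a global thread ID along one dimension usually means, the sketch below applies the common convention of combining the group ID, the group size, and the local ID. These notes do not give the name or signature of the CM helper, so the function name and parameters here are hypothetical, and the code is a host-side model rather than CM source.

```python
def global_thread_id(group_id: int, local_size: int, local_id: int) -> int:
    """Model of a per-dimension global thread ID.

    Assumes the usual linear convention:
        global_id = group_id * local_size + local_id
    All three arguments refer to the same dimension.
    """
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    if group_id < 0:
        raise ValueError("group_id must be non-negative")
    if not 0 <= local_id < local_size:
        raise ValueError("local_id must be in [0, local_size)")
    return group_id * local_size + local_id


def global_thread_ids(group_ids, local_sizes, local_ids):
    """Apply the same rule independently to each dimension."""
    return tuple(
        global_thread_id(g, s, l)
        for g, s, l in zip(group_ids, local_sizes, local_ids)
    )
```

For example, thread 5 of group 3 in a dimension with 16 threads per group has global ID 3 * 16 + 5 = 53 under this convention.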

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver

  • Added support for the BUFFER_SIZE explicit argument.

  • Added the Level Zero API for querying kernel argument data.

  • Added support for ZE_EXTERNAL_MEMORY_TYPE_FLAG_OPAQUE_FD to the Level Zero runtime.

  • Added a debug flag to force Graphics Memory Manager (GMM) system memory resource type.

  • Introduced initial support for creating media context.

  • Started skipping the addition of kernel internal allocations to command list residency.

  • Stopped initializing in-order TS nodes and introduced initial support for standalone CB events timestamps allocator.

  • Added support for HP copy engine context.

  • Allowed for dynamic counting of HP contexts in a context group.

  • Added support for the L3-cache reservation.

  • Allocated First In, First Out (FIFO) for the debugger.

  • Enhanced the TD Debug Control register to support per-context breakpoints and debug features, improving multi-context debugging and virtualization support.

  • Implemented new registers for the debugger and added support for V3 state save header.

  • Added a new SUPPORTED_DEVICES query option to OpenCL offline compiler to allow generating a YAML file containing information about supported devices.

  • Added DRM-specific memory banks computation logic.

  • Added a possibility of checking the WMTP status of a device using the engine capability flag.

  • Modified kernel residency to enable kernel mutation capabilities, allowing for mutability of the kernel residency list. This enhancement provides fine control over added or replaced allocations, whether as kernel arguments or internal kernel allocations.

  • Added the statelessBuiltinsEnabled boolean to the command list.

  • Enabled Blitter Command Streamer (BCS) for transfers when Compute Command Streamer (CCS) is busy. Additionally, started using staging buffer as a pool for multiple smaller transfers, limited barrier usage in non-USM copies to single transfers, and enabled staging buffer copies.

  • Added support for the cl_khr_extended_bit_ops extension.

  • Implemented the zeCommandListImmediateAppendCommandListsExp API to launch the existing command list into an immediate command list.

  • Added memory tracking for system and per-device allocations, enabling enhanced memory pooling heuristics.

  • Implemented the product helper specializations for querying device support for 2D block load/store operations.

  • Introduced external required scratch space and a kernel command view flag at the command list level. Additionally, added a flag to block implicit scaling commands during dispatch, along with an option to enable compute walker command view.

  • Added regular and HP contexts to groups without a dedicated HP engine.

  • Added context group support for root device engines.

  • Added support for data cache limits in peak power.

  • Introduced SW First In, First Out (FIFO) implementation.

  • Added support for state save area header v4.

  • Modified the setErrorDescription API.

  • Started using the ioctl helper to get the number of media engines.

  • Upgraded the Core API to version 1.9, enabling the immediate execution of existing command lists by converting them into immediate command lists.

  • Added support for querying the number of L3 cache banks to the preliminary version of the Xe architecture. This allows software to retrieve information about how many banks of L3 cache are available on the GPU.

  • Enabled bindless mode that allows shaders to directly access resources like textures, buffers, and other memory objects without the need for explicit binding to specific slots in the traditional way. This provides more flexibility and efficiency, particularly for workloads with large numbers of resources.

  • Started using the global bindless allocator in the Xe² HPG architecture to manage the assignment of global pointers to resources in the bindless mode.

  • Added an API that retrieves kernel binaries from kernel.

  • Updated kernel residency management to save the position of the kernel internal container when allocation can change.

  • Added an input/output control helper function to get fence address and set external context.

  • Added implementation of standalone CB events without pool allocation supporting all features for regular events.

  • Added support for running SIMD16 operations on Execution Units (EUs) contained within each Dual-SubSlice (DSS) on the preliminary version of the Xe architecture.

  • Implemented the kernel trace functionality to support the metric group type.

  • Added additional parameters to the Graphics Microcontroller (GuC) to support the region allocation logic.

  • Implemented setErrorDescription in os_interface/linux for drm_buffer_object, drm_memory_manager, and ioctl_helper_prelim.

  • Added a debug flag to disable walker splitting for copy operations.

  • Updated the Level Zero Core version to 1.6.

  • Added a debug key for setting MaxSubSlicesSupported.

  • Implemented a debug flag to manage the direct submission semaphore mode.

  • Implemented a debug flag that allows changing the ULLS Blitter Command Streamer (BCS) timeout.

  • Added new functions to the dispatch table to support mutation of kernel Instruction Set Architecture (ISA).

  • Added a class for multiple device metric to calculate the report format.

  • Added an EnableCompatibilityMode flag to support binary compatibility across multiple hardware targets.

  • Added support for 2D block load and store extension queries.

  • Introduced a new forward-compatibility model for zeinfo to emit an error whenever an unknown attribute is encountered.

  • Added a getter method for accessing device node information from the Direct Rendering Manager (DRM).

  • Added the missing event scope flags for Command Buffer (CB) handling.

  • Added support for system memory in virtual memory functions.

  • Added new parameters to the GuC System Information (SysInfo) Blob for enhanced functionality.

  • Extended zeDeviceGetProperties with additional device properties, such as module_id and server_type tokens.

  • Enhanced log messages to support setting message severity using environment variables.

  • Added heapless built-ins with images compilation.

  • Started using heapless built-ins for images.

  • Added support for custom compiler backends to support loading different versions of the backend compiler based on underlying device.

  • Started using the sysInfo helper for detecting the memory type.

  • Started checking the peak power support using the escape call.

  • Added support for 3-channel configuration in the image format descriptor.

  • Updated the General Register File (GRF) register implementation.

  • Implemented error handling to trigger when the OA buffer overflows.

  • Enhanced kernel parameter configuration by adding support for passing additional enqueue and zebin parameters. This enables features such as quantum dispatch and quantum size specification directly within zebin for better encapsulation and OpenCL compatibility.

  • Added support for the custom allocator in work partition allocation.

  • Added idle Command Stream Receiver (CSR) detection and improved timeout handling in the ULLS controller.

  • Implemented VF engine utilization API.

  • Added the input and output control helper functions to mmap and unmap operations, acquire and release the GPU range, allocate user pointer, and synchronize the userptr allocation.

  • Exposed new counter-based events and added the default mode for zexCounterBasedEventCreate2.

  • Introduced support for physical host memory.

  • Updated Level Zero metrics to align with v1.11 headers.

  • Started specifying the cache level when reserving a region.

  • Added GPU and memory power domain support for getEnergyCounter.

  • Added support for three channels in Level Zero.

  • Introduced support for zeInitDrivers that combines driver initialization and retrieval functionality. Updated the GTPIN initialization logic to execute only when pCount is greater than 0 and the driver handle is non-null. Additionally, removed the unused ze_init_flags_t flag from all driverInit functions.

  • Enabled counter-based allocation peer sharing to support scenarios involving in-order command lists with multi-GPU event scenarios.

  • Added support for two Xe-eudebug interfaces within a single binary. The new EuDebugInterface class encapsulates eudebug functionality, with CMake flags to control Xe-eudebug and prelim uAPI support.

  • Added a root device flag check for multi-device scenarios, so that APIs using root device handles can now validate this flag and handle failures gracefully.

  • Added a new uAPI macro in the engine module to fetch the configuration of the total ticks.

  • Added Platform Monitoring Technology (PMT) counter offset values for Battlemage.

  • Started handling page fault events in the Xe debugger.

  • Added support for the cl_khr_expect_assume OpenCL extension that introduces mechanisms to supply the compiler with information that can enhance the performance of certain kernels.

  • Implemented the Level Zero zeKernelGetBinaryExp API that allows retrieving kernel binary program data.

  • Added support for shared system Unified Shared Memory (USM) allocation in appendLaunchKernel.

  • Implemented enhancements to the Unified Shared Memory (USM) reuse mechanism, including the introduction of a USM reuse cleaner that efficiently manages system and local memory across different reuse strategies, as well as an extension of the USM reuse limit infrastructure.

  • Improved cache management by supporting whitelisted includes.

  • Added support for handling new Reliability, Availability, and Serviceability (RAS) errors in Sysman.

  • Implemented alignment of host Unified Shared Memory (USM) to 2MB on discrete devices when the allocated size exceeds 2MB.

  • Improved command stream performance on Intel Graphics GPU by preallocating internal heap and multiple command buffers, and optimizing GMM for cacheable command buffers.

  • Added a debug key to apply high power throttling hints on WDDM initialization.

  • Enabled a new dispatch monitor fence policy in direct submissions and removed redundant TLB flushes.

  • Added support for OpenCL C support queries to the OpenCL Offline Compiler, enabling automated compilation workflows.

  • Updated the Device Driver Interface (DDI) to include new introspection APIs for events, command lists, and command queues, with version checks based on Level Zero 1.9.

  • Introduced heapless mode programming support across Level Zero, SBA, OpenCL, and related tooling.

  • Introduced support for using a high-priority engine.

  • Added initial support for patching region parameters and updated related zeinfo arguments.

  • Enabled implicit conversion to counter-based events and introduced related experimental APIs and features for enhanced event handling.

  • Added functions to get and set virtual address space handles, returning an error if unsupported.

  • Added support for memory policy configuration for GEM_CREATE using a new ioctl extension and debug variables.

  • Added support for the ForceBcsEngineIndex flag.

  • Improved metric notifications by unifying the notification structure.

  • Added a NEEDS_RESET property to indicate when event reset is required.

  • Ported the Group Engine Interface to the Xe architecture.

  • Added support for RAS clear state experimental functionality.

  • Introduced a basic structure for tracer support.

  • Enabled device allocation chunking by default for multi-tile configurations with implicit scaling.

  • Added support for the activateMetrics feature.

  • Introduced a caller ID feature in aubstream to help distinguish capture environments.

  • Created boilerplate support for querying current voltage.

  • Added a debug variable to configure BO chunking size.

  • Added Product Helper support to the Performance Module.

  • Implemented a query for kernel maximum group size using ze_kernel_max_group_size_ext_properties_t.

  • Added debug flags and instrumentation for waitpkg calls, including wrappers and CPU feature detection.

  • Implemented an initial query to retrieve kernel register count per device and kernel, as a precursor to formal support.

  • Added an interface to bind resources as read-only.

  • Added a debugger functionality to the global stateless feature.

  • Introduced support for synchronization dispatch token allocation.

  • Implemented zeCommandListAppendImageCopyToMemoryExt/FromMemoryExt.

  • Registered a critical metadata section for the Xe debugger.

  • Introduced a mechanism that assigns a synchronization queue ID in the ImplicitScaling mode.

  • Introduced an initial support for synchronized dispatch.

  • Implemented a VM_BIND EU debug event for Xe.

  • Implemented Sideband Access (SBA) and module debug access for the Xe debugger.

  • Added the override key to change the command list update capability.

  • Added a debug flag for metrics logs.

  • Enabled the detection of Xe Direct Rendering Manager (DRM) by default.

  • Enabled finding the CPU base address from all command buffers in the container.

  • Added support for the CCS mode configuration via SysFs.

  • Added logic to iterate for all contexts to check the GPU page fault.

  • Enabled the per-IP euStall functionality.

  • Introduced a functionality that aborts when an unexpected GPU page fault is detected.

  • Introduced in-order host counter allocation pooling.

  • Added a flag to allow noop space for command buffer (CB) events from the same in-order pool.

  • Added a new functionality to patch helpers.

  • Added output list of commands for counter-based wait events.

  • Added a debug key to generate the SIP header file.

  • Introduced state programming during driver initialization for heapless OpenCL (OCL).

  • Introduced storing the timestamp command buffer (CB) event clear and sync commands.

  • Implemented metadata create event handling in Xe.

  • Added a command pointer to store data for the immediate encoder.

  • Added a feature that returns an error when file handles are exhausted in the Sysman engine.

  • Implemented a signal for OpenGL (OGL) to indicate the creation and destruction of shared buffers.

  • Implemented reporting for multi-hop fabric connections.

  • Added a wait command list argument.

  • Added a timestamp to the postsync command in the list argument.

  • Enabled a workaround for dummy blit operations on Ponte Vecchio.

  • Introduced support for pooling in-order counter allocations.

  • Implemented metadata attaching for vm_bind in Xe.

  • Introduced bindless image extension in ImageView.

  • Implemented thread control and the att event handling for Xe.

  • Introduced the bindless images extension and supported pitched pointer (ptr) functionality.

  • Implemented zeCommandListImmediateAppendCommandListsExp.

  • Introduced a query to get the kernel module register sizes.

  • Added support for legacy acronyms in ocloc’s fatbinary.

  • Introduced a bindless image extension and image properties.

  • Added a debug flag to flush the Translation Lookaside Buffer (TLB) before copying.

  • Implemented the use of heapless built-in functions in Level Zero when supported.

  • Added the ZE_experimental_bindless_image extension.

  • Implemented a thread control for the debugger in Xe.

  • Implemented a method to update Level Zero helper command lists.

  • Implemented read/writeGpuMemory for the Xe debugger.

  • Implemented the utilization of heapless built-in functions in OpenCL (OCL) when supported.

  • Added keys to override the sync mode for the immediate command list.

  • Introduced an initial support for the local dispatch size query.

  • Added introspection APIs for events, command lists, and command queues.

  • Implemented the process entry/exit event with Xe.

  • Added debug flags to force pat index.

  • Added support for DRM_XE_EUDEBUG_EVENT_VM event handling.

  • Added printing of the CPU flags and the address size upon detection.

  • Implemented the Debugger Open Input/Output Control (IOCTL).

  • Enabled Debug Attach capability for Xe.

  • Implemented debug information logging for PAT indexes.

  • Implemented the Xe Execution Unit (EU) Debug Open and Execute Queue events.

  • Enhanced reporting of the maximum cooperative group count.

  • Added umonitor and umwait synchronization functions.

  • Allowed waiting for the standalone command buffer (CB) event.

  • Added stateless heapless built-ins.

  • Implemented a debug key to toggle a bit in the 57-bit GPU Virtual Address for specific allocations.

  • Introduced initial support for zexCounterBasedEventCreate.

  • Introduced support for zexEventGetDeviceAddress.

  • Implemented patching command buffer (CB) events on non-inOrder regular command lists.

  • Introduced a new API to create and export counter-based events.

  • Enabled the HP flag when creating HardwareContextController.

  • Enhanced post-sync system memory fence programming.

  • Implemented high-priority Command Stream Receiver (CSR) from secondary contexts.

  • Introduced support for memory policy in GEM_CREATE.

  • Introduced initial support and implemented error handling for counter-based event flags.

  • Enabled support for secondary contexts in group.

  • Enhanced Linux OpenCL (CL)/OpenGL (GL) sharing support.

  • Added L3 Fabric Error monitoring in Sysman Directory.

  • Added support for firmware flash progress API.

  • Added monitoring for L3 fabric errors.

  • Enabled support for spill/private size in execution environment.

  • Implemented reading of the indirect detection version.

  • Added Sysman product helper in the preliminary and non-preliminary file for memory.

  • Enabled support for cl_intel_subgroup_2d_block_io to facilitate PyTorch upstreaming.

  • Enabled support for cl_intel_subgroup_matrix_multiply_accumulate_tf32 and cl_intel_subgroup_buffer_prefetch.

  • Introduced backward compatibility for Intel’s 12th generation low-power integrated graphics architecture (Gen12LP).

  • Added new definitions to support Ahead-Of-Time (AOT) configurations.

  • Introduced a mechanism for detecting new Virtual Memory (VM) bind flags support in Xe path.

  • Introduced support for Blitter Command Streamer (BCS) fence programming in the copy offload path.

  • Introduced profiling support in the copy offload path.

  • Introduced handling in-order counters in the copy offload path.

  • Introduced 2-tile device memory chunking independent of the KMD migration.

  • Added a Device Driver Interface (DDI) table entry for the GetPitch function.

  • Improved the elf rewriter to preserve strings.

  • Introduced support for git SHA logging to a log file.

  • Introduced using copy commands for offload operations.

  • Added the Executable and Linkable Format (ELF) rewriter utility.

  • Introduced an API definition that exposes media capabilities.

  • Created a copy offload queue under a debug flag.

  • Added support for 3-channel formats with 8, 16, and 32-bit depths.

  • Introduced StorageInfo for the Drm-specific customization.

  • Introduced API stubs for media communication.

  • Introduced bindless sampled image support.

  • Introduced passing GraphicsAllocation to fence wait.

  • Added the Mutable Command List (MCL) functions to a dispatch table.

  • Added update capability flags.

  • Introduced support for creating multiple metric groups from metrics.

  • Added tile-to-lmem-region map to MemoryInfo.

  • Added a registry key to control the AUB/TBX writable for buffer host memory.

  • Introduced global bindless sampler offsets.

  • Added a field for reserving extra payload heap space.

  • Added setupIpVersion for Xe.

  • Enabled a basic framework for super fast logging.

  • Added the TBX support for unified shared memory (USM) with USM host pointer.

  • Introduced implicit synchronization mode dispatch for cooperative kernels.

  • Introduced an experimental API for suggesting Message Signaled Interrupts for PCI Express (MSIX) allocation.

  • Added a wrapper for accessing the template method and getting the local ID generation.

  • Introduced support for concurrent groups.

  • Introduced assigning the external interrupt ID to the event.

  • Introduced storing the walker command in the CPU memory for appending launch kernel.

  • Introduced returning experimental synchronization queue extension.

  • Added mask of tiles to each memory region in Xe.

  • Adjusted Page Attribute Tables (PAT) for data caches (DC) flush mitigation.

  • Introduced an experimental API responsible for synchronizing dispatch.

  • Added a method that toggles whether mid thread preemption is allowed or not.

  • Added the number of L3 banks to TopologyData.

  • Added new API stubs related to Message Signaled Interrupts for PCI Express (MSIX).

  • Introduced marking selected resources as uncacheable memory (UC) when mitigating data cache flush.

  • Updated program fence during Blitter Command Streamer (BCS) tag update.

  • Added support for a null AUB mode.

  • Added a memory copy device to the host fence for host visible events in the Level Zero path.

  • Added a program device to the host fence in the OpenCL path.

  • Added a helper to check whether a device for host copy fence is required.

  • Enabled scratch address patching on regular command lists.

  • Improved patching of the scratch inline pointer.

  • Added heapless and global stateless scratch address patching.

  • Introduced a heapless state initialization in Level Zero.

  • Introduced a full synchronization dispatch mode initialization path.

  • Introduced RAS for the netlink interface.

  • Introduced support of bindless samplers in Level Zero.

  • Introduced support for explicit memory locking.

  • Implemented the getElfSize and getElfData methods for debugging and troubleshooting execution units in the Xe architecture.

  • Added a getter for the walker inline data offset.

  • Added boilerplate for the spec 1.9 features.

  • Added support for __INTEL_PER_THREAD_OFF.

  • Introduced OpenCL support for bindless kernels.

  • Added debug messages about the zero engine size.

  • Added an option to read scratch page options during initialization.

  • Added the Level Zero console logging for kernel buffer arguments.

  • Added functions for storing extra engines.

  • Added defaultThreadArbitrationPolicy to the command list.

  • Started using legacy versions of injectMMIOList and setTbxServerIp.

  • Added the heaplessStateInitialized flag.

  • Created helper for maxPtssIndex.

  • Improved the createModuleFromFile function.

  • Improved cache host resources when mitigating data cache flush.

  • Enabled device Unified Shared Memory (USM) allocation reuse.

  • Added Unified Shared Memory (USM) host memory pooling.

  • Enabled indirect allocations as packing in OpenCL.

  • Started sharing inter-module Instruction Set Architecture (ISA) allocations.

  • Started iterating over indirect allocations.

  • Started reusing GPU timestamps instead of the KMD escape.

  • Started using a stack vector to create packet initialization data.

  • Optimized counter-based waiting schemes.
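
The cl_khr_extended_bit_ops extension listed above adds the OpenCL C built-ins bitfield_insert, bitfield_extract_signed, bitfield_extract_unsigned, and bit_reverse. As a rough host-side reference for what they compute, the sketch below models them for a single 32-bit scalar. It is an illustration of the extension's semantics as understood here, not driver code; the width parameter and the range check are assumptions added for the sketch (the extension leaves out-of-range offset and count undefined).

```python
def _check(offset: int, count: int, width: int) -> None:
    # The extension leaves this case undefined; the model rejects it instead.
    if offset < 0 or count < 0 or offset + count > width:
        raise ValueError("offset + count must not exceed the bit width")


def bitfield_extract_unsigned(base: int, offset: int, count: int, width: int = 32) -> int:
    """Return bits [offset, offset + count) of base, zero-extended."""
    _check(offset, count, width)
    return (base >> offset) & ((1 << count) - 1)


def bitfield_extract_signed(base: int, offset: int, count: int, width: int = 32) -> int:
    """Return bits [offset, offset + count) of base, sign-extended."""
    field = bitfield_extract_unsigned(base, offset, count, width)
    if count and field & (1 << (count - 1)):
        field -= 1 << count  # top bit of the field set: value is negative
    return field


def bitfield_insert(base: int, insert: int, offset: int, count: int, width: int = 32) -> int:
    """Replace bits [offset, offset + count) of base with the low bits of insert."""
    _check(offset, count, width)
    full = (1 << width) - 1
    mask = ((1 << count) - 1) << offset
    return (base & ~mask & full) | ((insert << offset) & mask)


def bit_reverse(base: int, width: int = 32) -> int:
    """Reverse the order of the lowest `width` bits of base."""
    result = 0
    for i in range(width):
        if base & (1 << i):
            result |= 1 << (width - 1 - i)
    return result
```

For instance, extracting 8 bits at offset 8 from 0xABCD1234 gives 0x12, and reversing the bits of 1 gives 0x80000000 in this model.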

Intel® Graphics Compiler

  • Implemented GenISA predicated load/store intrinsics with promotion pass.

  • Added the ActiveThreadsOnlyBarrier option for OpenCL shaders.

  • Improved call site inlining heuristic.

  • Added a call merger pass that merges mutually exclusive function calls when they are too large to inline.

  • Added the __builtin_IB_disable_ieee_exception_trap and GenISA_disable_ieee_exception_trap intrinsics.

  • Introduced additional transpose block 2D SPIR-V APIs.

  • Added a flag to disable merging allocas of different types, providing better control over the merge alloca pass and disabling aggressive merging by default.

  • Added 32-bit ELF type support to ZeBin for x86 use cases.

  • Enhanced MergeAllocas performance by replacing all allocas, generating casts at the point of use, handling select instructions in liveness analysis, avoiding merging allocas across ContinuationHL calls in raytracing, and disabling allocas merging for raytracing.

  • Added support for recognizing OpenCL/SPIR-V built-ins represented as TargetExtTy to the ProcessFuncAttributes pass.

  • Enabled SIMD16 drop for Xe3 to minimize register spills.

  • Added the hasLscStoresWithNonDefaultL1CacheControls flag to zeinfo, enabling 3D clients to detect Load Store Cache (LSC) stores with non-default L1 cache policies for proper UAV coherency flushing.

  • Started using MergeAllocas for private memory merging, allowing reuse of non-overlapping private memory allocations to reduce overall memory usage.

  • Added support for SPIR-V MulExtended instructions to the Vector Compiler (VC).

  • Set the default General Register File (GRF) size to 128.

  • Added VISA support for HF8 conversion instruction and Panther Lake devices.

  • Enabled SetHasSample for the gather4* instructions.

  • Added support for stochastic round bf8 intrinsic in the Vector Compiler (VC).

  • Added Panther Lake support.

  • Added support for new Battlemage device IDs.

  • Introduced support for Floating-Point DIVide (FDIV) instructions inside IGCVectorizer.

  • Added optional Intel® Graphics Compiler flags to enable multiplication expressions and non-constant integer steps. EnableGEPLSRUnknownConstantStep allows for expressions with steps unknown at the compilation time. EnableGEPLSRMulExpr allows for expressions with multiplication. If enabled, it also enables EnableGEPLSRUnknownConstantStep. Both flags are disabled by default.

  • Introduced a new GenISA intrinsic WaveInterleave that does subgroup reduction on each n-th work item.

  • Introduced built-ins support for min and max operations in CMCL.

  • Added the LoopAllocaUpperbound pass pattern to identify single-basic-block (1-BB) loops with a non-constant upper bound and memory accesses to a constant-sized array.

  • Enhanced Barrier Control Flow optimization by implementing synchronization and flushing directly on the running thread.

  • Enabled group sort built-ins.

  • Added support for WaveClustered to the SubGroupReductionPattern pass.

  • Improved the trimming algorithm.

  • Implemented the PromoteBools vector type promotion support.

  • Added pre-header to the main subroutine’s FuncInfo instance.

  • Introduced support for kernel cost model.

  • Added an Instcombining pass to fix multi-cast instructions.

  • Added Vector Compiler (VC) internal intrinsics for block 2D load and enabled storing and prefetching operations with the matrix payload passed as an address operand.

  • Added an option to always force the Load Store Cache (LSC) immediate offset pattern match.

  • Added an additional step to the compiler build that splits the whole BiF module into smaller sections, so that materializing and inlining take less time.

  • Enabled more patterns for simdShuffleXor.

  • Improved logic for deciding between block and scatter spill.

  • Added a convergent attribute to block read and write intrinsics.

  • Introduced support for atomic scalar operations in the Vector Compiler (VC), which now handles atomic Load Store Cache (LSC) intrinsics with scalar operands.

  • Introduced support for the SPV_EXT_shader_atomic_float_min_max extension.

  • Introduced new multiple-accuracy IMF math functions and updated existing implementations.

  • Added platforms to avoid dst/src overlap and deadlock due to the SBID dependence.

  • Added a warning message to notify that the requested sub_group_size=32 cannot be compiled due to CallWA being enabled.

  • Introduced simplification of PHI instruction in ScalarEvolution to extend supported patterns for creating SCEVs iteratively in ScalarEvolution.

  • Enabled the Optimization Remark Emitter to emit remarks for passes.

  • Added a mechanism to pass a Virtual Instruction Set Architecture (VISA) finalizer option via API.

  • Enabled optimization for non-specific API options: -cl-opt-disable and -ze-opt-disable.

  • Added localId and globalId to the BufferBoundsChecking assert message.

  • Enabled the D64 Load Block message for the cl_intel_subgroups_long extension.

  • Improved the insertion logic for proceed-based dynamic ray management.

  • Introduced a new GenISA intrinsic WaveClusteredInterleave that combines two wave reductions: WaveClustered and WaveInterleave. The subgroup is split into clusters and then each cluster does interleaved reduction. The change includes a pattern match for reduction implemented with subgroup shuffles.

  • Enabled loads rescheduling in CodeLoopSinking.

  • Enabled loop sinking of 2D block loads and shuffle patterns.

  • Added an SLMSizeInKb field to GTSystemInfo in a new version of the GTSystemInfo interface.

  • Enabled SIMD16/32 for a built-in with 32-bit support.

  • Added an additional type in getDefaultAccessType.

  • Added CMake configuration required to build on a riscv64 host.

  • Added a helper for getting a memory surface operand in the Vector Compiler (VC).

  • Implemented a single precision fdiv.

  • Added an OpenCL option to enable profile-guided trimming.

  • Enabled internal Load Store Cache (LSC) cmask typed intrinsics.

  • Enabled 2D block intrinsics conversion pass when LICM is not enabled.

  • Added support for bindful reads in ConstantCoalescing, enabling a new pattern: indirect read accesses from bindful resources.

  • Added genx invert-intrinsics in bif.

  • Enabled internal Load Store Cache (LSC)-typed 2D intrinsics.

  • Extended pattern matching for inverse square root in Vector Compiler (VC) to improve performance and accuracy.

  • Added a pass that checks whether the address used in an instruction is greater than or equal to the minimum valid address defined by the user.

  • Added a registry key and a compilation option to disable the ShrinkArrayAlloca pass.

  • Added function argument resolution support for JointMatrixFuncsResolutionPass.

  • Added a new WaveInterleave GenISA intrinsic that does subgroup reduction on each n-th work item.

  • Added optional Intel® Graphics Compiler flags to enable multiplication expressions and non-constant integer steps. The EnableGEPLSRUnknownConstantStep flag allows for expressions with steps unknown at compilation time and is disabled by default. The EnableGEPLSRMulExpr flag allows for expressions with multiplication and is disabled by default. When EnableGEPLSRMulExpr is enabled, it also enables EnableGEPLSRUnknownConstantStep.

  • Improved memory optimization by running the memcpy optimization pass early and ensuring correct destination alignment for memcpy operations from global constants, aligning results closer to the legacy behavior for better performance and optimization.

  • Improved the CodeLoopSinking pass to support multi-instruction candidates, vector shuffle pattern, and 2D block loads sinking.

  • Added the ability to set SWSB_ENCODE_MODE via a command line option in the Intel Graphics Assembler (IGA) standalone tool.

  • Added support for i64 shuffle in the SubGroupReductionPattern pass by implementing it as a shuffle of <2 x i32>.

  • Added a new flag RunGEPLSRAfterLICM to run the GetElementPtr Loop Strength Reduction (LSR) pass after the first LICM pass. By default, the flag is set to false.

  • Improved Conditional Coverage (CCOV) for the RemoveLoopDependency pass.

  • Added the Inverse Multiply (INVM) and Reciprocal Square Root Multiply (RSQTM) intrinsics.

  • Added the ScalarAliasBBSizeThreshold flag to control the maximum size of Basic Block (BB) for which scalar to vector aliasing applies.

  • Added the CEncoder::CopyWithImplicitConversion function to CEncoder to allow generating MOV instructions for src and dst combinations with different type sizes.

  • Introduced an additional step in the compiler build process to split the BiF module into smaller sections, reducing the time required for materializing and inlining the module.

  • Improved parsing igc_opts to avoid a misleading error message.

  • Enabled the removeUnusedSLM option.

  • Extended the cl_intel_subgroup_2d_block_io extension to support the following built-ins: intel_sub_group_2d_block_read_8b_8r16x4c, intel_sub_group_2d_block_prefetch_8b_8r16x4c, and intel_sub_group_2d_block_read_transpose_32b_32r8x1c.

  • Implemented AtomicPullSWWalkWrapperLoopImplementationPass.

  • Added a registry key to disable coalescing memory fences and extend control over the SynchronizationObjectCoalescing pass.

  • Added support for DIStringType in DWARF.

  • Enabled GenISAIntrinsics on LLVM16.

  • Enabled denormal number support for systolic operations in the Vector Compiler (VC).

  • Introduced a new GenISA intrinsic WaveClusteredInterleave that combines two wave reductions: WaveClustered and WaveInterleave.

  • Introduced support for copy sign intrinsic in the Vector Compiler (VC).

  • Added fast exits in the pattern match.

  • Added TGM fence workaround for Xe2.

  • Added joint matrix support for the 32x32x16 combination for DG2.

  • Added early constant loading for Dot Product Accumulate Systolic (DPAS) in Vector Compiler (VC). DPAS operations do not support immediate operands, so the compiler should move the constants into registers.

  • Implemented an inverse square root built-in function.

  • Added a new version of the GTSystemInfo interface with a new SLMSizeInKb field.

  • Added a global_barrier implementation using atomic instructions.

  • Added a GPUVA for ubertilesmap in RayDispatchGlobals.

  • Introduced Loop Cost Expression (LCE) support.

  • Introduced support for LIT opaque pointers.

  • Introduced a flag to disable dynamic RQ management.

  • Introduced invm and rsqtm math functions to support double-precision inverse square root calculations.

  • Enhanced the cl_intel_subgroup_2d_block_io extension by adding the following built-in functions: intel_sub_group_2d_block_read_8b_16r16x4c, intel_sub_group_2d_block_read_8b_32r16x4c, and intel_sub_group_2d_block_prefetch_8b_16r16x4c.

  • Introduced the DisablePHIScalarization option in the ScalarizeFunction pass to enable skipping of PHI node scalarization.

  • Added an option to disable the ScalarizeFunction pass in the OpenCL pipeline.

  • Improved IGCVectorizer by adding a new vectorization pattern with the FPTrunc instruction and the ability to merge incompletely scalarized vector paths across multiple basic blocks.

  • Implemented SPV_INTEL_2d_block_io.

  • Added descriptive error messages for recoverable errors in the Vector Compiler (VC).

  • Extended the cl_intel_subgroup_2d_block_io extension with more variants by adding support for the following built-ins: intel_sub_group_2d_block_read_8b_16r16x4c, intel_sub_group_2d_block_read_8b_32r16x4c, and intel_sub_group_2d_block_prefetch_8b_16r16x4c.

  • Improved Virtual Instruction Set Architecture (VISA) APIs to support general nbarrier.

  • Added support for specifying the last caller-saved General Register File (GRF) using the VISA -lastCallerSavedGRF option.

  • Optimized performance handling by introducing a third retry stage to drop to SIMD16/SIMD8 when PTSS is exhausted, improving resource efficiency.

  • Allowed building BiFs without the OPAQUE_ARG argument.

  • Implemented debug information to support subroutines.

  • Optimized non-uniform indexed Resource Loops to improve performance.

  • Removed unnecessary TGM fences between stores in compute shader to improve performance.

  • Allowed using OpenCL Clang with an older Low-Level Virtual Machine (LLVM) version.

  • Enhanced Low-Level Virtual Machine (LLVM) interim mode by replacing the environment variable control with a new IGC_OPTION__LLVM_INTERIM option for better integration within CMake files.

  • Added the ability to set the IGC_LLVM_INTERIM mode, which allows creating the IGC_LLVM_TRUNK_REVISION definition based on the -D option or ENV variable.

  • Introduced support for opaque pointers in newer Low-Level Virtual Machine (LLVM) versions.

  • Introduced kernel performance metrics.

  • Started emitting a predefined runtime symbol in the ZEBinary’s symbol table.

  • Added support for bindless memory access.

  • Introduced an intrinsic for optimization fence in the Vector Compiler (VC).

  • Introduced the DisablePHIScalarization option that allows skipping PHI nodes scalarization in the ScalarizeFunction pass.

  • Added the ability to disable the ScalarizeFunction pass in the OpenCL pipeline.

  • Improved compilation time for WIAnalysis.

  • Implemented a helper SeparateSpillAndScratch function to improve performance.

  • Implemented bundle conflict reduction for two source instructions for OpenCL to improve performance.

  • Enabled default support for illegal integer types in GetElementPtr Loop Strength Reduction (LSR), improving handling of SCEV expressions with arbitrary integer widths.

  • Added the capability to enable the use of automatic immediate offset for 2D block intrinsics.

  • Introduced a registry key to allow disabling emulation for floating-point 64-to-16-bit conversions.

  • Added BCR support for 5-source Dot Product Accumulate Systolic (DPAS). Previously, BCR only supported 3-source instructions.

  • Developed the SPV_INTEL_subgroup_matrix_multiply_accumulate SPIR-V extension to enable support for DPAS operations at a lower abstraction level compared to the joint matrix.

  • Introduced support for the SPV_INTEL_maximum_registers SPIR-V extension that adds literal-based and ID-based execution modes for specifying the maximum number of registers for an entry point.

  • Added support for a 3-channel image format to SYCL bindless images to enable access to the bindless texture hardware.

  • Modified the vectorizer to support vector emission of fmul instructions.

  • Modified the pass threshold to optimize the i64 multiplication performance.

  • Improved vectorizer to support vector emission of ftrunc instructions.

  • Enabled the IndVarSimplification pass to improve performance.

  • Added a more aggressive late rescheduling phase to the CodeLoopSinking pass and an option to disable the maximum sinking heuristic in the presence of 2D block reads.

  • Improved the InlineHelper LLVM utility.

  • Implemented the MergeAllocas pass and enabled allocation merging prior to the split asynchronous pass.

  • Enabled the emission of vectorized floating-point addition (FADD) instructions, allowing the VISA emitter to process them efficiently.

  • Implemented nested 3D resource loop unrolling.

  • Enabled the debug feature for multiple components.

  • Improved capture by value for better performance, corrected some related VLoad API usage and improved checks for global volatile clobbering.

  • Improved GenXBaling and GenXRegionCollapsing by reusing existing utilities and improving internal logic for cleaner handling of vector stores and memory dependencies.

  • Added support for JointMatrix column-major accumulator loads and stores with transposed loads when possible.

  • Updated the warning message for VLA detection to be clearer, with additional documentation on flag usage.

  • Added a warning message for non-uniform OpGroupBroadcast to help identify spec-breaking kernels.

  • Added block2d immediate offset emulation to support new functionality.

  • Updated Dot Product Accumulate Systolic (DPAS) latency for better instruction scheduling and Software Scoreboard tracking.

  • Added ARL and LNL functionality with support for related hardware platforms.

  • Disabled compaction for O0 compilation for debugging purposes and added a DumpASMToConsole key to enable dumping of assembly code.

  • Added a new internal 2D block read built-in for transpose u64 k4.

  • Enhanced error reporting in static asserts for implicit arguments structure to aid debugging.

  • Refined Intel Graphics Assembler (IGA) diagnostic messages for better clarity.

  • Created a new HW Debug pass to unify multiple diagnostic functions for easier debugging.

  • Optimized live range shortening using Live Elements analysis to prevent unnecessary extension of live ranges.

  • Added support for arithmetic operations for the last argument in mad instruction for broader use cases.

  • Updated the GED model for Xe2 and the IGA encoding model to support new hardware features.

  • Moved SLM (Shared Local Memory) functionality into a new pass for better memory handling, adding features like NullProtection and offset handling.

  • Included LSC prefetch and introduced a new API to access 2D block write from OpenCL-C.

  • Expanded 2D block read transform capability.

  • Integrated a tile fence into system routines.

  • Enabled a clone address arithmetic pass during retry and RPE cutoff.

  • Introduced support for bindless mode for CONST_BASE and GLOBAL_BASE implicit arguments.

  • Added an option to set the maximum General Register File (GRF) number that vISA can select.

  • Introduced support for the cl_intel_subgroup_2d_block_io extension to enhance block read operations for efficient matrix multiplication algorithms. This extension adds block I/O operations with cache control, including reading multiple blocks without data rearrangement, transforming and transposing data during reads, prefetching blocks, and writing blocks without rearranging.

  • Introduced support for the cl_intel_subgroup_buffer_prefetch extension to enable data prefetching from a buffer as a sub-group operation. This extension improves kernel performance by prefetching data into a fast cache, ensuring future reads occur from the cache instead of slower memory. The new block prefetch operations are supported in both the OpenCL C kernel programming language and the SPIR-V intermediate language.

  • Introduced support for the SPV_INTEL_cache_controls extension for cl_intel_subgroups*/cl_intel_subgroup_buffer_prefetch OpenCL extensions.

  • Introduced support for SPV_INTEL_cache_controls for OpenCL prefetch.

  • Introduced support for the cl_intel_subgroup_matrix_multiply_accumulate_tf32 extension that extends the existing cl_intel_subgroup_matrix_multiply_accumulate extension with support for the TensorFloat (tfloat32) data type.

  • Introduced support for bindless image for media block read/write.

  • Introduced support for joint matrix load B row_major bfloat16 16x64.

  • Introduced support for API-specific values for SIMD8/16/32_SpillThreshold. Now, each API can have its own preferred setting for SIMD8/16/32_SpillThreshold.

  • Introduced support for the CacheControlLoadINTEL extension for OpenCL.

  • Introduced support for more variants of 2d block reads.

  • Introduced support for SPV_INTEL_subgroup_buffer_prefetch.

  • Added a new pass that attempts to reduce the number of times the FP rounding mode is switched by moving and grouping together instructions that use the same rounding mode.

  • Added the jump-threading-through-two-bbs option to control additional basic blocks construction when threading through two basic blocks.

  • Added a few convenience functions for generating system values in LLVM3DBuilder.

  • Added the BMG support.

  • Added the missing sampler op conversions Level of Detail and Load (LOD & LD).

  • Added support for cache controls in the joint matrix.

  • Added a check for dpas instructions in inline Virtual Instruction Set Architecture (VISA) for OpenCL.

  • Added the float type support for src2.

  • Added the scratch parameter to the sub_group sort built-ins.

  • Added Vector Compiler (VC) support for the Battlemage platform.

  • Added the 1x64x32 joint matrix support.

  • Added a non-default cache control field to addrSpace to make the address spaces with same numeric value yet different caching policies distinguishable.

  • Added DEV_ID_56C2.

  • Added a new field to RayDispatchGlobalData.

  • Added the ConvertHandleTo*ImageINTEL built-ins for the SYCL bindless image to support the SPV_INTEL_bindless_images SPIR-V extension.

  • Added a new simd flag to distinguish between explicit force and implicit hinted force.

  • Added a field to SBindlessProgram.

  • Added pseudo kill for address register spill fill code to fix the liveness issue.

  • Added debug SIP.

  • Added support for fp64 conversion emulation mode, where Intel® Graphics Compiler emulates only conversions.

  • Added Floating-Point Compare (fcmp) to supported instructions in the fp64 conversion emulation mode.

  • Added a warning message for FP64 emulation in cases of providing 2 options for FP64 conversion emulation.

  • Added additional flags for picking a kernel.

  • Increased the legal vector size to 64 to support the 2DBlockRead calls.

  • Introduced initial support for SPV_KHR_cooperative_matrix to support a new matrix format.

  • Improved CodeLoopSinking to roll back sinking when register pressure increases.

  • Improved the private memory Structure of Arrays (SOA) transpose by allowing arrays of structs to be transposed.
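
One of the compiler items above notes that the i64 shuffle in the SubGroupReductionPattern pass is implemented as a shuffle of <2 x i32>. The following is a minimal scalar model of that idea in Python, not the compiler's implementation: each 64-bit lane value is split into two 32-bit halves, both halves are shuffled with the same lane mapping, and the halves are rejoined. The helper names (split_i64, join_i64, shuffle_i32, shuffle_i64_via_i32) and the list-based subgroup model are assumptions made for illustration only.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split_i64(value):
    """Split a 64-bit value into its (low, high) 32-bit halves."""
    value &= MASK64
    return value & MASK32, (value >> 32) & MASK32


def join_i64(low, high):
    """Rebuild a 64-bit value from its (low, high) 32-bit halves."""
    return ((high & MASK32) << 32) | (low & MASK32)


def shuffle_i32(lanes, source_lane):
    """Model a 32-bit subgroup shuffle: lane i reads lanes[source_lane[i]]."""
    return [lanes[src] for src in source_lane]


def shuffle_i64_via_i32(lanes, source_lane):
    """Shuffle 64-bit lane values using only 32-bit shuffles."""
    halves = [split_i64(value) for value in lanes]
    lows = [low for low, _ in halves]
    highs = [high for _, high in halves]
    # Both halves use the same lane mapping, so they stay paired.
    shuffled_lows = shuffle_i32(lows, source_lane)
    shuffled_highs = shuffle_i32(highs, source_lane)
    return [join_i64(lo, hi) for lo, hi in zip(shuffled_lows, shuffled_highs)]


if __name__ == "__main__":
    lanes = [0x0000000100000002, 0xFFFFFFFF00000000, 0x123456789ABCDEF0, 0]
    xor_mask_1 = [i ^ 1 for i in range(len(lanes))]  # swap neighbouring lanes
    print([hex(v) for v in shuffle_i64_via_i32(lanes, xor_mask_1)])
```

Because the same lane mapping is applied to both halves, the result matches a direct 64-bit shuffle; the model only shows why two 32-bit shuffles are sufficient.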

Intel® Graphics Driver Backports for Linux* OS (i915)

  • Introduced an experimental recovery mechanism for handling fatal GPU errors, designed to restore functionality without requiring a full system reboot or GPU reset. This helps reduce downtime and improves system reliability. This feature is disabled by default and can be enabled using the enable_fatal_error_recovery flag: a value of 1 routes fatal errors as Message Signaled Interrupts (MSI) and attempts a Secondary Bus Reset (SBR), while a value of 2 routes errors as MSI without attempting an SBR.

  • Enabled the dynamic ICS via the opt-in KLV feature.

  • Updated the Graphics Micro Controller (GuC) to version 70.44.1.

  • Extended 2M userptr support to 1G.

  • Enabled backport support for kernel version 6.13.

  • Added support for the HBM_REPLACE bit to signal High Bandwidth Memory (HBM) health status and its transition to the REPLACE state. This enhancement enables the driver to detect the bit and prevent loading when the state changes to REPLACE, while also reporting the issue and prompting HBM replacement.

  • Enabled group busyness counters in a VF.

  • Added support for dumping multiple engines for offline debugging.

  • Added support for 4K pages in lmem swapper.

  • Enhanced HBM training failure reporting.

  • Added extra debug info for GuC CT errors.

  • Added PCI ID for new PVC vector-only SKU.

  • Added write barriers between flat-ppgtt init and usage.

  • Showed multiCCS status in sysrq-G.

  • Showed pagefault address in canonical format.

  • Added jiffies for missing age parameter.

  • Tuned active defrag and idle buddy allocation.

  • Added support for marking VM_BIND vmas as read-only.

  • Updated DG2 HuC to version 7.10.14.

  • Added the survivability lite feature for firmware updates on Flex.

  • Enhanced the offline installer for RHEL by including pre-built kernel modules for the Intel i915 graphics driver. These modules simplify installation and management of the driver. For further details, see the documentation provided with the offline installer. To access this documentation, run the offline installer with the -s parameter.

Intel® Graphics Memory Management Library

  • Improved handling of coherent and compressible resources.

  • Added support for new Battlemage device IDs.

  • Introduced the MOCS variable for Xe2.

  • Enabled GO:L3 for OpenCL usages.

  • Enabled IsCpuCacheable on Linux to improve performance.

  • Enabled the R10G10B10_XR_BIAS_A2_UNORM format for display to support 10-bit color and HDR rendering with improved visual quality.

  • Added the Media Video Processing (VP) performance tags that can help with optimization and debugging.

Intel® Graphics System Controller Firmware Update Library

  • Added timestamps to logs.

  • Implemented a library API for reading the firmware status register and added reading of the firmware status register to the CLI.

  • Introduced a new error message to notify users about device iterator failures.

  • Enabled the logging of data traffic in the trace mode.

Intel® ME TEE Library

  • Added the GetTRC API.

  • Added an initialization API that accepts all parameters.

  • Added an option for setting a custom log callback.

  • Added a CMake preset for full debug builds.

  • Added a 32-bit release preset in CMake.

  • Added getters for maxMsgLen and protocolVer.

Intel® Media Driver for VAAPI

  • Added upstream Battlemage encoding support.

  • Added full support for the Lunar Lake platform in the upstream.

  • Introduced initial support for the Battlemage platform in the upstream.

  • Introduced upstream encoding support for Battlemage.

  • Added support for AV1 encoding with ARGB input.

Intel® Metrics Discovery Application Programming Interface

  • Added support for half-full Observability Architecture (OA) buffer interrupt in i915.

  • Added CoreFrequencyMHz details and MaxCount global symbols for Xe2.

  • Added support for GpuCoreClocks symbols in read equations.

  • Introduced a global symbol that indicates GPU frequency override state.

  • Added return code handling for the following functions: AddInformationSet, SetSnapshotReportReadEquation, SetOverflowFunction, AddDefaultMetrics, AddStartRegisterSet, CreateMetricsFromPrototypes, and RefreshConfigRegisters.

Intel® oneAPI Level Zero

  • Added support for registering a TeardownCallback to notify clients upon release of Level Zero resources.

  • Added support for sorting drivers based on provided devices.

  • Implemented a basic leak checker in the validation layer.

  • Added zeImageViewCreateExt and zeMemFreeExt support to the leak checker.

  • Added API call logging to the validation layer.

  • Added the static Level Zero loader support.

  • Introduced support for the 1.7 specification in the static loader.

  • Added event deadlock detection within the validation layer.

  • Started logging the full path of loaded libraries in traces for better debugging.

  • Added result passing to validation checkers at the epilogue stage.

  • Upgraded specification to version 1.12.15.

Intel® Video Processing Library

  • Introduced Intel® Video Processing Library API 2.15 support, including a new property-based capability query interface, extended decoder and encoder capabilities reporting, and definitions for VVC main 10 still picture profile and level 6.3.

  • Added the explicit INSTALL_EXAMPLES build option to control installation of example source code and content.

  • Updated the default Ubuntu build to version 24.04.

  • Added support for Intel® VPL API 2.14, which introduces new quality and speed settings for AI-powered video frame interpolation. This update also expands algorithm and mode selection options for AI-based super resolution and adds support for High Efficiency Video Coding (HEVC) level 8.5 decoding.

  • Improved compatibility with Python 3.12 development environments.

Intel® Video Processing Library GPU Runtime

  • Improved AV1 decoding performance when all decode frame surfaces are in use.

  • Enabled property-based capability queries.

  • Enabled full support for the Lunar Lake platform.

  • Added initial support for the Battlemage platform.

  • Introduced support for the Y210 format in media copy operations.

  • Added a check for AV1 decoding bitdepth changes when parsing SPS syntax to prevent decoding issues.

  • Aligned the default decode frame rate to 30fps.

  • Added support for MFX_EXTBUFF_VIDEO_SIGNAL_INFO in AV1 decoding to retrieve video signal information.

  • Enabled dynamic decode frame rate by parsing frame rate data from the AV1 bitstream.

  • Improved reference frame patterns in pyramid cases.

  • Enabled block size selection for VP9 encoding segmentation.

Intel® Video Processing Library Tools

  • Introduced support for Intel® Media Transcode Accelerator.

  • Added new strings to the vpl-inspect tool to improve output readability.

  • Added the -props option to the vpl-inspect tool to support querying capabilities based on properties.

  • Updated the default Ubuntu build to version 24.04.

  • Integrated screen content coding tools for AV1 into sample_encode.

  • Added a new GTK renderer option to sample_decode and sample_multi_transcode.

  • Introduced a new -fullscreen option for GTK in sample_decode and sample_multi_transcode. Users can now toggle full screen using Ctrl+f and exit with Esc.

  • Enhanced support for Python 3.12 development environments.

Intel® XPU Manager and XPU System Management Interface

  • Added GPU diagnostics with the copy engine for GPU memory, Peripheral Component Interconnect Express (PCIe), and Xe link throughput.

  • Incorporated Manageability Engine Interface (MEI) error checking into the GPU diagnostics pre-check command.

  • Revised help information for the dump command.

  • Added the GPU performance data in the GPU diagnostics stress command.

  • Added the GPU temperature checking in GPU diagnostics.

  • Provided better GPU performance data when running GPU diagnostics on multiple GPUs.

  • Added the GPU firmware version checking in GPU diagnostics.

  • Added Peripheral Component Interconnect Express (PCIe) speed and width checking in GPU diagnostics.

  • Optimized the GPU diagnostics pre-check execution time.

  • Introduced support for the Flex AMC firmware update on the Lenovo SD530 V3 server.

  • Added the ability to display the date in the dump command when using the --date parameter.

  • Added security consolidation.

  • Upgraded the vGPU parameters.

  • Improved GPU diagnostics when Single Root I/O Virtualization (SR-IOV) is enabled.

  • Improved the GPU diagnostics configuration file.

  • Added the ability to display the version of the intel-i915-dkms package.

  • Improved GPU memory throughput reporting.

  • Improved Peripheral Component Interconnect Express (PCIe) downgrading checking.


2025-05-06

The 2350.150 release supports the following operating systems:

  • Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5

  • Ubuntu 22.04

  • SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6

Improvements

Intel® Graphics Driver Backports for Linux* OS (i915) and Intel GPU Firmware

Updated the Graphics Micro Controller (GuC) to version 70.44.1.


2025-04-02

The 2350.145 release supports the following operating systems:

  • Red Hat Enterprise Linux (RHEL): 8.8, 8.10, 9.2, 9.4, and 9.5

  • Ubuntu 22.04

  • SUSE Linux Enterprise (SLES): 15 SP4, 15 SP5, and 15 SP6

Features

General

Introduced support for RHEL 9.5.

Changes

General

Updated the signing key for KMD prebuilds to enhance security and ensure continued reliability. This key ensures that only trusted kernel-level software can run during the boot process. The new key, valid for one year, will be used to sign all new KMD module releases. If you use secure boot, you need to download and install a new Distinguished Encoding Rules (DER) certificate to maintain compatibility. If you do not use secure boot, no action is required.

Improvements

Intel® Graphics Driver Backports for Linux* OS (i915)

  • Updated the Graphics Micro Controller (GuC) to version 70.40.1.

  • Fixed an issue causing a memory error.

  • Introduced page fault handling improvements.

  • Implemented a workaround to address an encoder issue causing errors. The workaround adds support for the G8 power state in ATS-M to reduce idle power consumption.

  • Skipped the HuC microcontroller authentication register check and marked HuC as available if preloaded.

Intel® Graphics Compiler

  • Enabled optimizations for the -cl-opt-disable and -ze-opt-disable API options.

  • Fixed a crash issue that occurred during copy elimination.

  • Initialized address register to prevent unaligned cross-GRF access.

  • Replaced %opt with opt in the vector compiler.

Intel GPU Firmware

Updated the Graphics Micro Controller (GuC) to version 70.41.0.

Known issues

  • When installing drivers from Intel repositories on LTS, the repositories may contain older package versions than those in public repositories. To ensure the correct versions are installed from Intel repositories, we recommend adding priority=98 to the repository configuration. This helps effectively manage package version selection.

  • For application workloads using HBM on SPR, ensure that the UMD policy chooses I915_GEM_CREATE_MPOL_PREFERRED instead of I915_GEM_CREATE_MPOL_BIND when calling ioctl. This can be achieved by replacing --membind=8 with --preferred=8 in the numactl command.
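
For launch scripts that build the numactl command line programmatically, the replacement described in the last known issue can be sketched as a small Python helper. This is an illustrative sketch only: the function name prefer_instead_of_bind and the ./my_workload binary are hypothetical, and the helper handles only the long --membind=N form shown above, not the short -m form. It also assumes a single node is given, as in the --membind=8 example.

```python
import re

# Matches the long-form option only, for example "--membind=8".
_MEMBIND = re.compile(r"^--membind=(.+)$")


def prefer_instead_of_bind(argv):
    """Return a copy of argv with --membind=N replaced by --preferred=N."""
    rewritten = []
    for arg in argv:
        match = _MEMBIND.match(arg)
        if match:
            rewritten.append(f"--preferred={match.group(1)}")
        else:
            rewritten.append(arg)
    return rewritten


if __name__ == "__main__":
    # "./my_workload" is a placeholder for the real application binary.
    command = ["numactl", "--membind=8", "./my_workload"]
    print(" ".join(prefer_instead_of_bind(command)))
```

The helper returns a new list and leaves every other argument untouched, so it can be applied to an existing command before it is passed to a process launcher.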