Verifying Installation

To verify whether the expected hardware is working with the i915 driver, check the display hardware connected to your system:

hwinfo --display

On SLES, if hwinfo is installed in /usr/sbin and not in the default user path, run it using the following command:

/usr/sbin/hwinfo --display

Example output for Intel® Data Center GPU Max 1550 (device ID 0x0BD5)
51: PCI 8c00.0: 0380 Display controller
  [Created at pci.386]
  Unique ID: JefI.QAjErpDk4H4
  Parent ID: juVd.xbjkZcxCQYD
  SysFS ID: /devices/pci0000:89/0000:89:02.0/0000:8a:00.0/0000:8b:01.0/0000:8c:00.0
  SysFS BusID: 0000:8c:00.0
  Hardware Class: graphics card
  Model: "Intel Display controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x0bd5
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x0000
  Revision: 0x2f
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0x23fe7e000000-0x23fe7fffffff (ro,non-prefetchable)
  Memory Range: 0x236000000000-0x237fffffffff (ro,non-prefetchable)
  IRQ: 138 (447 events)
  Module Alias: "pci:v00008086d00000BD5sv00008086sd00000000bc03sc80i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #26 (PCI bridge)
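
The check above can also be scripted. The following is a minimal sketch, not part of any Intel tooling: the function names display_drivers and has_i915 are hypothetical, and the parsing assumes that hwinfo prints one Driver: "..." line per device, as in the example output.

```shell
# Hypothetical helper: read `hwinfo --display` output on stdin and print the
# kernel driver named on each 'Driver: "..."' line, one per device.
# 'Driver Modules:' and 'Driver Status:' lines are deliberately not matched.
display_drivers() {
    sed -n 's/^[[:space:]]*Driver:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p'
}

# Succeeds only if at least one display device reports the i915 driver.
has_i915() {
    display_drivers | grep -qx 'i915'
}

# Usage (falls back to /usr/sbin/hwinfo when hwinfo is not in PATH):
#   HWINFO=$(command -v hwinfo || echo /usr/sbin/hwinfo)
#   "$HWINFO" --display | has_i915 && echo "i915 is bound"
```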

Diagnosing the installed GPU using Intel® XPU Manager

The Intel® XPU Manager (Intel® XPUM) tool helps with system administration, GPU monitoring, diagnostics, and configuration for Intel Data Center GPUs. You can use it in full-featured mode with a RESTful API as well as via the simplified XPU System Management Interface (XPU-SMI) tool. The following examples present commands that can help you get more information about your GPU installation.

Getting information about the available GPU
$ xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-0000-6769-df256e271362                                           |
|           | PCI BDF Address: 0000:4d:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+

Getting information about the available GPU, including installed driver and firmware versions
$ sudo xpu-smi discovery -d 0
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Type: GPU                                                                     |
|           | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-0000-6769-df256e271362                                           |
|           | Serial Number: LQAC13401787                                                          |
|           | Core Clock Rate: 2050 MHz                                                            |
|           | Stepping: C0                                                                         |
|           |                                                                                      |
|           | Driver Version: I915_23.4.15_PSB_230307.15                                           |
|           | Kernel Version: 5.15.0-47-generic                                                    |
|           | GFX Firmware Name: GFX                                                               |
|           | GFX Firmware Version: DG02_1.3267                                                    |
|           | GFX Firmware Status: normal                                                          |
|           | GFX Data Firmware Name: GFX_DATA                                                     |
|           | GFX Data Firmware Version: 0x46b                                                     |
|           | GFX PSC Firmware Name: GFX_PSCBIN                                                    |
|           | GFX PSC Firmware Version:                                                            |
|           | AMC Firmware Name: AMC                                                               |
|           | AMC Firmware Version:                                                                |
|           |                                                                                      |
|           | PCI BDF Address: 0000:4d:00.0                                                        |
|           | PCI Slot: J37 - Riser 1, Slot 1                                                      |
|           | PCIe Generation: 4                                                                   |
|           | PCIe Max Link Width: 16                                                              |
|           | OAM Socket ID:                                                                       |
|           |                                                                                      |
|           | Memory Physical Size: 14248.00 MiB                                                   |
|           | Max Mem Alloc Size: 4095.99 MiB                                                      |
|           | ECC State: enabled                                                                   |
|           | Number of Memory Channels: 2                                                         |
|           | Memory Bus Width: 128                                                                |
|           | Max Hardware Contexts: 65536                                                         |
|           | Max Command Queue Priority: 0                                                        |
|           |                                                                                      |
|           | Number of EUs: 512                                                                   |
|           | Number of Tiles: 1                                                                   |
|           | Number of Slices: 1                                                                  |
|           | Number of Sub Slices per Slice: 32                                                   |
|           | Number of Threads per EU: 8                                                          |
|           | Physical EU SIMD Width: 8                                                            |
|           | Number of Media Engines: 2                                                           |
|           | Number of Media Enhancement Engines: 2                                               |
|           |                                                                                      |
|           | Number of Xe Link ports:                                                             |
|           | Max Tx/Rx Speed per Xe Link port:                                                    |
|           | Number of Lanes per Xe Link port:                                                    |
+-----------+--------------------------------------------------------------------------------------+
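
To pull a single value (for example the driver version or the serial number) out of this table in a script, a small filter is enough. This is only a sketch under the assumption that the table keeps the two-column "| id | Label: value |" layout shown above; xpum_field is a hypothetical name, not an xpu-smi feature.

```shell
# Hypothetical helper: read `xpu-smi discovery -d N` table output on stdin and
# print the value of the field whose label is given as $1.
xpum_field() {
    awk -F'|' -v key="$1" '
        {
            cell = $3                              # second table column
            gsub(/^[ \t]+|[ \t]+$/, "", cell)      # trim padding
            prefix = key ":"
            if (index(cell, prefix) == 1) {        # label must match from the start
                value = substr(cell, length(prefix) + 1)
                gsub(/^[ \t]+/, "", value)
                print value
            }
        }'
}

# Usage:
#   sudo xpu-smi discovery -d 0 | xpum_field "Driver Version"
#   sudo xpu-smi discovery -d 0 | xpum_field "GFX Firmware Version"
```

A field that the tool leaves blank, such as AMC Firmware Version in the example, yields an empty line.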

Viewing GPU telemetry
$ sudo xpu-smi stats -d 0
+-----------------------------+--------------------------------------------------------------------+
| Device ID                   | 0                                                                  |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%)         | 0                                                                  |
| EU Array Active (%)         |                                                                    |
| EU Array Stall (%)          |                                                                    |
| EU Array Idle (%)           |                                                                    |
|                             |                                                                    |
| Compute Engine Util (%)     | 0; Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0              |
| Render Engine Util (%)      | 0; Engine 0: 0                                                     |
| Media Engine Util (%)       | 0                                                                  |
| Decoder Engine Util (%)     | Engine 0: 0, Engine 1: 0                                           |
| Encoder Engine Util (%)     | Engine 0: 0, Engine 1: 0                                           |
| Copy Engine Util (%)        | 0; Engine 0: 0                                                     |
| Media EM Engine Util (%)    | Engine 0: 0, Engine 1: 0                                           |
| 3D Engine Util (%)          |                                                                    |
+-----------------------------+--------------------------------------------------------------------+
| Reset                       |                                                                    |
| Programming Errors          |                                                                    |
| Driver Errors               |                                                                    |
| Cache Errors Correctable    |                                                                    |
| Cache Errors Uncorrectable  |                                                                    |
| Mem Errors Correctable      |                                                                    |
| Mem Errors Uncorrectable    |                                                                    |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W)               | 44                                                                 |
| GPU Frequency (MHz)         | 2050                                                               |
| GPU Core Temperature (C)    | 40                                                                 |
| GPU Memory Temperature (C)  |                                                                    |
| GPU Memory Read (kB/s)      | 1346                                                               |
| GPU Memory Write (kB/s)     | 286                                                                |
| GPU Memory Bandwidth (%)    | 0                                                                  |
| GPU Memory Used (MiB)       | 26                                                                 |
| Xe Link Throughput (kB/s)   |                                                                    |
+-----------------------------+--------------------------------------------------------------------+
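
The statistics table can be read from a script in the same way. The sketch below assumes the "| metric | value |" layout shown above and an integer temperature reading; the names xpum_stat and temp_ok, and any limit you pass in, are hypothetical and not taken from Intel documentation.

```shell
# Hypothetical helper: read `xpu-smi stats -d N` output on stdin and print the
# value of the metric whose exact name is given as $1.
xpum_stat() {
    awk -F'|' -v key="$1" '
        {
            name = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) print value
        }'
}

# Succeeds if the reported core temperature is present and at most $1 degrees C.
temp_ok() {
    temp=$(xpum_stat "GPU Core Temperature (C)")
    [ -n "$temp" ] && [ "$temp" -le "$1" ]
}

# Usage (85 is an arbitrary example limit):
#   sudo xpu-smi stats -d 0 | temp_ok 85 || echo "GPU 0 temperature is high or unavailable"
```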

For more information on Intel® XPUM, see the Intel® XPUM overview or the XPU System Management Interface user guide.

Smoke testing the compute stack

Use the following command to smoke test the compute stack:

clinfo | head -n 5

Running clinfo without piping it to head prints several pages summarizing the GPGPU compute capabilities.

Example output
Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
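
For an automated smoke test, it is usually enough to confirm that clinfo reports at least one platform. The following sketch assumes the "Number of platforms" line shown above; opencl_platform_count is a hypothetical helper name.

```shell
# Hypothetical helper: read clinfo output on stdin and print the platform
# count from the 'Number of platforms' line, or 0 if that line is missing.
opencl_platform_count() {
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { if (!found) print 0 }'
}

# Usage:
#   [ "$(clinfo | head -n 5 | opencl_platform_count)" -ge 1 ] && echo "compute stack OK"
```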

Smoke testing the media stack

Use the following command to smoke test the media stack for the Data Center GPU Flex series:

vainfo

Intel® Data Center GPU Max Series does not include codec capabilities, so vainfo is expected to list only a minimal set of entry points. Intel® Data Center GPU Flex Series and client GPUs provide hardware codecs, so vainfo is expected to list many entry points. See the following examples for both GPU series.

Example output

Intel® Data Center GPU Max Series:

vainfo: VA-API version: 1.18 (libva 2.17.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.4 (12e141d)
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats

Intel® Data Center GPU Flex Series:

vainfo: VA-API version: 1.18 (libva 2.17.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.4 (12e141d)
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSliceLP
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSliceLP
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointEncSliceLP
      VAProfileVP9Profile1            : VAEntrypointVLD
      VAProfileVP9Profile1            : VAEntrypointEncSliceLP
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointEncSliceLP
      VAProfileVP9Profile3            : VAEntrypointVLD
      VAProfileVP9Profile3            : VAEntrypointEncSliceLP
      VAProfileHEVCMain12             : VAEntrypointVLD
      VAProfileHEVCMain422_10         : VAEntrypointVLD
      VAProfileHEVCMain422_12         : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_10         : VAEntrypointVLD
      VAProfileHEVCMain444_10         : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_12         : VAEntrypointVLD
      VAProfileHEVCSccMain            : VAEntrypointVLD
      VAProfileHEVCSccMain            : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain10          : VAEntrypointVLD
      VAProfileHEVCSccMain10          : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain444         : VAEntrypointVLD
      VAProfileHEVCSccMain444         : VAEntrypointEncSliceLP
      VAProfileAV1Profile0            : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain444_10      : VAEntrypointVLD
      VAProfileHEVCSccMain444_10      : VAEntrypointEncSliceLP
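
The difference between the two outputs can be checked mechanically by counting entry points. In VA-API, VAEntrypointVLD is the decode entry point and VAEntrypointEncSliceLP is a low-power encode entry point, so a count of zero for both matches the Max Series example. The helper name count_entrypoints is hypothetical, and the sketch assumes the "profile : entrypoint" line format shown above.

```shell
# Hypothetical helper: read vainfo output on stdin and print how many lines
# end with the entry point named in $1 (for example VAEntrypointVLD).
count_entrypoints() {
    awk -v ep="$1" '$NF == ep { n++ } END { print n + 0 }'
}

# Usage:
#   vainfo 2>/dev/null | count_entrypoints VAEntrypointVLD
#   vainfo 2>/dev/null | count_entrypoints VAEntrypointEncSliceLP
```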

Verifying the usage of the Virtual Special Engine Capability (VSEC) module

To access the full range of Intel® Data Center GPU Max Series telemetry features, you need to use the intel_vsec module instead of intel_pmt. The intel_vsec module supports Max telemetry features while intel_pmt focuses on CPU telemetry.

To check whether the VSEC change is needed, review the output of the xpu-smi discovery -d 0 command. If the serial number is reported as unknown, a VSEC issue may be preventing the tool from reading it. In that case, follow this procedure to check and change the kernel driver module in use.

  1. Use the following command to check whether the intel_vsec module is loaded and bound to a PCI device.

    for d in 8086:09A7 8086:4F93 8086:4F95; do sudo lspci -k -d $d; done
    

    The correct output should look like the following example:

    05:00.0 Memory controller: Intel Corporation Device 09A7
    Kernel driver in use: intel-vsec
    Kernel modules: intel_vsec
    

    If intel_pmt is used as a kernel driver instead of intel-vsec, proceed to the next steps to change the kernel driver.

  2. Install the driverctl tool using the command for your distribution.

    On Red Hat Enterprise Linux:

    sudo dnf install driverctl

    On SUSE Linux Enterprise Server 15, a driverctl package is not available. Instead, install it from the driverctl repository:

    git clone https://gitlab.com/driverctl/driverctl.git
    cd driverctl
    sudo make install

    On Ubuntu:

    sudo apt install driverctl
    
  3. Check which device the intel-pmt module is linked to.

    sudo driverctl list-devices | grep -iE "pmt"
    

    The expected output resembles 0000:8e:00.1 intel-pmt. The device address on your system may differ from 0000:8e:00.1.

  4. Override the default driver binding, replacing 0000:8e:00.1 with the device address retrieved in the previous step.

    sudo driverctl set-override 0000:8e:00.1 "intel_vsec"
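
Steps 3 and 4 of the procedure above can be combined into a dry run that prints the override command for each affected device instead of executing it. This is a sketch only: the function names are hypothetical, and it assumes that driverctl list-devices prints one "address driver" pair per line, as in the expected output of step 3.

```shell
# Hypothetical helper: read `driverctl list-devices` output on stdin and print
# the PCI address of every device whose driver name contains "pmt".
pmt_devices() {
    awk 'tolower($2) ~ /pmt/ { print $1 }'
}

# Print (do not run) the set-override command for each such device.
print_override_cmds() {
    pmt_devices | while read -r addr; do
        printf 'sudo driverctl set-override %s intel_vsec\n' "$addr"
    done
}

# Usage:
#   sudo driverctl list-devices | print_override_cmds
```

Printing the commands first leaves room to review the device addresses before changing any driver binding.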
    

Verifying Integrated Firmware Image (IFWI)

Use the Intel® XPUM tool to flash IFWI onto a Flex or Max GPU.

  1. Check GFX firmware version for each GPU.

    sudo xpu-smi discovery -d 0
    sudo xpu-smi discovery -d 1
    
  2. Look up the latest firmware version for your hardware on your Intel or OEM portal and compare it with the version currently installed on your device. If the portal version is newer, install it.

    For Intel® Data Center GPU Flex Series:

    sudo xpu-smi updatefw -d 0 -t GFX -f /home/intel/ATS_M75_128_B0_PVT_ES_017_gfx_fwupdate_SOC2.bin -y

    For Intel® Data Center GPU Max Series:

    sudo xpu-smi updatefw -d 0 -t GFX_PSCBIN -f /home/test/PVC_Tuscany_oam_cbb_otf_53G_220803.pscbin
    sudo xpu-smi updatefw -d 0 -t GFX -f /home/test/PVC.Fwupdate_Prod_2023.WW26.3_Tuscany_Pcie.bin
    
  3. To review the available firmware update options, run the command without arguments.

    sudo xpu-smi updatefw
    Update GPU firmware
    
    Usage: xpu-smi updatefw [Options]
      xpu-smi updatefw -d [deviceId] -t GFX -f [imageFilePath]
      xpu-smi updatefw -d [pciBdfAddress] -t GFX -f [imageFilePath]
    
    Options:
      -h,--help                   Print this help message and exit
      -j,--json                   Print result in JSON format
    
      -d,--device                 The device ID or PCI BDF address
      -t,--type                   The firmware name. Valid options: GFX, GFX_DATA, GFX_CODE_DATA, GFX_PSCBIN, AMC. AMC firmware update just works on Intel M50CYP server (BMC firmware version is 2.82 or newer) and Supermicro SYS-620C-TN12R server (BMC firmware version is 11.01 or newer).
      -f,--file                   The firmware image file path on this server
      -u,--username               Username used to authenticate for host redfish access
      -p,--password               Password used to authenticate for host redfish access
      -y,--assumeyes              Assume that the answer to any question which would be asked is yes
      --force                     Force GFX firmware update. This parameter only works for GFX firmware.
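
The version comparison in step 2 can be sketched as a small shell test. It assumes that both version strings share the same prefix and differ only in their numeric parts (as in DG02_1.3267), and that sort supports the -V (version sort) option, as GNU coreutils does. The name fw_is_newer is hypothetical, and the ordering rule is an assumption rather than documented Intel behavior.

```shell
# Hypothetical helper: succeed if version string $1 sorts strictly after $2
# under version ordering (sort -V).
fw_is_newer() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage (the first argument is the version listed on the portal, the second
# is the GFX Firmware Version reported by xpu-smi discovery):
#   fw_is_newer "$portal_version" "$installed_version" && echo "update available"
```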