Upgrading the NVIDIA GRID vGPU driver on XenServer

Last week, NVIDIA released an update for the vGPU driver and manager (also described on The Citrix Blog). Besides bug fixes, this release introduces 3 new vGPU profiles: 1 for the GRID K2 card (K280Q) and 2 for the GRID K1 card (K160Q and K180Q).

Card  Profile  Frame buffer  Virtual display heads  Max resolution per head  Max vGPUs per GPU  Max vGPUs per board
K1    K180Q    4096 MB       4                      2560×1600                1                  4
K1    K160Q    2048 MB       4                      2560×1600                2                  8
K2    K280Q    4096 MB       4                      2560×1600                1                  2

As you can see, the K180Q and K280Q profiles assign an entire GPU to a VM. This is comparable to GPU pass-through, but leverages the functionality of vGPU.

Installing on XenServer

Previously, installing a new NVIDIA driver that introduced new vGPU profiles also required a XenServer hotfix to make those profiles available. If you installed the previous update (which enabled the K120Q and K220Q profiles), you also had to install XenServer hotfix XS62ESP1004. Among other improvements, this hotfix:

  1. Adds support for a secondary NVIDIA PCI ID database, shipped with NVIDIA’s host drivers, to enable support for new NVIDIA devices and vGPU types without requiring any further XenServer updates.

This means that if you have indeed installed this hotfix, you will not need to install a XenServer hotfix specifically for the 3 new vGPU profiles. There are 2 prerequisites (in terms of XenServer hotfixes) for installing this latest version:

If you’re up to date with your XenServer hotfixes, installing the new version of the vGPU driver is a breeze. Just grab the driver from the NVIDIA site and extract the NVIDIA-GRID-vGPU-XenServer-6.2-340.57-341.08.zip file. Next, upload the NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm file to your XenServer(s) using, for example, WinSCP. Then open a console connection to the XenServer, either with a tool like PuTTY or through the Console tab in XenCenter.

[Figure: XenCenter Console Tab]
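If you would rather stay on the command line than use WinSCP, the upload step could look roughly like the sketch below. It assumes a workstation with scp available and root SSH access to the host; the host name xenserver01 and the /root/ target directory are placeholders of mine, not values from this setup. Setting DRY_RUN to a non-empty value prints the command instead of running it.

```shell
#!/bin/bash
# Sketch only: copy the host driver RPM to a XenServer over SSH.
# "xenserver01" and "/root/" are hypothetical placeholders.

# run CMD...: execute the command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_driver RPM HOST: copy the RPM to root's home directory on HOST.
upload_driver() {
  local rpm="$1" host="$2"
  run scp "$rpm" "root@${host}:/root/"
}

# Example (after extracting the zip file):
#   upload_driver NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm xenserver01
```

Running it once with DRY_RUN=1 is a cheap way to check the file name and target before anything is copied.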

Update existing NVIDIA driver

Upgrading an existing installation of the NVIDIA driver on XenServer is very easy. Just use the rpm -U command to upgrade:

[root@localhost ~]# rpm -Uv NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm 
Preparing packages for installation...
NVIDIA-vgx-xenserver-6.2-340.57
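Before upgrading a batch of hosts, it can help to check whether a host already runs the target version. The sketch below is one way to do that, assuming the driver version is always the last dash-separated field of the package name, as it is in the names shown in this post. The function names pkg_version and needs_upgrade are my own, not part of rpm.

```shell
#!/bin/bash
# Sketch only: compare the installed NVIDIA package with a new RPM file.
# ASSUMPTION: the driver version is the last dash-separated field of the name.

# pkg_version NAME: print the version at the end of a package or RPM file name.
pkg_version() {
  local name="${1%.rpm}"   # drop a trailing .rpm, if present
  name="${name%.i386}"     # drop the architecture suffix, if present
  echo "${name##*-}"       # keep everything after the last dash
}

# needs_upgrade INSTALLED_PKG NEW_RPM: succeed when the two versions differ.
needs_upgrade() {
  [ "$(pkg_version "$1")" != "$(pkg_version "$2")" ]
}

# On the host (not executed here):
#   installed=$(rpm -qa | grep -i nvidia)
#   needs_upgrade "$installed" NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm && echo "upgrade needed"
```

Note that this only tests for a difference; it does not decide which of the two versions is newer.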

NVIDIA recommends shutting down all VMs that use a GPU. The VMs do continue to work during the update, but since you need to reboot the XenServer itself, it’s better to shut them down gracefully. So once your VMs have been shut down and you have upgraded the NVIDIA driver, you can reboot your host.

[root@localhost ~]# xe host-disable
[root@localhost ~]# xe host-reboot
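To find the VMs that should be shut down before the reboot, something like the following sketch may help. It assumes that "xe vgpu-list params=vm-uuid --minimal" is available on your host and returns a comma-separated list of VM UUIDs; keep in mind that this list is pool-wide and may include VMs that are already halted. The helper names split_minimal and shutdown_commands are hypothetical. The script only prints the shutdown commands so you can review them first.

```shell
#!/bin/bash
# Sketch only: build "xe vm-shutdown" commands for VMs that have a vGPU.
# ASSUMPTION: xe --minimal output is a comma-separated list of UUIDs.

# split_minimal: turn a comma-separated list on stdin into one unique
# value per line, dropping empty lines.
split_minimal() {
  tr ',' '\n' | sed '/^$/d' | sort -u
}

# shutdown_commands: read VM UUIDs (one per line) and print the matching
# shutdown command for each, without executing anything.
shutdown_commands() {
  while IFS= read -r uuid; do
    echo "xe vm-shutdown uuid=$uuid"
  done
}

# On the host (not executed here):
#   xe vgpu-list params=vm-uuid --minimal | split_minimal | shutdown_commands
```

Once the printed list looks right, the same pipeline can be piped into sh to actually run the commands.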

Remove and install

If for some reason the upgrade command does not work, you can always remove and reinstall the NVIDIA driver. To remove it, first find the exact package name using the “rpm -qa” command:

[root@localhost ~]# rpm -qa | grep -i nvidia
NVIDIA-vgx-xenserver-6.2-331.59.01

Note the “grep -i nvidia” part. This is used to filter the output of the command on the term “nvidia” in case-insensitive mode (that’s what the -i switch is for). Once you find the exact package name, you can use the “rpm -e” command to remove it:

[root@localhost ~]# rpm -ev NVIDIA-vgx-xenserver-6.2-331.59.01

If you execute the “rpm -qa | grep -i nvidia” command again, you will notice that the package has been removed. Next, install the latest version of the driver using the “rpm -i” command:

[root@localhost ~]# rpm -iv NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm 
Preparing packages for installation...
NVIDIA-vgx-xenserver-6.2-340.57

And, of course, reboot the XenServer:

[root@localhost ~]# xe host-disable
[root@localhost ~]# xe host-reboot

Checking the driver version

Once the XenServer has been rebooted, you can check the installed driver version by executing the “nvidia-smi -a” command and looking for the “Driver Version” line:

[root@localhost ~]# nvidia-smi -a | more

==============NVSMI LOG==============

Timestamp : Thu Nov 20 11:50:20 2014
Driver Version : 340.57

Attached GPUs : 2
GPU 0000:08:00.0
 Product Name : GRID K2
 Product Brand : Grid
 Display Mode : Disabled
 Display Active : Disabled
 Persistence Mode : Enabled
 Accounting Mode : Disabled
 Accounting Mode Buffer Size : 128
 Driver Model
 Current : N/A
 Pending : N/A
 Serial Number : 0324812056685
 GPU UUID : GPU-f7a1bf56-40bd-f84b-ba31-34fd87cb85ff
 Minor Number : 0
 VBIOS Version : 80.04.D4.00.09
 MultiGPU Board : Yes
--More--

In this case it says “Driver Version: 340.57” which is the version I indeed installed. So all is good! Next is to update your virtual machines.
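If you have more than one host to check, the manual lookup above can be scripted. The sketch below parses the “Driver Version” line from the “nvidia-smi -a” output and compares it with the version you expect. It assumes the output format shown above; the function names driver_version and check_driver are my own.

```shell
#!/bin/bash
# Sketch only: extract and verify the driver version from "nvidia-smi -a".
# ASSUMPTION: the output contains a line of the form "Driver Version : <version>".

# driver_version: read nvidia-smi output on stdin and print the value of the
# first "Driver Version" line, with surrounding whitespace removed.
driver_version() {
  awk -F':' '/^[ \t]*Driver Version/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)
    print $2
    exit
  }'
}

# check_driver EXPECTED: compare the version found on stdin with EXPECTED.
check_driver() {
  local actual
  actual="$(driver_version)"
  if [ "$actual" = "$1" ]; then
    echo "OK: driver $actual"
  else
    echo "MISMATCH: expected $1, found ${actual:-none}"
    return 1
  fi
}

# On the host (not executed here):
#   nvidia-smi -a | check_driver 340.57
```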

Update the virtual machines

The driver installed on XenServer must be paired with a matching driver version inside the virtual machines. Once you update the driver on XenServer and reboot the VMs, the graphics driver in the virtual machine fails with code 43 until the guest driver has been updated as well:

[Figure: NVIDIA driver not working]

The download package that contains the driver for XenServer also contains a 32-bit and a 64-bit driver for Windows. Copy the correct executable to your virtual machines (or golden image) and execute it. It will ask where to store the installation files; just hit OK. Once you agree to the license agreement, you will have two options: Express or Custom.

[Figure: NVIDIA driver installation choices]

Express is the recommended option according to the setup. But if you use the “Custom” option, you can choose to do a “clean” installation. The downside of the clean installation is that it removes all profiles and custom settings. The upside is that it reinstalls the complete driver, so no old driver files are left on the system.

[Figure: NVIDIA driver clean installation option]

Install the driver, and when installation is done, reboot the VM. Once the virtual machine has been rebooted, the code 43 is gone:

[Figure: NVIDIA driver fixed]
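Since the host and guest drivers have to be rolled out as a pair, it can be handy to note both versions in your change plan. The sketch below pulls them from the archive name. Be aware that this rests on an assumption of mine: that the two trailing version numbers in the zip file name are the host driver and the Windows guest driver, in that order. The function name archive_versions is hypothetical.

```shell
#!/bin/bash
# Sketch only: read the two trailing version fields from the archive name.
# ASSUMPTION: the field order is <host driver>-<guest driver>.

# archive_versions ZIP: print "host=<version> guest=<version>".
archive_versions() {
  local base="${1##*/}"       # strip any directory part
  base="${base%.zip}"         # strip the .zip extension
  local guest="${base##*-}"   # last dash-separated field
  local rest="${base%-*}"     # everything before the last dash
  local host="${rest##*-}"    # second-to-last field
  echo "host=$host guest=$guest"
}

# Example:
#   archive_versions NVIDIA-GRID-vGPU-XenServer-6.2-340.57-341.08.zip
```

Compare the result with what nvidia-smi reports on the host and what the NVIDIA Control Panel shows in the VM before relying on it.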

XenDesktop

If you’re using a random pool in XenDesktop, you will need to update your Machine Catalog at this point. If you’re planning to create a new catalog for the K280Q profile, for example, you won’t need to change anything in XenDesktop. Once the XenServer has been rebooted, you can create a new resource in the hosting configuration that uses one of the new vGPU profiles:

[Figure: XenDesktop 7.6 DDC vGPU profiles]

Error starting virtual machine

Once you’ve updated the NVIDIA driver on XenServer, it could happen that the virtual machine is unable to start. It shows “vgpu exited unexpectedly” in the Log tab of the virtual machine in XenCenter:

[Figure: vGPU error starting the virtual machine]

If you forgot to reboot your host, the configuration of the virtual machine conflicts with the driver on the host. Rebooting your XenServer host will most likely solve this issue. (Keep an eye on my blog; I’m working on a vGPU troubleshooting article.)
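One way to check whether a reboot is still outstanding is to compare the install time of the driver package with the boot time of the host. The sketch below does that under a few assumptions: that rpm's %{INSTALLTIME} query tag and the btime field in /proc/stat are available on the host, and that the package name matches what “rpm -qa” reports. The function names reboot_pending and boot_epoch are my own.

```shell
#!/bin/bash
# Sketch only: detect a driver package installed after the last boot.
# ASSUMPTION: both timestamps are in seconds since the epoch.

# reboot_pending INSTALL_EPOCH BOOT_EPOCH: succeed (exit 0) when the package
# was installed after the host last booted.
reboot_pending() {
  [ "$1" -gt "$2" ]
}

# boot_epoch: print the boot time from the btime line in /proc/stat.
boot_epoch() {
  awk '/^btime/ { print $2 }' /proc/stat
}

# On the host (not executed here):
#   installed=$(rpm -q --qf '%{INSTALLTIME}\n' NVIDIA-vgx-xenserver-6.2-340.57)
#   if reboot_pending "$installed" "$(boot_epoch)"; then
#     echo "Driver installed after last boot: reboot the host"
#   fi
```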

Conclusion

Installing an updated version of the NVIDIA driver is a relatively painless exercise; it works perfectly out of the box. The only thing you need to keep in mind is that your host driver has to match the guest OS driver, so deploying the update will need some planning. The K180Q and K280Q may look like needless additions to the vGPU profile list, since they do the same as GPU pass-through, but using one of these 2 profiles has a few benefits over pass-through (also described in Rachel Berry’s blog post):

  • It makes use of the same management as other vGPU profiles
  • Possible future option to do XenMotion (not possible yet)
  • Changing profiles on the VM won’t require reinstallation of the driver

I hope this post was useful. If you have any questions or remarks, feel free to leave a comment or send me an email.
