Last week, NVIDIA released an update for the vGPU driver and manager (also described on The Citrix Blog). Besides bug fixes, this release introduces 3 new vGPU profiles: 1 new profile for the GRID K2 card (K280Q) and 2 new profiles for the GRID K1 card (K160Q and K180Q).
|Card||Profile||Frame buffer||Virtual display heads||Max resolution per head||Max vGPUs per GPU||Max vGPUs per board|
As you can see, the K180Q and K280Q profiles will assign an entire GPU to a VM. This can be compared to GPU pass-through, but leveraging the functionality of vGPU.
Installing on XenServer
Previously, installing a new NVIDIA driver with new vGPU profiles also required a XenServer hotfix to make those profiles available. If you installed the previous update (which enabled the K120Q and K220Q profiles), you also had to install XenServer hotfix XS62ESP1004. Among other improvements, this hotfix:
- Adds support for a secondary NVIDIA PCI ID database, shipped with NVIDIA’s host drivers, to enable support for new NVIDIA devices and vGPU types without requiring any further XenServer updates.
This means that if you have indeed installed this hotfix, you will not need to install a XenServer hotfix specifically for the 3 new vGPU profiles. There are 2 prerequisites (in terms of XenServer hotfixes) for installing this latest version:
If you’re up to date with your XenServer hotfixes, installation of the new version of the vGPU driver is a breeze. Just grab the driver from the NVIDIA site and extract the NVIDIA-GRID-vGPU-XenServer-6.2-340.57-341.08.zip file. The next step is to upload the NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm file to your XenServer(s) using, for example, WinSCP. Now open a console connection to the XenServer using a tool like PuTTY, or just use the Console tab in XenCenter.
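The bundle name above appears to carry two version numbers. As a rough sketch, assuming the pattern is prefix, then host driver version, then guest driver version (an assumption on my part, not something the file name itself states), a small helper can split them out. The name bundle_versions is made up for this illustration.

```shell
# Sketch: split a GRID vGPU bundle name into its two trailing version numbers.
# Assumption: the pattern is <prefix>-<host>-<guest>.zip.
# bundle_versions is a hypothetical helper name.
bundle_versions() {
    local base="${1%.zip}"     # drop the .zip extension
    local guest="${base##*-}"  # text after the last hyphen
    local rest="${base%-*}"    # everything before the last hyphen
    local host="${rest##*-}"   # text after the second-to-last hyphen
    echo "host=$host guest=$guest"
}

bundle_versions NVIDIA-GRID-vGPU-XenServer-6.2-340.57-341.08.zip
# prints: host=340.57 guest=341.08
```

This only parses the name; it does not inspect the contents of the zip file.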
Update existing NVIDIA driver
Upgrading an existing installation of the NVIDIA driver on XenServer is very easy. Just use the rpm -U command to upgrade:
[root@localhost ~]# rpm -Uv NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm
Preparing packages for installation...
NVIDIA-vgx-xenserver-6.2-340.57
The recommendation from NVIDIA is to shut down all VMs using a GPU. The VMs do continue to work during the update, but since you need to reboot the XenServer itself, it’s better to shut them down gracefully. So after your VMs have been shut down and you have upgraded the NVIDIA driver, you can reboot your host.
[root@localhost ~]# xe host-disable
[root@localhost ~]# xe host-reboot
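If you have several hosts to update, the upgrade and reboot steps above could be wrapped in a small function. This is only a sketch: the names run, upgrade_vgpu_manager and DRY_RUN are my own and not part of any NVIDIA or Citrix tooling, and by default it only prints the commands instead of running them.

```shell
# Sketch: upgrade the NVIDIA vGPU manager RPM, then disable and reboot the host.
# run, upgrade_vgpu_manager and DRY_RUN are hypothetical names for illustration.

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_vgpu_manager() {
    local rpm_file="$1"
    run rpm -Uv "$rpm_file" || return 1   # upgrade the package in place
    run xe host-disable || return 1        # same command as used above
    run xe host-reboot                     # reboot so the new driver is loaded
}
```

Calling upgrade_vgpu_manager NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm prints the three commands; set DRY_RUN=0 only after the VMs on that host have been shut down.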
Remove and install
If for some reason the upgrade command does not work, you can always remove and reinstall the NVIDIA driver. Removing is done by first finding the exact package name using the “rpm -qa” command:
[root@localhost ~]# rpm -qa | grep -i nvidia
NVIDIA-vgx-xenserver-6.2-331.59.01
Note the “grep -i nvidia” part. This is used to filter the output of the command on the term “nvidia” in case-insensitive mode (that’s what the -i switch is for). Once you find the exact package name, you can use the “rpm -e” command to remove it:
[root@localhost ~]# rpm -ev NVIDIA-vgx-xenserver-6.2-331.59.01
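The lookup and removal can also be chained so you don't have to copy the package name by hand. A minimal sketch, assuming only one NVIDIA package is installed on the host; find_nvidia_pkg is a made-up helper name.

```shell
# Sketch: pick the installed NVIDIA package out of `rpm -qa` output read from stdin.
# find_nvidia_pkg is a hypothetical helper name.
find_nvidia_pkg() {
    grep -i nvidia | head -n 1   # case-insensitive match, first hit only
}

# On the host this could be used as:
#   pkg="$(rpm -qa | find_nvidia_pkg)"
#   [ -n "$pkg" ] && rpm -ev "$pkg"
```

The function reads from stdin, so you can try it on a saved package list before pointing it at a real host.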
If you execute the “rpm -qa | grep NVIDIA” command again, you will notice that the package has been removed. The next step is to install the latest version of the driver using the “rpm -i” command:
[root@localhost ~]# rpm -iv NVIDIA-vgx-xenserver-6.2-340.57.i386.rpm
Preparing packages for installation...
NVIDIA-vgx-xenserver-6.2-340.57
And, of course, reboot the XenServer:
[root@localhost ~]# xe host-disable
[root@localhost ~]# xe host-reboot
Checking the driver version
Once the XenServer has been rebooted, you can check the installed driver version by executing the “nvidia-smi -a” command and looking for the “Driver Version” line:
[root@localhost ~]# nvidia-smi -a | more
==============NVSMI LOG==============
Timestamp : Thu Nov 20 11:50:20 2014
Driver Version : 340.57
Attached GPUs : 2
GPU 0000:08:00.0
    Product Name : GRID K2
    Product Brand : Grid
    Display Mode : Disabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 128
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324812056685
    GPU UUID : GPU-f7a1bf56-40bd-f84b-ba31-34fd87cb85ff
    Minor Number : 0
    VBIOS Version : 80.04.D4.00.09
    MultiGPU Board : Yes
--More--
In this case it says “Driver Version: 340.57” which is the version I indeed installed. So all is good! Next is to update your virtual machines.
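If you want to check this on more than one host without paging through the output, the “Driver Version” line can be filtered out with awk. A small sketch; smi_driver_version is a name I made up, and it assumes the line looks like the one in the output above.

```shell
# Sketch: extract the "Driver Version" value from `nvidia-smi -a` output on stdin.
# smi_driver_version is a hypothetical helper name.
smi_driver_version() {
    # Split each line on the colon (plus any following spaces) and print
    # the value of the first line that starts with "Driver Version".
    awk -F': *' '/^Driver Version/ { print $2; exit }'
}

# On the host: nvidia-smi -a | smi_driver_version
```

With the output shown above, this should print 340.57.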
Update the virtual machines
The driver installed on XenServer corresponds with a driver version which should be installed in the virtual machines. Once you update the driver on XenServer and reboot the VMs, the graphics driver in the virtual machine will fail with code 43 until the guest driver has been updated as well.
The download package that contains the driver for XenServer also contains a 32-bit and a 64-bit driver for Windows. Copy the correct executable to your virtual machines (or golden image) and execute it. It will ask where to store the installation files; just hit OK. Once you agree to the license agreement, you will have two options: Express or Custom.
Express is the recommended option according to the setup. But if you use the “Custom” option, you will have the option to do a “clean” installation. The downside of the “clean installation” is that it will remove all profiles and custom settings. The pro of using the clean installation option is that it will reinstall the complete driver, meaning that there will be no old driver files left on the system.
Install the driver, and when installation is done, reboot the VM. Once the virtual machine has been rebooted, the code 43 error is gone.
If you’re using a random pool in XenDesktop, you will need to update your Machine Catalog at this point. If you’re planning to create a new catalog for the K280Q profile, for example, you won’t need to change anything in XenDesktop. Once the XenServer has been rebooted, you can create a new resource in the hosting configuration that uses one of the new vGPU profiles.
Error starting virtual machine
Once you’ve updated the NVIDIA driver on XenServer, it could happen that the virtual machine is unable to start. It shows “vgpu exited unexpectedly” in the Log tab of the virtual machine in XenCenter.
If you forgot to reboot your host, there will be a conflict between the configuration of the virtual machine and the driver on the host. Rebooting your XenServer host will most likely solve this issue. (Keep an eye on my blog; I’m working on a vGPU troubleshooting article.)
Installing an updated version of the NVIDIA driver is a relatively painless exercise; it works out of the box. The only thing you need to keep in mind is that your host driver has to match the guest OS driver, so deploying the update will need some planning. The K180Q and K280Q may look like needless additions to the vGPU profile list, since they do the same as GPU pass-through, but these 2 profiles have a few benefits over pass-through (also described in Rachel Berry’s blog post):
- It makes use of the same management as other vGPU profiles
- Possible future option to do XenMotion (not possible yet)
- Changing profiles on the VM won’t require reinstallation of the driver
I hope this post was useful. If you have any questions or remarks, feel free to leave a comment or send me an email.