Frequent Error
Jun 6th, 2020 - Now
q2l
[toc]
Environment
Failed to initialize NVML: Driver/library version mismatch
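Cause: the driver was updated, so the kernel module that is still loaded no longer matches the version of the user-space library.
Solution:
Unload the kernel module: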
sudo rmmod nvidia
While unloading, you may get an error saying the module is in use. Unload the modules that depend on it first, then retry.
Example:
```bash
(base) [q2l@gpu4 ~]$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
(base) [q2l@gpu4 ~]$ sudo lsof -n -w /dev/nvidia*  # Check for processes currently using the NVIDIA devices; none were found
(base) [q2l@gpu4 ~]$ sudo rmmod nvidia
rmmod: ERROR: Module nvidia is in use by: nvidia_modeset
(base) [q2l@gpu4 ~]$ sudo rmmod nvidia_modeset
rmmod: ERROR: Module nvidia_modeset is in use by: nvidia_drm
(base) [q2l@gpu4 ~]$ sudo rmmod nvidia_drm
(base) [q2l@gpu4 ~]$ sudo rmmod nvidia_modeset
(base) [q2l@gpu4 ~]$ sudo rmmod nvidia
(base) [q2l@gpu4 ~]$ nvidia-smi
Fri Jun 5 09:05:02 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:06:00.0 Off | 0 |
| N/A 49C P0 40W / 250W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:84:00.0 Off | 0 |
| N/A 46C P0 40W / 250W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
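The session above unloads the modules in reverse dependency order, as dictated by the `rmmod` errors. A minimal sketch of that ordering follows. The function name `nvidia_unload_order` is hypothetical, and `nvidia_uvm` is an assumption: it does not appear in the session above but is often loaded alongside the others. The function only prints module names; it does not unload anything.

```bash
#!/usr/bin/env bash
# Sketch: print the loaded NVIDIA modules in an order that should be safe to unload.
# Dependents come first, matching the rmmod errors seen in the session above.
# nvidia_uvm is an assumption; it is not part of the original session.
nvidia_unload_order() {
    # Reads `lsmod`-style text on stdin: a header line, then one module per line
    # with the module name in the first column.
    local loaded mod
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        # Print the module only if it is actually loaded (exact, literal match).
        if grep -qxF "$mod" <<< "$loaded"; then
            echo "$mod"
        fi
    done
}
```

On a machine with this problem, it could be used as `for mod in $(lsmod | nvidia_unload_order); do sudo rmmod "$mod"; done`, followed by `nvidia-smi` to confirm the fix. If `rmmod` still reports a module in use, a process is probably holding the device, and `sudo lsof -n -w /dev/nvidia*` should show it.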