Frequent Error

Jun 6th, 2020 - Now

q2l

[toc]

Environment

Failed to initialize NVML: Driver/library version mismatch

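  • Cause: a driver update left the loaded kernel module and the user-space library at different versions

  • Solution:

    • Unload the NVIDIA kernel module

    • sudo rmmod nvidia

    • rmmod may report that the module is in use; unload whatever is holding it first (in the example below, the dependent modules nvidia_drm and nvidia_modeset), then retry

  • Example: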

    ```bash
    (base) [q2l@gpu4 ~]$ nvidia-smi
    Failed to initialize NVML: Driver/library version mismatch
    (base) [q2l@gpu4 ~]$ sudo lsof -n -w /dev/nvidia*   # list processes currently using the NVIDIA devices; none found
    (base) [q2l@gpu4 ~]$ sudo rmmod nvidia
    rmmod: ERROR: Module nvidia is in use by: nvidia_modeset
    (base) [q2l@gpu4 ~]$ sudo rmmod nvidia_modeset
    rmmod: ERROR: Module nvidia_modeset is in use by: nvidia_drm
    (base) [q2l@gpu4 ~]$ sudo rmmod nvidia_drm
    (base) [q2l@gpu4 ~]$ sudo rmmod nvidia_modeset
    (base) [q2l@gpu4 ~]$ sudo rmmod nvidia
    (base) [q2l@gpu4 ~]$ nvidia-smi
    Fri Jun  5 09:05:02 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-PCIE...  Off  | 00000000:06:00.0 Off |                    0 |
    | N/A   49C    P0    40W / 250W |      0MiB / 32510MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-PCIE...  Off  | 00000000:84:00.0 Off |                    0 |
    | N/A   46C    P0    40W / 250W |      0MiB / 32510MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    ```
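The manual steps in the transcript above can be wrapped in a small helper. The following is only a sketch under stated assumptions: the function names `loaded_nvidia_modules` and `unload_nvidia_modules` and the `DRY_RUN` variable are hypothetical, and `nvidia_uvm` is added to the module list even though it does not appear in the transcript, because it is often loaded alongside the others. Dependent modules are listed first so that each `rmmod` should find its target no longer in use.

```bash
#!/usr/bin/env bash
# Sketch: unload NVIDIA kernel modules, dependents first.
# Assumption: this module list. nvidia_drm and nvidia_modeset come from the
# transcript; nvidia_uvm is an addition that may or may not be loaded.
NVIDIA_MODULES="nvidia_drm nvidia_modeset nvidia_uvm nvidia"

# Read lsmod-style text on stdin and print the modules from NVIDIA_MODULES
# that appear in it, one per line, keeping the dependents-first order.
loaded_nvidia_modules() {
    local listing mod
    listing="$(cat)"
    for mod in $NVIDIA_MODULES; do
        # Match the module name as the whole first column of a line.
        if printf '%s\n' "$listing" | grep -Eq "^${mod}[[:space:]]"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Unload every loaded NVIDIA module. DRY_RUN defaults to 1, which only
# prints the commands; set DRY_RUN=0 to actually call rmmod.
unload_nvidia_modules() {
    local mod
    for mod in $(lsmod | loaded_nvidia_modules); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sudo rmmod $mod"
        else
            # Stop at the first failure, e.g. a process still holds the GPU.
            sudo rmmod "$mod" || return 1
        fi
    done
}
```

After sourcing the file, `unload_nvidia_modules` prints the commands it would run, and `DRY_RUN=0 unload_nvidia_modules` executes them. If an `rmmod` still fails, the `lsof` check from the transcript is the place to look for a process holding `/dev/nvidia*`. As in the transcript, running `nvidia-smi` afterwards should load the module that matches the updated driver.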
