Mirai Tech Lab (みらいテックラボ)

This summer (2022), I set up a Windows PC with the following specs.

CPU : AMD Ryzen7 5800X
RAM : 16GB
OS : Windows 11 Pro
SSD + HDD : 500GB + 6TB
etc : liquid cooler

In the previous post [1], I described setting up an Ubuntu 20.04 environment with WSL2 and installing CUDA and cuDNN.
mirai-tec.hatenablog.com

[Ubuntu environment]
OS : Ubuntu 20.04 on Windows
GPU : GTX 1060-6GB
NVIDIA Driver : 516.94
CUDA : 11.7
cuDNN : 8.5

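After that, I created miniconda3 virtual environments for PyTorch (v1.12) and TensorFlow (v2.10) and tried them out. PyTorch recognized the GPU, but TensorFlow did not and was running on the CPU only.
In the end the cause was a cuDNN version mismatch, but I'll summarize what I looked into along the way.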

1. PyTorch[2]
I checked how PyTorch sees the GPU, using the following items.

Whether a GPU is usable from PyTorch : torch.cuda.is_available()
Number of GPU devices : torch.cuda.device_count()
Current (default) GPU index : torch.cuda.current_device()
GPU name : torch.cuda.get_device_name()
CUDA Compute Capability : torch.cuda.get_device_capability()

$ python
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.12.1+cu116'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name()
'NVIDIA GeForce GTX 1060 6GB'
>>> torch.cuda.get_device_capability()
(6, 1)
>>>

PyTorch seems to be retrieving the GPU/CUDA information correctly.
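
The same checks can be collected into a small script instead of typing them at the prompt. This is only a sketch: the function name gpu_summary and the shape of the returned dict are my own choices for illustration, and the import is guarded so the script still runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def gpu_summary():
    """Collect the items checked in the interactive session into one dict.

    gpu_summary is a hypothetical helper name, not part of PyTorch.
    """
    if torch is None or not torch.cuda.is_available():
        return {"available": False, "device_count": 0}
    index = torch.cuda.current_device()
    return {
        "available": True,
        "device_count": torch.cuda.device_count(),
        "current_device": index,
        "name": torch.cuda.get_device_name(index),
        "capability": torch.cuda.get_device_capability(index),
    }


print(gpu_summary())
```

On a machine where CUDA is visible to PyTorch, the dict carries the same values as the session above; otherwise it reports "available": False.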

2. TensorFlow
Next, I checked TensorFlow, the one with the problem.
・List of device information : tensorflow.python.client.device_lib.list_local_devices()

$ python
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-09-24 17:43:19.321941: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-24 17:43:19.662008: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-24 17:43:20.409488: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-09-24 17:43:20.409554: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-09-24 17:43:20.409571: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
>>> tf.__version__
'2.10.0'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2022-09-24 17:44:10.216362: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-24 17:44:10.390243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:966] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 17:44:10.450432: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-09-24 17:44:10.450470: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 3809127935636748211
xla_global_id: -1
]
>>>

Sure enough, only the CPU is recognized.
In fact, the moment tensorflow is imported, the following error already appears:

E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

That said, this message only seems to say that TensorFlow tried to register a cuBLAS factory and found one already registered.

Searching online turned up little about this error beyond a suggestion to downgrade to TensorFlow 2.9.2 [3].
For now, I downgraded TensorFlow to 2.9.2.
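
Apart from the cuBLAS message, the dso_loader warnings in the log name exactly which shared libraries TensorFlow failed to load. Here is a rough sketch for pulling those names out of a captured log. The helper name missing_libraries is my own, and the sample text is just three warning lines from the session above with the timestamps removed.

```python
import re

# Matches the dso_loader warning format seen in the TensorFlow 2.10 log.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names TensorFlow could not load, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample_log = """\
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
"""

print(missing_libraries(sample_log))
```

The two libnvinfer entries belong to the TF-TRT warning about TensorRT. In the log, the line that comes right before "Skipping registering GPU devices..." is the one about libcudnn.so.8.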

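While looking into this, I also noticed that the CUDA/cuDNN versions expected by TensorFlow 2.10 (CUDA 11.2 / cuDNN 8.1) did not match the versions I had installed (CUDA 11.7 / cuDNN 8.5).
(I used to pay close attention to these versions, but this time I had completely forgotten.)
For the correspondence between each TensorFlow version and its CUDA/cuDNN versions, see the tested build configurations table in the TensorFlow install documentation.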
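
The comparison I should have done up front is simple enough to write down. In this sketch the table holds only the one row quoted in this post (TensorFlow 2.10), and the names REQUIRED, INSTALLED and mismatches are made up for illustration; other rows would have to be filled in from the TensorFlow documentation.

```python
# Versions quoted in this post; extend the table from the TensorFlow docs as needed.
REQUIRED = {"2.10": {"cuda": "11.2", "cudnn": "8.1"}}

# What was installed under WSL2 at the time.
INSTALLED = {"cuda": "11.7", "cudnn": "8.5"}


def mismatches(tf_version, installed, required=REQUIRED):
    """Return {component: (expected, installed)} for every component that differs."""
    expected = required[tf_version]
    return {
        key: (expected[key], installed.get(key))
        for key in expected
        if installed.get(key) != expected[key]
    }


print(mismatches("2.10", INSTALLED))
```

Strict equality is stricter than what turned out to be necessary here: in the end only cuDNN was changed and CUDA stayed at 11.7, so a reported CUDA difference is a hint to check rather than a guaranteed failure.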

So I decided to first downgrade cuDNN from 8.5 to 8.1.
If that alone does not fix it, I will move CUDA to 11.2 as well.

From NVIDIA's cuDNN Archive [4], under "Download cuDNN v8.1.1 (Feburary 26th, 2021), for CUDA 11.0,11.1 and 11.2", I downloaded "cuDNN Runtime Library for Ubuntu20.04 x86_64 (Deb)" and installed it.
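
Before starting TensorFlow again, it is possible to check from Python whether the dynamic loader can find libcudnn.so.8 (the name from the warning in the log) and which cuDNN version it reports. This is a sketch under a few assumptions: the helper name cudnn_version is my own, and it relies on the cudnnGetVersion function of the cuDNN C API being reachable through that library.

```python
import ctypes


def cudnn_version(soname="libcudnn.so.8"):
    """Return the integer from cudnnGetVersion(), or None if the library cannot be used.

    cudnn_version is a hypothetical helper name; soname defaults to the
    library named in the TensorFlow warning.
    """
    try:
        lib = ctypes.CDLL(soname)           # same lookup path the dynamic loader uses
        get_version = lib.cudnnGetVersion   # AttributeError if the symbol is not exported
    except (OSError, AttributeError):
        return None
    get_version.restype = ctypes.c_size_t   # the C function returns size_t
    return int(get_version())


print(cudnn_version())
```

A result of None corresponds to the "cannot open shared object file" situation. For cuDNN 8.x the integer encodes major, minor and patch level, so I would expect 8.1.1 to show up as 8101.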

$ python
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.9.2'
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2022-09-24 18:07:14.128639: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-24 18:07:14.251695: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 18:07:14.272856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 18:07:14.273250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 18:07:14.792393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 18:07:14.793182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 18:07:14.793213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2022-09-24 18:07:14.793569: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:07:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-09-24 18:07:14.793636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /device:GPU:0 with 4598 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:07:00.0, compute capability: 6.1
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17517294614262033770
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4821352448
locality {
  bus_id: 1
  links {
  }
}
incarnation: 2792029488055711110
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:07:00.0, compute capability: 6.1"
xla_global_id: 416903419
]
>>>

It still reports "Your kernel may have been built without NUMA support.", but the GPU is now recognized.
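
device_lib lives under tensorflow.python, which is an internal module path. The public tf.config.list_physical_devices call gives a shorter answer to the same question. A minimal sketch follows; the helper name visible_gpus is my own, and the import is guarded so the snippet also runs where TensorFlow is not installed.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    if tf is None:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```

An empty list corresponds to the CPU-only state seen with TensorFlow 2.10 earlier in this post.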

Finally, I ran MNIST training with TensorFlow in a Jupyter notebook and looked at nvidia-smi.
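
The notebook itself is not reproduced in this post. As a stand-in, here is a minimal Keras sketch of the kind of MNIST training that is enough to put load on the GPU. The model layout, the function name train_mnist and the default of one epoch are all assumptions for illustration, not the exact code that was run.

```python
def train_mnist(epochs=1):
    """Train a small dense network on MNIST and return [test loss, test accuracy]."""
    import tensorflow as tf  # imported lazily so defining the function stays cheap

    # Downloads the dataset on first use.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),  # logits for the ten digit classes
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, epochs=epochs)
    return model.evaluate(x_test, y_test, verbose=2)
```

Calling train_mnist(epochs=5) in a notebook cell and running nvidia-smi in another terminal while it trains is one way to watch GPU utilization and memory use.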

$ nvidia-smi
Sat Sep 24 20:41:17 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:07:00.0  On |                  N/A |
| 44%   43C    P2    57W / 120W |   5819MiB /  6144MiB |     58%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        80      G   /Xwayland                       N/A      |
|    0   N/A  N/A      1463      G   /chrome                         N/A      |
|    0   N/A  N/A      1512      G   /chrome                         N/A      |
|    0   N/A  N/A      1756      C   /python3.9                      N/A      |
+-----------------------------------------------------------------------------+

The GPU appears to be in use, and training seems to run without problems.

When installing TensorFlow, pay attention to the CUDA/cuDNN versions!!

----
[1] Building an Ubuntu environment with WSL2 - Mirai Tech Lab (みらいテックラボ)
[2] Checking GPU information in PyTorch (availability, device count, etc.) | note.nkmk.me
[3] TensorFlow 2.10 causes trouble! #47・google-research/multinerf
[4] cuDNN Archive | NVIDIA Developer