Cucco’s Compute Hack

Articles about computing.

Failed to create session. CUDA_ERROR_INVALID_DEVICE

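If the machine also has an old GPU installed, TensorFlow can fail with an error like this. In the log below, device 0 (the GeForce GTX 1070) is found normally, but initializing CUDA device ordinal 1 fails with CUDA_ERROR_INVALID_DEVICE.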

$ cd /usr/local/lib/python3.5/dist-packages/tensorflow/models/image/mnist
$ python3 convolutional.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7465
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.79GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x314aa00
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
Traceback (most recent call last):
  File "convolutional.py", line 339, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "convolutional.py", line 284, in main
    with tf.Session() as sess:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1186, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 551, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
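
The two lines that matter in this log are the "Found device" line and the "failed initializing StreamExecutor" line. As a rough sketch of how to pull the device numbers out of such a log, the snippet below uses two hypothetical helpers, `found_devices` and `failed_ordinals`. They are not part of TensorFlow, and the regular expressions assume the message wording shown in this particular log.

```python
import re

# Two lines copied from the log above.
LOG = """\
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
"""


def found_devices(log):
    """Return the device numbers reported as found (hypothetical helper)."""
    return [int(n) for n in re.findall(r"Found device (\d+) with properties", log)]


def failed_ordinals(log):
    """Return the CUDA device ordinals that failed to initialize (hypothetical helper)."""
    pattern = r"failed initializing StreamExecutor for CUDA device ordinal (\d+)"
    return [int(n) for n in re.findall(pattern, log)]


print("found:", found_devices(LOG))
print("failed:", failed_ordinals(LOG))
```

Here the device that was found is 0 and the one that failed is 1, which is why limiting the run to GPU 0 avoids the error.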
 

Running the script as follows restricts it to GPU 0 only, so the device that fails to initialize is never used.

$ CUDA_VISIBLE_DEVICES=0 python3 convolutional.py

Exporting the variable first also works.

$ export CUDA_VISIBLE_DEVICES=0
$ python3 convolutional.py
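
The same restriction can probably be applied from inside a Python script instead of the shell. The sketch below is a minimal example under that assumption: `restrict_visible_gpus` is a hypothetical helper name, and it has to run before TensorFlow initializes CUDA, so the safest place is before `import tensorflow`.

```python
import os


def restrict_visible_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value.

    Hypothetical helper for illustration; it only edits the process environment.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only GPU 0 to CUDA, matching CUDA_VISIBLE_DEVICES=0 on the command line.
restrict_visible_gpus([0])

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Setting the variable in the shell is still the simpler option, because it works without touching the script.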

The following environment variables also appear to be required.

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
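
To confirm that these variables are actually in effect in the current shell, a small check like the following may help. It is only a sketch: `missing_lib_dirs` and `cuda_home_ok` are hypothetical helpers, and the expected paths are the ones from the export lines above, which assume CUDA is installed under /usr/local/cuda.

```python
import os

# Directories taken from the export lines above.
REQUIRED_LIB_DIRS = [
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
]


def missing_lib_dirs(env):
    """Return the required directories that are absent from LD_LIBRARY_PATH."""
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    return [d for d in REQUIRED_LIB_DIRS if d not in entries]


def cuda_home_ok(env):
    """Return True if CUDA_HOME points at /usr/local/cuda."""
    return env.get("CUDA_HOME") == "/usr/local/cuda"


print("missing from LD_LIBRARY_PATH:", missing_lib_dirs(os.environ))
print("CUDA_HOME set as expected:", cuda_home_ok(os.environ))
```

An empty "missing" list together with `True` for CUDA_HOME means both export lines took effect.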