Last year a friend asked me to help with the machine-learning part of a project to visualize how crowded a certain coworking space is.
To measure the congestion level, we decided to run person detection in each space and estimate how many people are present relative to its capacity.
Related articles:
・Detecting the "congestion level" of a coworking space (1)
・Detecting the "congestion level" of a coworking space (2)
・Detecting the "congestion level" of a coworking space (3)
・Detecting the "congestion level" of a coworking space (4)
At first I got person detection running on the Jetson Nano 4GB I had on hand and tried to reuse that image on a Jetson Nano 2GB, but that did not work.
The SD image for the Jetson Nano 2GB differs from the 4GB one, so below are my notes from getting person detection running on the Jetson Nano 2GB.
1. Environment Setup
1.1 Writing the SD card image [1]
Writing the SD card image by following the instructions here works without any problems.
(Note 1) Be careful: the Jetson Nano 4GB and 2GB versions each use a different base SD image.
(Note 2) Use a fast SD card, something in the SDXC UHS-I U3 V30 A2 class.
1.2 Installing Jetson tools [2]
First, I installed jetson-stats to check the load on the Jetson Nano (CPU/GPU/memory, etc.).
See here for how to use jtop.
user@user-desktop:~$ sudo apt-get install python3-pip
user@user-desktop:~$ sudo -H pip3 install jetson-stats
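Besides the jtop terminal UI, jetson-stats also exposes a Python interface. A minimal sketch (my assumption, based on the jtop API with its stats dict as described in the jetson-stats documentation; normally just running jtop in a terminal is enough):
# Minimal sketch: reading the same load figures jtop shows, from Python.
from jtop import jtop

with jtop() as jetson:
    if jetson.ok():
        print(jetson.stats)  # dict with CPU/GPU/RAM usage, temperatures, etc.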
1.3 Installing TensorFlow/Keras [3]
TensorFlow can basically be installed by following the instructions here.
This time, though, I wanted "1.15.x" rather than the latest TensorFlow, so I specified the version as follows.
user@user-desktop:~$ sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==1.15.4
(Note) As of January 2021, the JetPack version in the 2GB SD image is 4.4.1.
Also, since I am using ssd_keras this time, I installed Keras "2.2.4", which is compatible with TensorFlow 1.15.x.
user@user-desktop:~$ sudo pip3 install keras==2.2.4
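As a quick sanity check (not part of the original setup steps), the installed versions and GPU visibility can be confirmed from Python:
import tensorflow as tf
import keras

print(tf.__version__)              # expect 1.15.4
print(keras.__version__)           # expect 2.2.4
print(tf.test.is_gpu_available())  # expect True on the Jetson GPU (TF 1.x API)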
1.4 Disabling the GUI environment
With only 2GB of memory, I disabled the GUI environment and switched to a CUI (console-only) setup.
user@user-desktop:~$ systemctl get-default
graphical.target
user@user-desktop:~$ sudo systemctl set-default multi-user.target
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /lib/systemd/system/multi-user.target.
user@user-desktop:~$ sudo reboot
[Before]
user@user-desktop:~$ free -h
total used free shared buff/cache available
Mem: 1.9G 417M 1.1G 27M 438M 1.4G
Swap: 7.0G 0B 7.0G
[After]
user@user-desktop:~$ free -h
total used free shared buff/cache available
Mem: 1.9G 225M 1.4G 18M 326M 1.6G
Swap: 7.0G 0B 7.0G
2. Checking That Person Detection Runs
2.1 First, just run it
I took the person detection program from the earlier articles, commented out the OpenCV-based GUI display, added the GPU options below (since the GPU is used), and tried running it; a sketch of the GUI-related change follows the options.
[GPU options]
import tensorflow as tf
from keras.backend import tensorflow_backend

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
tensorflow_backend.set_session(session)
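For reference, the GUI-related change amounts to something like the sketch below; detect_people, the file name, and the box format are hypothetical placeholders, not the actual code in recog_nocv.py.
import cv2

def detect_people(frame):
    # Placeholder for the SSD300 inference done in recog_nocv.py (hypothetical).
    return []  # list of (xmin, ymin, xmax, ymax, score)

frame = cv2.imread("room.jpg")   # one still image instead of a camera stream
boxes = detect_people(frame)

# GUI display disabled for the headless (CUI) environment:
# cv2.imshow("result", frame)
# cv2.waitKey(0)

print("persons detected:", len(boxes))
for (xmin, ymin, xmax, ymax, score) in boxes:
    print("box=({}, {})-({}, {}) score={:.2f}".format(xmin, ymin, xmax, ymax, score))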
However, when I ran it, errors like the following appeared...
(snip)
2021-01-15 23:20:25.295710: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0xf00e10000 next 18446744073709551615 of size 4194304
2021-01-15 23:20:25.295743: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 4198400
2021-01-15 23:20:25.295778: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0xf01210000 next 18446744073709551615 of size 4198400
2021-01-15 23:20:25.295812: I tensorflow/core/common_runtime/bfc_allocator.cc:914] Summary of in-use Chunks by size:
2021-01-15 23:20:25.295852: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 15 Chunks of size 256 totalling 3.8KiB
2021-01-15 23:20:25.295894: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 512 totalling 512B
2021-01-15 23:20:25.295955: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1024 totalling 1.0KiB
2021-01-15 23:20:25.296003: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1280 totalling 1.2KiB
2021-01-15 23:20:25.296052: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 2048 totalling 4.0KiB
2021-01-15 23:20:25.296102: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 4096 totalling 4.0KiB
2021-01-15 23:20:25.296155: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 131072 totalling 256.0KiB
2021-01-15 23:20:25.296210: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 262144 totalling 256.0KiB
2021-01-15 23:20:25.296266: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 771840 totalling 753.8KiB
2021-01-15 23:20:25.296323: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 786176 totalling 767.8KiB
2021-01-15 23:20:25.296379: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1048576 totalling 1.00MiB
2021-01-15 23:20:25.296435: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 4194304 totalling 4.00MiB
2021-01-15 23:20:25.296490: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 4198400 totalling 4.00MiB
2021-01-15 23:20:25.296546: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 11.00MiB
2021-01-15 23:20:25.296598: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 11538432 memory_limit_: 11538432 available bytes: 0 curr_region_allocation_bytes_: 16777216
2021-01-15 23:20:25.296660: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 11538432
InUse: 11538432
MaxInUse: 11538432
NumAllocs: 29
MaxAllocSize: 4198400
2021-01-15 23:20:25.296721: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ********x****************xx*********************xxxxxxxxxxxxxxx**********************xxxxxxxxxxxxxxx
2021-01-15 23:20:25.296818: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at random_op.cc:76 : Resource exhausted: OOM when allocating tensor with shape[3,3,1024,12] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
conv9_2_mbox_priorbox_reshape[0][
__________________________________________________________________________________________________
predictions (Concatenate) (None, 8732, 14) 0 mbox_conf_softmax[0][0]
mbox_loc[0][0]
mbox_priorbox[0][0]
__________________________________________________________________________________________________
decoded_predictions (DecodeDete (None, <tf.Tensor 't 0 predictions[0][0]
==================================================================================================
Total params: 23,745,908
Trainable params: 23,745,908
Non-trainable params: 0
__________________________________________________________________________________________________
None
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv4_2/truncated_normal/TruncatedNormal}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "recog_nocv.py", line 97, in <module>
load_model()
File "recog_nocv.py", line 60, in load_model
model.load_weights(weights_path, by_name=True)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 1163, in load_weights
reshape=reshape)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 1154, in load_weights_from_hdf5_group_by_name
K.batch_set_value(weight_value_tuples)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2470, in batch_set_value
get_session().run(assign_ops, feed_dict=feed_dict)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 206, in get_session
session.run(tf.variables_initializer(uninitialized_vars))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node conv4_2/truncated_normal/TruncatedNormal (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Original stack trace for 'conv4_2/truncated_normal/TruncatedNormal':
File "recog_nocv.py", line 97, in <module>
load_model()
File "recog_nocv.py", line 56, in load_model
nms_max_output_size=400)
File "./ssd_keras/models/keras_ssd300.py", line 288, in ssd_300
conv4_2 = Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv4_2')(conv4_1)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 431, in __call__
self.build(unpack_singleton(input_shapes))
File "/usr/local/lib/python3.6/dist-packages/keras/layers/convolutional.py", line 141, in build
constraint=self.kernel_constraint)
File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 249, in add_weight
weight = K.variable(initializer(shape),
File "/usr/local/lib/python3.6/dist-packages/keras/initializers.py", line 214, in __call__
dtype=dtype, seed=self.seed)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 4185, in truncated_normal
return tf.truncated_normal(shape, mean, stddev, dtype=dtype, seed=seed)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/random_ops.py", line 175, in truncated_normal
shape_tensor, dtype, seed=seed1, seed2=seed2)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_random_ops.py", line 1016, in truncated_normal
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
I have run into "ResourceExhaustedError" before, typically when the GPU ran out of memory after choosing too large a batch size during training.
On the Jetson Nano the CPU and GPU share the same 2GB, so it is no surprise that software brought over unchanged from a PC hits "ResourceExhaustedError" even outside of training.
2.2 "ResourceExhaustedError"対策[4]
Jetson nano関連のエラー対策を調べていたら, こんな記事[4]があり試してみた.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# 2GB x 0.2 = 400MB for the GPU
config.gpu_options.per_process_gpu_memory_fraction = 0.20
session = tf.Session(config=config)
tensorflow_backend.set_session(session)
It spews out a lot of memory-related messages, but it finally ran.
That said, processing a single image takes about five minutes.
Also, looking at memory usage, the CPU side is at 1.2GB and the GPU side at 789MB, so it is not clear how the 0.20 setting actually took effect.

2.3 Investigating per_process_gpu_memory_fraction
I briefly checked whether the per_process_gpu_memory_fraction setting changes how much memory is allocated to the GPU (results in the table below; a small measurement sketch follows it).
| Setting | CPU memory (GB) | GPU memory (GB) |
| --- | --- | --- |
| 0.1 | 1.4 | 0.563 |
| 0.2 | 1.2 | 0.789 |
| 0.3 | 1.0 | 0.966 |
| 0.4 | 0.811 | 1.2 |
| 0.5 | - | - |
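For what it is worth, how a given fraction translates into TensorFlow's allocator limit can be checked with something like the sketch below. This is my own assumption, not how the table above was measured (those figures were read off with jtop while the detector was running), and it assumes tf.contrib.memory_stats is available in this TF 1.15 build. Since the GPU options are fixed when the device is first initialized, run the script once per fraction value rather than looping inside one process.
import sys
import tensorflow as tf

# Fraction to test, passed on the command line, e.g. `python3 check_gpu_limit.py 0.3`
fraction = float(sys.argv[1]) if len(sys.argv) > 1 else 0.2

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = fraction

with tf.device('/gpu:0'):
    limit_op = tf.contrib.memory_stats.BytesLimit()  # BFC allocator limit on GPU:0

with tf.Session(config=config) as sess:
    print("fraction=%.2f -> limit %.1f MiB" % (fraction, sess.run(limit_op) / 1024.0**2))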

Increasing the setting by 0.1 seems to increase the GPU memory used by the application by about 10% (roughly 200MB).
However, the GPU memory used by the CUDA-related libraries seems to be unaffected by the setting.
Even if I set the value to 0.3-0.4 to secure as much GPU memory as possible, it cannot be used as it stands.
On the Jetson Nano 4GB version this runs without much trouble, but since I have gone to the trouble of starting with the 2GB version, I will keep at it a little longer.
Next time I plan to look into reducing GPU memory usage, for example by making the model more compact.
If anyone is knowledgeable in this area, I would really appreciate your advice.
----
Reference URLs:
[1] Getting Started with Jetson Nano 2GB Developer Kit
[2] Command to display the Jetson Nano's load status – viewing CPU/GPU usage
[3] Installing TensorFlow For Jetson Platform
[4] Out of memory error from TensorFlow: any workaround for this, or do I just need a bigger boat?




