Trying Intel RealSense on Jetson Nano (4)

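I bought a Jetson Nano because I wanted to try YOLOv3 [2] for person detection in the people counter [1] I have been developing for a while.
Last time, I showed how to run YOLOv3 object detection on the D415 output from Python via ctypes, but it only ran at about 2 FPS, so this time I show how to do object detection with keras-yolo3 instead.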

Related articles:
・Trying Intel RealSense on Jetson Nano (1)
・Trying Intel RealSense on Jetson Nano (2)
・Trying Intel RealSense on Jetson Nano (3)
・Trying Intel RealSense on Jetson Nano (4)

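1. Setting up the development environment
1.1 Installing TensorFlow [3][4]
I ran into trouble while following the official documentation.
Pitfalls:
(1) Do not run pip3 install -U pip.
Running pip3 install -U pip causes a "cannot import name 'main'" error afterwards.
Note) If you have already run it by mistake, recover with the following commands.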

$ sudo python3 -m pip uninstall pip
$ sudo apt install python3-pip --reinstall

(2) The official installation method does not work.
Installing as described in the official documentation fails with the error "requests.exceptions.HTTPError: 404 Client Error: Not found for url: ....".

$ sudo pip3 install --no-cache-dir --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu

1.2 Installing Keras [4]
Install Keras with the following steps.

$ sudo apt install libatlas-base-dev gfortran
$ pip3 install -U cython
$ pip3 install keras
$ python3
>>> import keras
>>> keras.__version__
'2.2.5'
>>>

2. D415 + keras-yolo3
2.1 Preparing keras-yolo3
Clone keras-yolo3 as follows.

$ git clone https://github.com/qqwweee/keras-yolo3

2.2 Model conversion
To use a pretrained YOLOv3 model, the Darknet weights have to be converted into a Keras model (.h5 file).

$ cd keras-yolo3
$ python3 convert.py yolov3-tiny.cfg <YOLOv3 directory>/weights/yolov3-tiny.weights model_data/yolov3-tiny.h5
    (omitted)
Total params: 8,858,734
Trainable params: 8,852,366
Non-trainable params: 6,368
__________________________________________________________________________________________________
None
Saved Keras model to model_data/yolov3-tiny.h5
Read 8858734 of 8858734.0 from Darknet weights.
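The last line of the log is a useful sanity check: as I understand it, convert.py reports how many float32 values it consumed against how many the weights file contains, so "8858734 of 8858734.0" means nothing was left over and the cfg matches the weights (the count also equals Total params above). The sketch below reproduces that bookkeeping under an assumed file layout: three little-endian int32 version fields, a "seen" counter stored as int64 for version 0.2 and later (int32 before), then raw float32 weights. The function name count_darknet_weights is hypothetical and not part of convert.py.

```python
import struct


def count_darknet_weights(data: bytes):
    """Return (header, number of float32 weights) for a Darknet .weights blob.

    Assumption: little-endian layout, as written by Darknet on x86/ARM.
    """
    major, minor, revision = struct.unpack_from("<3i", data, 0)
    offset = 12
    # Version >= 0.2 stores the "seen" image counter as int64, older files as int32
    if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
        (seen,) = struct.unpack_from("<q", data, offset)
        offset += 8
    else:
        (seen,) = struct.unpack_from("<i", data, offset)
        offset += 4
    payload = len(data) - offset
    if payload % 4 != 0:
        raise ValueError("payload is not a whole number of float32 values")
    return (major, minor, revision, seen), payload // 4


if __name__ == "__main__":
    # Synthetic file: a version 0.2 header followed by five float32 weights
    blob = struct.pack("<3iq", 0, 2, 0, 0) + struct.pack("<5f", 0.0, 1.0, 2.0, 3.0, 4.0)
    print(count_darknet_weights(blob))
```

With a real file, count_darknet_weights(open(path, "rb").read())[1] should equal the Total params figure when the cfg and weights agree.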

2.3 Code changes
(1) yolo_video.py
Modify it in three places to add a D415 (depth camera) detection mode.

3c3
< from yolo import YOLO, detect_video
---
> from yolo import YOLO, detect_video, detect_d415
50a51,55
> 
>     parser.add_argument(
>         '--d415', default=False, action="store_true",
>         help='D415 color detection mode, will ignore all positional arguments'
>     )
73a79,80
>     elif FLAGS.d415:
>         detect_d415(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output)
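The flag handling can be checked without a camera. Below is a hypothetical, trimmed-down parser (not the real yolo_video.py) showing how a store_true flag such as --d415 is picked up, and why the order of the branches matters: when --input has a default value, a test such as "input" in FLAGS is always true, so the --d415 check has to be evaluated before the video branch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down version of the yolo_video.py command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--image", default=False, action="store_true",
                        help="Image detection mode")
    parser.add_argument("--d415", default=False, action="store_true",
                        help="D415 color detection mode")
    # --input has a default, so "input" is always present in the namespace
    parser.add_argument("--input", nargs="?", type=str, default="./path2your_video")
    parser.add_argument("--output", nargs="?", type=str, default="")
    return parser


def select_mode(argv):
    """Return which detector would run for the given command-line arguments."""
    flags = build_parser().parse_args(argv)
    if flags.image:
        return "image"
    if flags.d415:  # must come before the always-true video fallback
        return "d415"
    return "video"


if __name__ == "__main__":
    print(select_mode(["--d415"]))
    print(select_mode([]))
```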

(2) yolo.py
Using the detect_video function as a model, add a detect_d415 function that runs object detection on the D415 output.

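def detect_d415(yolo, video_path, output_path=""):
    # video_path is not used; it is kept so the call matches detect_video
    # timer, np and Image are imported at the top of yolo.py
    import pyrealsense2 as rs
    import cv2

    WIDTH = 640
    HEIGHT = 480
    FPS = 15

    # Record the annotated frames; fall back to output.avi when --output is not given
    OUTPUT_VIDEO_FILE = output_path if output_path else 'output.avi'
    isOutput = True
    if isOutput:
        fourcc = cv2.VideoWriter_fourcc(*'DIVX')
        out = cv2.VideoWriter(OUTPUT_VIDEO_FILE, fourcc, FPS, (WIDTH, HEIGHT))
    accum_time = 0
    curr_fps = 0
    fps = "FPS: ??"

    # Depth camera setup: color stream only, 640x480 BGR at 15 FPS
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.color, WIDTH, HEIGHT, rs.format.bgr8, FPS)
    # Start streaming
    pipeline.start(config)
    cv2.namedWindow("result", cv2.WINDOW_NORMAL)

    prev_time = timer()
    try:
        while True:
            # Wait for the next frame set
            frames = pipeline.wait_for_frames()
            color_frame = frames.get_color_frame()
            if not color_frame:
                continue

            # YOLOv3 object detection; the model expects RGB, the camera delivers BGR
            frame = np.asanyarray(color_frame.get_data())
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            image = yolo.detect_image(image)
            # Back to BGR (a writable copy) for drawing, display and recording
            result = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)

            # Update the FPS label once per second
            curr_time = timer()
            exec_time = curr_time - prev_time
            prev_time = curr_time
            accum_time = accum_time + exec_time
            curr_fps = curr_fps + 1
            if accum_time > 1:
                accum_time = accum_time - 1
                fps = "FPS: " + str(curr_fps)
                curr_fps = 0
            cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                        fontScale=0.50, color=(255, 0, 0), thickness=2)
            cv2.imshow("result", result)
            if isOutput:
                out.write(result)
            # Press q to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        # Release the camera, video file and window even if an error occurs
        pipeline.stop()
        if isOutput:
            out.release()
        cv2.destroyAllWindows()
        yolo.close_session()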
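The FPS label in detect_d415 counts how many frames finish inside each window of a little over one second. To see what that counter reports without a camera, here is a minimal sketch that lifts the same bookkeeping into a hypothetical FpsCounter class and feeds it synthetic frame times (300 ms, roughly the per-frame time measured later in 2.4).

```python
class FpsCounter:
    """Hypothetical helper: the FPS bookkeeping from detect_d415 without a camera."""

    def __init__(self):
        self.accum_time = 0.0   # seconds accumulated in the current window
        self.curr_fps = 0       # frames counted in the current window
        self.text = "FPS: ??"   # label drawn on the frame

    def tick(self, exec_time: float) -> str:
        """Register one processed frame that took exec_time seconds."""
        self.accum_time += exec_time
        self.curr_fps += 1
        # Once more than one second has accumulated, publish the count and
        # carry the overshoot into the next window
        if self.accum_time > 1:
            self.accum_time -= 1
            self.text = "FPS: " + str(self.curr_fps)
            self.curr_fps = 0
        return self.text


if __name__ == "__main__":
    counter = FpsCounter()
    print([counter.tick(0.3) for _ in range(7)])
```

At a steady 300 ms per frame the label hovers between "FPS: 3" and "FPS: 4", because the frame that crosses the one-second mark is counted in the window it closes and the overshoot is carried forward.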

Also change the model and anchor file paths in the YOLO defaults so they point to the tiny model.

23,24c23,24
<         "model_path": 'model_data/yolo.h5',
<         "anchors_path": 'model_data/yolo_anchors.txt',
---
>         "model_path": 'model_data/yolov3-tiny.h5',
>         "anchors_path": 'model_data/tiny_yolo_anchors.txt',
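Why the model path and the anchors path have to change together: the anchors file is a single line of comma-separated width,height pairs, and the number of anchors divided by the number of output layers, multiplied by (classes + 5), must equal the channel count of each output layer. As far as I can tell, keras-yolo3 performs a check of this kind when it loads the model and treats six anchors as the tiny variant. The helpers below are hypothetical plain-Python versions for illustration; the anchor string is the six-anchor list from yolov3-tiny.cfg (check it against your own file), and the tiny model is assumed to have two output layers of 255 channels for the 80 COCO classes.

```python
def parse_anchors(line: str):
    """Parse an anchors line such as "w,h,  w,h, ..." into (w, h) pairs."""
    values = [float(x) for x in line.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("anchors must come in (width, height) pairs")
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


def head_matches(num_anchors: int, num_outputs: int, num_classes: int,
                 last_channels: int) -> bool:
    """True if each output layer has anchors_per_output * (classes + 5) channels."""
    return last_channels == num_anchors / num_outputs * (num_classes + 5)


if __name__ == "__main__":
    # Six anchors as listed in yolov3-tiny.cfg (assumption: verify locally)
    tiny = parse_anchors("10,14,  23,27,  37,58,  81,82,  135,169,  344,319")
    print(len(tiny))
    print(head_matches(len(tiny), 2, 80, 255))  # tiny anchors with the tiny model
    print(head_matches(9, 2, 80, 255))          # 9 full-size anchors with the tiny model
```

Pairing the tiny .h5 with the nine-anchor file gives 4.5 anchors per output, which can never match a 255-channel layer, so the two settings only work as a pair.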

2.4 Testing
This version also runs, but each frame takes 250-370 ms to process, so it only reaches about 3 FPS.

$ python3 yolo_video.py --d415
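Converting the measured per-frame times into frame rates, and comparing them with the 15 FPS the color stream is configured for in detect_d415, shows how far this is from real time. A small arithmetic sketch (the function names are illustrative only):

```python
def fps_from_frame_time(ms: float) -> float:
    """Frames per second for a given per-frame processing time in milliseconds."""
    return 1000.0 / ms


def frame_budget_ms(target_fps: float) -> float:
    """Per-frame time budget in milliseconds needed to sustain target_fps."""
    return 1000.0 / target_fps


if __name__ == "__main__":
    slow, fast = 370.0, 250.0  # measured range from the run above
    print(f"{fps_from_frame_time(slow):.1f}-{fps_from_frame_time(fast):.1f} FPS")
    budget = frame_budget_ms(15)  # the stream is configured at 15 FPS
    print(f"budget {budget:.1f} ms, needs {fast / budget:.2f}x-{slow / budget:.2f}x speedup")
```

That works out to roughly 2.7 to 4 FPS, consistent with the observed ~3 FPS, and a budget of about 66.7 ms per frame at 15 FPS, i.e. processing would need to be about 3.75x to 5.55x faster to keep up with the camera.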


This is considerably slower than I had expected.
Looking through the runtime log, there are also a few worrying messages...

2019-09-03 23:08:42.128125: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:525] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)

2019-09-03 23:08:53.888316: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

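It appears the GPU cannot get all the memory it asks for (on the Jetson Nano, the 4 GB of RAM is shared between the CPU and GPU). The log says this is not a failure, but that performance could improve if more memory were available...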

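For the people counter, I currently process grayscale images and only need to detect people, so I think the model can be made much smaller. I want to try person detection on depth images soon.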
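A rough way to see where the savings of a person-only model would come from: each YOLOv3 output layer has anchors × (5 + classes) channels (4 box coordinates and an objectness score, plus one score per class). The sketch below counts only the parameters of the final 1x1 output convolutions of YOLOv3-tiny, assuming 3 anchors per output and input widths of 512 and 256 channels as in yolov3-tiny.cfg; it ignores the rest of the network, and the function names are hypothetical.

```python
def head_params(in_channels: int, num_classes: int, anchors_per_head: int = 3) -> int:
    """Weights + biases of one 1x1 YOLO output convolution."""
    # x, y, w, h, objectness + one score per class, for each anchor
    out_channels = anchors_per_head * (num_classes + 5)
    return in_channels * out_channels + out_channels


def tiny_heads_params(num_classes: int) -> int:
    """Both output convolutions of YOLOv3-tiny (512 and 256 input channels assumed)."""
    return head_params(512, num_classes) + head_params(256, num_classes)


if __name__ == "__main__":
    coco = tiny_heads_params(80)   # 80 COCO classes
    person = tiny_heads_params(1)  # person only
    print(coco, person, coco - person)
```

For the 80-class model that is 196,350 parameters against 13,860 for a person-only head, a saving of 182,490, or roughly 2% of the 8,858,734 total, so a substantially smaller model would mostly have to come from slimming the rest of the network as well.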
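Also, I hear the Jetson Nano runs in 5 W, 2-core mode by default.
Switching to the maximum-power 10 W, 4-core mode should make it at least about 1.5x faster.
I will look into this properly and try it as well.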
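Plugging the 1.5x estimate into the timings measured in 2.4 gives a rough idea of what to expect. This is arithmetic on an assumed speedup, not a benchmark; projected_fps is an illustrative name.

```python
def projected_fps(frame_ms: float, speedup: float) -> float:
    """FPS after dividing the per-frame time by an assumed speedup factor."""
    return 1000.0 / (frame_ms / speedup)


if __name__ == "__main__":
    SPEEDUP = 1.5  # rough estimate for 10 W / 4-core mode, not a measurement
    for ms in (370.0, 250.0):
        print(f"{ms:.0f} ms -> {ms / SPEEDUP:.0f} ms/frame, {projected_fps(ms, SPEEDUP):.1f} FPS")
```

Even if the estimate holds, that is roughly 4 to 6 FPS, still well short of the 15 FPS camera stream.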

---
References:
[1] Considering a People Counter (1) to (5)
[2] YOLO: Real-Time Object Detection
[3] TensorFlow For Jetson Platform, Deep Learning Frameworks Documentation
[4] Deep Learning on Jetson Nano