Trying Out SSD with PyTorch (4) - みらいテックラボ

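For everyday object detection, I have so far used YOLOv3 [1] and ssd_keras [2][3].
However, when running these detectors on an embedded board such as the Jetson Nano, each frame takes quite a while to process, and I could sometimes handle only a few frames per second.
When it comes to speeding up DNN inference on a Jetson, using NVIDIA's TensorRT looks easier than working hard to optimize a Python implementation.
So I decided to try training a PyTorch SSD (Single Shot MultiBox Detector) model on a PC, converting the trained model to ONNX format, and using it on the Jetson.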

Related articles:
・Trying Out SSD with PyTorch (1)
・Trying Out SSD with PyTorch (2)
・Trying Out SSD with PyTorch (3)
・Trying Out SSD with PyTorch (4)


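Last time, I measured on the Jetson Nano the processing time of the pytorch-ssd SSD model and of the same model converted to ONNX format.
This time, I took the people counter I developed earlier [4] and swapped its YOLOv3 detector for jetson-inference object detection.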

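1. Integrating into PeopleCounter
Basically, the YOLOv3 part of the previously developed people counter is replaced with jetson-inference object detection (SSD).
The code for the part that passes the depth camera image to the object detector is recorded below.
[Code]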

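# Imports this excerpt relies on. Zone, Tracker, DrawWind, logger, WIDTH, HEIGHT,
# CLIPPING_METERS and the camera helpers are defined elsewhere in the
# PeopleCounter code (not shown).
import cv2
import numpy as np
import jetson.inference
import jetson.utils

# network & model
network = 'ssd-mobilenet-v1'
argv = ["--model=models/ssd-mobilenet.onnx",
        "--labels=models/labels.txt",
        "--input-blob=input_0",
        "--output-cvg=scores",
        "--output-bbox=boxes"]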

# main function
def main(args):
    # variable
    pipeline = None

    # Load the SSD model (ONNX converted from pytorch-ssd) with jetson-inference detectNet
    net = jetson.inference.detectNet(network, argv, threshold=0.5)

    # set judgment zone
    zone = Zone(WIDTH, HEIGHT)
    tracker = Tracker(zone, logger)
    win = DrawWind(zone)

    # start depth camera
    pipeline, profile = start_depth_camera(WIDTH, HEIGHT, inpfile=args.input, outfile=args.output)
    clipping_distance = get_clipping_distance(profile, CLIPPING_METERS)
    try:
        frameno = 0
        while True:
            # get frame(Depth & Color)
            frames = pipeline.wait_for_frames()
            depth_frame = frames.get_depth_frame()
            if not depth_frame:
                continue

            # output tracking pid
            for p in tracker.persons:
                p.update_lostcount()
                logger.debug('frame[{}] : ID[{}] = {}'.format(frameno, p.id, p.num_lost))
            # depth data to clip image
            frame = convert_depth2gray(depth_frame.get_data(), clipping_distance)
            # detect persons using ssd model
            img = jetson.utils.cudaFromNumpy(frame.copy())
            detections = net.Detect(img)
            print('frame[{}] = {}'.format(frameno, len(detections)))
            # tracking persons
            tracker.tracking_persons(detections)
            frameno += 1

            frame = win.draw_persons(frame, tracker.persons)
            frame = win.draw_counter(frame, tracker.num_left, tracker.num_right)
            cv2.imshow('PeopleCounter v0.30', frame)
            # quit when 'q' is pressed
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    except KeyboardInterrupt:  # Ctrl-C
        logger.debug('Ctrl+C!!')
    finally:
        logger.debug('Finally!!')
        stop_camera(pipeline)
        logger.info('{} : {}'.format(DrawWind.LEFT_MSG, tracker.num_left))
        logger.info('{} : {}'.format(DrawWind.RIGHT_MSG, tracker.num_right))
        cv2.destroyAllWindows()

def convert_depth2gray(data, clipping_distance):
    '''
    Convert depth data to gray image.
    '''
    depth_image = np.asanyarray(data)
    clip_image = np.where((depth_image > clipping_distance) | (depth_image <= 0), 0, 255. - (depth_image * 255. / clipping_distance))
    clip_graymap = clip_image.reshape((HEIGHT, WIDTH)).astype(np.uint8)
    gray_image = np.dstack((clip_graymap, clip_graymap, clip_graymap))
    return gray_image
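
To make the mapping in convert_depth2gray easier to follow, here is a small worked example on a 2x2 depth array. It is only a sketch: the function name depth_to_gray, the sample values, and the clipping distance of 2000 are made up for illustration, and unlike the original it does not rely on the global WIDTH and HEIGHT. Near pixels become bright, while pixels with no depth (0) or beyond the clipping distance become black.

```python
import numpy as np


def depth_to_gray(depth, clipping_distance):
    """Map raw depth units to an 8-bit, 3-channel gray image.

    Near pixels are bright; invalid (<= 0) or too-far pixels are 0.
    Hypothetical helper mirroring the logic of convert_depth2gray.
    """
    depth = np.asarray(depth, dtype=np.float32)
    valid = (depth > 0) & (depth <= clipping_distance)
    gray = np.where(valid, 255.0 - depth * 255.0 / clipping_distance, 0.0)
    gray = gray.astype(np.uint8)  # truncates the fractional part
    # Stack the single channel three times so the detector gets an HxWx3 image.
    return np.dstack((gray, gray, gray))


# Illustrative depth values (not real sensor data).
sample = np.array([[0, 500],
                   [1000, 2500]], dtype=np.uint16)
image = depth_to_gray(sample, clipping_distance=2000)

print(image.shape)     # (2, 2, 3)
print(image[:, :, 0])  # [[  0 191]
                       #  [127   0]]
```

The three identical channels are needed because the detector expects a color-shaped input even though the depth map carries only one channel of information.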

[Results]
youtu.be
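
When swapping detectors as in section 1, the tracker has to consume whatever net.Detect returns. Each jetson-inference detection exposes attributes such as ClassID, Confidence, Left, Top, Right and Bottom. The sketch below shows one possible way to turn them into (x, y, w, h, confidence) tuples for the person class only. The helper to_person_boxes, the class ID 1, and the tuple layout are assumptions made for illustration, not the actual Tracker interface, and FakeDetection is only a stand-in so the code runs without a Jetson.

```python
from collections import namedtuple

# Stand-in for a jetson-inference detection (hypothetical, for running off-device).
FakeDetection = namedtuple(
    'FakeDetection', 'ClassID Confidence Left Top Right Bottom')

PERSON_CLASS_ID = 1  # assumption: depends on the order of entries in labels.txt


def to_person_boxes(detections, class_id=PERSON_CLASS_ID, min_conf=0.5):
    """Keep confident detections of one class as (x, y, w, h, conf) tuples."""
    boxes = []
    for d in detections:
        if d.ClassID != class_id or d.Confidence < min_conf:
            continue
        x, y = int(d.Left), int(d.Top)
        w, h = int(d.Right - d.Left), int(d.Bottom - d.Top)
        boxes.append((x, y, w, h, float(d.Confidence)))
    return boxes


# Illustrative detections (not real output).
detections = [
    FakeDetection(1, 0.9, 10.0, 20.0, 110.0, 220.0),  # person, confident
    FakeDetection(2, 0.8, 0.0, 0.0, 50.0, 50.0),      # other class
    FakeDetection(1, 0.3, 5.0, 5.0, 30.0, 60.0),      # person, low confidence
]
boxes = to_person_boxes(detections)
print(boxes)  # [(10, 20, 100, 200, 0.9)]
```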