みらいテックラボ

Notes from a weekend programmer on technologies I'm interested in, such as speech/image recognition and machine learning, including write-ups of things I've actually tried.

Trying Out Recent Object Detection (YOLOv5, Part 3)

I recently had a chance to try out YOLOv5, YOLOX, Detectron2 and the like, so I'll start by summarizing YOLOv5.

A while back I wrote about YOLOv5 inference and training; this time the topic is model conversion, covering exports to the ONNX and TFLite formats.


When you want to run object detection on an edge computer such as a Raspberry Pi or Jetson Nano, a high-performance model trained on a PC or in the cloud is difficult to use as-is because of memory and processing-speed constraints.

When running object detection on an edge computer, you need to think about saving memory and speeding up processing, for example by:
• using a compact model with fewer parameters
• reducing the size of the images fed into the model
• converting the weights to fixed point or quantizing them
• narrowing the detection targets to what the application needs (e.g., only person, car, and motorbike)

This time, I used yolov5's export.py to convert models to the ONNX and TFLite formats, reduced the model input size, and checked how well everything runs on a reTerminal (Raspberry Pi CM4).



1. Model Conversion [1]
yolov5 ships with a conversion tool, export.py, which makes it easy to export to a variety of model formats.

Format                      | `export.py --include`         | Model
---                         | ---                           | ---
PyTorch                     | -                             | yolov5s.pt
TorchScript                 | `torchscript`                 | yolov5s.torchscript
ONNX                        | `onnx`                        | yolov5s.onnx
OpenVINO                    | `openvino`                    | yolov5s_openvino_model/
TensorRT                    | `engine`                      | yolov5s.engine
CoreML                      | `coreml`                      | yolov5s.mlmodel
TensorFlow SavedModel       | `saved_model`                 | yolov5s_saved_model/
TensorFlow GraphDef         | `pb`                          | yolov5s.pb
TensorFlow Lite             | `tflite`                      | yolov5s.tflite
TensorFlow Edge TPU         | `edgetpu`                     | yolov5s_edgetpu.tflite
TensorFlow.js               | `tfjs`                        | yolov5s_web_model/
PaddlePaddle                | `paddle`                      | yolov5s_paddle_model/

Requirements:
    $ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime openvino-dev tensorflow-cpu  # CPU
    $ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime-gpu openvino-dev tensorflow  # GPU

Usage:
    $ python export.py --weights yolov5s.pt --include torchscript onnx openvino engine coreml tflite ...

Inference:
    $ python detect.py --weights yolov5s.pt                 # PyTorch
                                 yolov5s.torchscript        # TorchScript
                                 yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                 yolov5s.xml                # OpenVINO
                                 yolov5s.engine             # TensorRT
                                 yolov5s.mlmodel            # CoreML (macOS-only)
                                 yolov5s_saved_model        # TensorFlow SavedModel
                                 yolov5s.pb                 # TensorFlow GraphDef
                                 yolov5s.tflite             # TensorFlow Lite
                                 yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                                 yolov5s_paddle_model       # PaddlePaddle


I converted the model to the ONNX and TFLite formats on a PC, copied the converted models to the reTerminal, and ran inference there.
As the source pretrained model I used yolov5n.pt, the most compact of the pretrained models.
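Copying the files over is just an scp; a minimal example, assuming the reTerminal is reachable as mars (the hostname that appears in the logs below):

    $ scp yolov5n.onnx yolov5n-fp16.tflite yolov5n-int8.tflite aska@mars:~/work/yolov5/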

reTerminal environment:

  • torch-1.12.1 / torchvision-0.13.1
  • tensorflow-2.10.0
  • onnx-1.12.0 / onnxruntime-1.12.1
  • yolov5 (cloned and installed from GitHub [1])


1.1 Model Conversion (PC)
(1) ONNX format

(yolov5) aska@moonlight:~/work/yolov5$ python export.py --weights yolov5n.pt --img 640 --include onnx
export: data=data/coco128.yaml, weights=['yolov5n.pt'], imgsz=[640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5  v6.2-180-g82bec4c Python-3.8.12 torch-1.10.2+cu113 CPU

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs

PyTorch: starting from yolov5n.pt with output shape (1, 25200, 85) (3.9 MB)

ONNX: starting export with onnx 1.11.0...
ONNX: export success ✅ 1.3s, saved as yolov5n.onnx (7.5 MB)

Export complete (1.5s)
Results saved to /home/aska/work/yolov5
Detect:          python detect.py --weights yolov5n.onnx 
Validate:        python val.py --weights yolov5n.onnx 
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5n.onnx')  
Visualize:       https://netron.app

(2) TFLite format

(yolov5) aska@moonlight:~/work/yolov5$ python export.py --weights yolov5n.pt --img 640 --include tflite
export: data=data/coco128.yaml, weights=['yolov5n.pt'], imgsz=[640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['tflite']
YOLOv5  v6.2-180-g82bec4c Python-3.8.12 torch-1.10.2+cu113 CPU

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs

PyTorch: starting from yolov5n.pt with output shape (1, 25200, 85) (3.9 MB)
2022-10-01 16:20:56.179939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

TensorFlow SavedModel: starting export with tensorflow 2.10.0...

                 from  n    params  module                                  arguments                     
2022-10-01 16:20:57.167212: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
  0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]              
  1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]                
  2                -1  1      4800  models.common.C3                        [32, 32, 1]                   
  3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  4                -1  1     29184  models.common.C3                        [64, 64, 2]                   
  5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  6                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  8                -1  1    296448  models.common.C3                        [256, 256, 1]                 
  9                -1  1    164608  models.common.SPPF                      [256, 256, 5]                 
 10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     22912  models.common.C3                        [128, 64, 1, False]           
 18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]                
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1     74496  models.common.C3                        [128, 128, 1, False]          
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 24      [17, 20, 23]  1    115005  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256], [640, 640]]
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(1, 640, 640, 3)]   0           []                               
                                                                                                  
 tf_conv (TFConv)               (1, 320, 320, 16)    1744        ['input_1[0][0]']                
                                                                                                  
 tf_conv_1 (TFConv)             (1, 160, 160, 32)    4640        ['tf_conv[0][0]']                
                                                                                                  
 tfc3 (TFC3)                    (1, 160, 160, 32)    4704        ['tf_conv_1[0][0]']              
                                                                                                  
 tf_conv_7 (TFConv)             (1, 80, 80, 64)      18496       ['tfc3[0][0]']                   
                                                                                                  
 tfc3_1 (TFC3)                  (1, 80, 80, 64)      28928       ['tf_conv_7[0][0]']              
                                                                                                  
 tf_conv_15 (TFConv)            (1, 40, 40, 128)     73856       ['tfc3_1[0][0]']                 
                                                                                                  
 tfc3_2 (TFC3)                  (1, 40, 40, 128)     156288      ['tf_conv_15[0][0]']             
                                                                                                  
 tf_conv_25 (TFConv)            (1, 20, 20, 256)     295168      ['tfc3_2[0][0]']                 
                                                                                                  
 tfc3_3 (TFC3)                  (1, 20, 20, 256)     295680      ['tf_conv_25[0][0]']             
                                                                                                  
 tfsppf (TFSPPF)                (1, 20, 20, 256)     164224      ['tfc3_3[0][0]']                 
                                                                                                  
 tf_conv_33 (TFConv)            (1, 20, 20, 128)     32896       ['tfsppf[0][0]']                 
                                                                                                  
 tf_upsample (TFUpsample)       (1, 40, 40, 128)     0           ['tf_conv_33[0][0]']             
                                                                                                  
 tf_concat (TFConcat)           (1, 40, 40, 256)     0           ['tf_upsample[0][0]',            
                                                                  'tfc3_2[0][0]']                 
                                                                                                  
 tfc3_4 (TFC3)                  (1, 40, 40, 128)     90496       ['tf_concat[0][0]']              
                                                                                                  
 tf_conv_39 (TFConv)            (1, 40, 40, 64)      8256        ['tfc3_4[0][0]']                 
                                                                                                  
 tf_upsample_1 (TFUpsample)     (1, 80, 80, 64)      0           ['tf_conv_39[0][0]']             
                                                                                                  
 tf_concat_1 (TFConcat)         (1, 80, 80, 128)     0           ['tf_upsample_1[0][0]',          
                                                                  'tfc3_1[0][0]']                 
                                                                                                  
 tfc3_5 (TFC3)                  (1, 80, 80, 64)      22720       ['tf_concat_1[0][0]']            
                                                                                                  
 tf_conv_45 (TFConv)            (1, 40, 40, 64)      36928       ['tfc3_5[0][0]']                 
                                                                                                  
 tf_concat_2 (TFConcat)         (1, 40, 40, 128)     0           ['tf_conv_45[0][0]',             
                                                                  'tf_conv_39[0][0]']             
                                                                                                  
 tfc3_6 (TFC3)                  (1, 40, 40, 128)     74112       ['tf_concat_2[0][0]']            
                                                                                                  
 tf_conv_51 (TFConv)            (1, 20, 20, 128)     147584      ['tfc3_6[0][0]']                 
                                                                                                  
 tf_concat_3 (TFConcat)         (1, 20, 20, 256)     0           ['tf_conv_51[0][0]',             
                                                                  'tf_conv_33[0][0]']             
                                                                                                  
 tfc3_7 (TFC3)                  (1, 20, 20, 256)     295680      ['tf_concat_3[0][0]']            
                                                                                                  
 tf_detect (TFDetect)           ((1, 25200, 85),     115005      ['tfc3_5[0][0]',                 
                                )                                 'tfc3_6[0][0]',                 
                                                                  'tfc3_7[0][0]']                 
                                                                                                  
==================================================================================================
Total params: 1,867,405
Trainable params: 0
Non-trainable params: 1,867,405
__________________________________________________________________________________________________
2022-10-01 16:20:59.099566: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-10-01 16:20:59.099656: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
Assets written to: yolov5n_saved_model/assets
TensorFlow SavedModel: export success ✅ 4.1s, saved as yolov5n_saved_model (7.4 MB)

TensorFlow Lite: starting export with tensorflow 2.10.0...
Found untraced functions such as tf_conv_2_layer_call_fn, tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_4_layer_call_fn while saving (showing 5 of 268). These functions will not be directly callable after loading.
Assets written to: /tmp/tmp3a887r7n/assets
2022-10-01 16:21:25.634156: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-10-01 16:21:25.634201: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-10-01 16:21:25.634801: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /tmp/tmp3a887r7n
2022-10-01 16:21:25.665209: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-10-01 16:21:25.665259: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /tmp/tmp3a887r7n
2022-10-01 16:21:25.783391: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-10-01 16:21:25.809995: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-10-01 16:21:26.039735: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /tmp/tmp3a887r7n
2022-10-01 16:21:26.161768: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 526967 microseconds.
2022-10-01 16:21:26.560965: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-10-01 16:21:26.892092: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1989] Estimated count of arithmetic ops: 5.393 G  ops, equivalently 2.697 G  MACs

Estimated count of arithmetic ops: 5.393 G  ops, equivalently 2.697 G  MACs
TensorFlow Lite: export success ✅ 26.9s, saved as yolov5n-fp16.tflite (3.7 MB)

Export complete (31.3s)
Results saved to /home/aska/work/yolov5
Detect:          python detect.py --weights yolov5n-fp16.tflite 
Validate:        python val.py --weights yolov5n-fp16.tflite 
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5n-fp16.tflite')  
Visualize:       https://netron.app

To convert to an int8 TFLite model, add the "--int8" option.
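For example (int8 export performs post-training quantization, which appears to use images from the dataset given by "--data" for calibration, so it takes noticeably longer than the fp16 export):

    $ python export.py --weights yolov5n.pt --img 640 --include tflite --int8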


1.2 Operation Check (reTerminal)
I tried running five variants: yolov5n.onnx, yolov5n-fp16.tflite, yolov5n-int8.tflite, yolov5n.pt, and yolov5s.pt.

aska@mars:~/work/yolov5 $ python detect.py --weights yolov5n.onnx --img 640 --source data/images/bus.jpg 
/home/aska/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 
  warn(f"Failed to load image Python extension: {e}")
detect: weights=['yolov5n.onnx'], source=data/images/bus.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v6.2-180-g82bec4c Python-3.9.2 torch-1.12.1 CPU

Loading yolov5n.onnx for ONNX Runtime inference...
image 1/1 /home/aska/work/yolov5/data/images/bus.jpg: 640x640 3 persons, 1 bus, 532.9ms
Speed: 10.2ms pre-process, 532.9ms inference, 6.5ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp

To run inference with one of the other models, point "--weights" at that model, as in the examples below:
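    $ python detect.py --weights yolov5n-fp16.tflite --img 640 --source data/images/bus.jpg
    $ python detect.py --weights yolov5n-int8.tflite --img 640 --source data/images/bus.jpg
    $ python detect.py --weights yolov5n.pt --img 640 --source data/images/bus.jpg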

Incidentally, the average processing time over 5 runs for each model was as follows.

Model               | Average processing time (ms)
---                 | ---
yolov5n.onnx        | 550.4
yolov5n-fp16.tflite | 881.2
yolov5n-int8.tflite | 638.9
yolov5n.pt          | 540.6
yolov5s.pt          | 954.5

I had expected the int8 TFLite model to be fast, but surprisingly the PyTorch model yolov5n.pt had the shortest processing time.
yolov5n.pt usually finished in roughly 460 ms, but a single run of about 850 ms pushed its average up.
Still, the large run-to-run variance in processing time is a bit of a concern.
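For reference, one way to script this kind of measurement is through PyTorch Hub, following the loading path suggested in the export output (a minimal sketch of my own, not how the numbers above were produced; the inference time that detect.py prints works just as well):

    # Average wall-clock inference time over 5 runs for an exported model.
    import time
    import torch

    # Load the exported model via PyTorch Hub (works for .pt/.onnx/.tflite).
    model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5n.onnx')

    model('data/images/bus.jpg', size=640)  # warm-up; the first call is often slower

    times = []
    for _ in range(5):
        t0 = time.time()
        model('data/images/bus.jpg', size=640)  # includes pre-/post-processing
        times.append((time.time() - t0) * 1000)

    print(f'average: {sum(times) / len(times):.1f} ms over {len(times)} runs')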


2. Changing the Model Input Size
As it stands, detecting objects in a single image takes more than 500 ms, which is still quite slow.
So I tried shrinking the input image size to the model to reduce the amount of computation.
The default input size is 640x640, so I measured processing speed at 320x320, a quarter of the pixels.

First, to create a model with the changed input size, add the "--img 320" option when exporting to ONNX or TFLite.
The PyTorch model apparently needs no conversion.
At inference time, run with the "--img 320" option. For example:
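    $ python export.py --weights yolov5n.pt --img 320 --include onnx
    $ python export.py --weights yolov5n.pt --img 320 --include tflite
    $ python detect.py --weights yolov5n.onnx --img 320 --source data/images/bus.jpg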

The average processing time over 5 runs for each model was as follows.

Model               | Average processing time (ms)
---                 | ---
yolov5n.onnx        | 149.9
yolov5n-fp16.tflite | 220.3
yolov5n-int8.tflite | 155.2
yolov5n.pt          | 158.5
yolov5s.pt          | 299.5

With a 320x320 model input size, the ONNX model had the shortest processing time.
At that level it can handle 5-6 frames per second, so it should be usable when the targets aren't moving too fast.

Note that these results are for the Raspberry Pi CM4 (reTerminal); a Jetson Nano has a GPU and can also use NVIDIA's TensorRT, so the results there would likely be different.
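On such a device, export.py can also produce a TensorRT engine (untested here; the export has to run on the GPU device itself):

    $ python export.py --weights yolov5n.pt --img 640 --include engine --device 0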

Also, if the detection targets are limited, training a custom model based on yolov5n.pt should further improve both processing time and detection accuracy; a starting point might look like the sketch below.
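Here custom.yaml is a hypothetical dataset definition listing only the narrowed-down classes:

    $ python train.py --weights yolov5n.pt --data custom.yaml --img 320 --epochs 100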
Time to think up some applications.

---
Reference URL:
[1] GitHub - ultralytics/yolov5, https://github.com/ultralytics/yolov5