I recently had a chance to play around with YOLOv5, YOLOX, Detectron2, and the like, so I will start by writing up some notes on YOLOv5.
A while back I covered YOLOv5 inference and training; this time the topic is model conversion, i.e., converting models to formats such as ONNX and TFLite.
Related articles:
- Trying out recent object detection (YOLOv5, part 1)
- Trying out recent object detection (YOLOv5, part 2)
- Trying out recent object detection (YOLOv5, part 3)
- Trying out recent object detection (YOLOX, part 1)
When running object detection on an edge device such as a Raspberry Pi or Jetson Nano, a high-performance model trained on a PC or in the cloud is often hard to use as-is because of memory and processing-speed constraints.
To run object detection on an edge device, you have to think about saving memory and speeding up processing, for example:
- Use a compact model with fewer parameters.
- Reduce the size of the images fed to the model.
- Apply fixed-point conversion or quantization to the weights (see the sketch after this list).
- Narrow the detection targets to what the application needs (e.g., persons, cars, and motorcycles only).
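As one concrete illustration of the quantization item above, here is a minimal sketch of post-training dynamic-range quantization with the TFLite converter. It assumes a TensorFlow SavedModel already exists (e.g., the yolov5n_saved_model directory that export.py produces later in this post); the output filename is just an example.

```python
import tensorflow as tf

# Convert a SavedModel to TFLite with dynamic-range quantization:
# weights are stored as int8, activations stay in float at runtime.
converter = tf.lite.TFLiteConverter.from_saved_model("yolov5n_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("yolov5n-dr.tflite", "wb") as f:
    f.write(tflite_model)
```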
This time, I converted models to ONNX and TFLite with yolov5's export.py, reduced the model input size, and checked how well they run on a reTerminal (Raspberry Pi CM4).

1. Model conversion [1]
yolov5 ships with a model conversion tool, export.py, which makes it easy to convert to the various model formats below.
Format | `export.py --include` | Model
--- | --- | ---
PyTorch | - | yolov5s.pt
TorchScript | `torchscript` | yolov5s.torchscript
ONNX | `onnx` | yolov5s.onnx
OpenVINO | `openvino` | yolov5s_openvino_model/
TensorRT | `engine` | yolov5s.engine
CoreML | `coreml` | yolov5s.mlmodel
TensorFlow SavedModel | `saved_model` | yolov5s_saved_model/
TensorFlow GraphDef | `pb` | yolov5s.pb
TensorFlow Lite | `tflite` | yolov5s.tflite
TensorFlow Edge TPU | `edgetpu` | yolov5s_edgetpu.tflite
TensorFlow.js | `tfjs` | yolov5s_web_model/
PaddlePaddle | `paddle` | yolov5s_paddle_model/
Requirements:
$ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime openvino-dev tensorflow-cpu # CPU
$ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime-gpu openvino-dev tensorflow # GPU
Usage:
$ python export.py --weights yolov5s.pt --include torchscript onnx openvino engine coreml tflite ...
Inference:
$ python detect.py --weights yolov5s.pt # PyTorch
yolov5s.torchscript # TorchScript
yolov5s.onnx # ONNX Runtime or OpenCV DNN with --dnn
yolov5s.xml # OpenVINO
yolov5s.engine # TensorRT
yolov5s.mlmodel # CoreML (macOS-only)
yolov5s_saved_model # TensorFlow SavedModel
yolov5s.pb # TensorFlow GraphDef
yolov5s.tflite # TensorFlow Lite
yolov5s_edgetpu.tflite # TensorFlow Edge TPU
yolov5s_paddle_model # PaddlePaddle
I converted the models to ONNX and TFLite on a PC, then copied the converted models to the reTerminal and ran inference there.
As the source pretrained model I used yolov5n.pt, the most compact one.
reTerminal environment:
- torch-1.12.1 / torchvision-0.13.1
- tensorflow-2.10.0
- onnx-1.12.0 / onnxruntime-1.12.1
- yolov5 (cloned and installed from GitHub [1])
1.1 Model conversion (on the PC)
(1) ONNX format
(yolov5) aska@moonlight:~/work/yolov5$ python export.py --weights yolov5n.pt --img 640 --include onnx
export: data=data/coco128.yaml, weights=['yolov5n.pt'], imgsz=[640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5 v6.2-180-g82bec4c Python-3.8.12 torch-1.10.2+cu113 CPU
Fusing layers...
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs
PyTorch: starting from yolov5n.pt with output shape (1, 25200, 85) (3.9 MB)
ONNX: starting export with onnx 1.11.0...
ONNX: export success ✅ 1.3s, saved as yolov5n.onnx (7.5 MB)
Export complete (1.5s)
Results saved to /home/aska/work/yolov5
Detect: python detect.py --weights yolov5n.onnx
Validate: python val.py --weights yolov5n.onnx
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5n.onnx')
Visualize: https://netron.app

(2) TFLite format
(yolov5) aska@moonlight:~/work/yolov5$ python export.py --weights yolov5n.pt --img 640 --include tflite
export: data=data/coco128.yaml, weights=['yolov5n.pt'], imgsz=[640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['tflite']
YOLOv5 v6.2-180-g82bec4c Python-3.8.12 torch-1.10.2+cu113 CPU
Fusing layers...
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs
PyTorch: starting from yolov5n.pt with output shape (1, 25200, 85) (3.9 MB)
2022-10-01 16:20:56.179939: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow SavedModel: starting export with tensorflow 2.10.0...
from n params module arguments
2022-10-01 16:20:57.167212: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
0 -1 1 1760 models.common.Conv [3, 16, 6, 2, 2]
1 -1 1 4672 models.common.Conv [16, 32, 3, 2]
2 -1 1 4800 models.common.C3 [32, 32, 1]
3 -1 1 18560 models.common.Conv [32, 64, 3, 2]
4 -1 1 29184 models.common.C3 [64, 64, 2]
5 -1 1 73984 models.common.Conv [64, 128, 3, 2]
6 -1 1 156928 models.common.C3 [128, 128, 3]
7 -1 1 295424 models.common.Conv [128, 256, 3, 2]
8 -1 1 296448 models.common.C3 [256, 256, 1]
9 -1 1 164608 models.common.SPPF [256, 256, 5]
10 -1 1 33024 models.common.Conv [256, 128, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 90880 models.common.C3 [256, 128, 1, False]
14 -1 1 8320 models.common.Conv [128, 64, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 22912 models.common.C3 [128, 64, 1, False]
18 -1 1 36992 models.common.Conv [64, 64, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 74496 models.common.C3 [128, 128, 1, False]
21 -1 1 147712 models.common.Conv [128, 128, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 296448 models.common.C3 [256, 256, 1, False]
24 [17, 20, 23] 1 115005 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256], [640, 640]]
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(1, 640, 640, 3)] 0 []
tf_conv (TFConv) (1, 320, 320, 16) 1744 ['input_1[0][0]']
tf_conv_1 (TFConv) (1, 160, 160, 32) 4640 ['tf_conv[0][0]']
tfc3 (TFC3) (1, 160, 160, 32) 4704 ['tf_conv_1[0][0]']
tf_conv_7 (TFConv) (1, 80, 80, 64) 18496 ['tfc3[0][0]']
tfc3_1 (TFC3) (1, 80, 80, 64) 28928 ['tf_conv_7[0][0]']
tf_conv_15 (TFConv) (1, 40, 40, 128) 73856 ['tfc3_1[0][0]']
tfc3_2 (TFC3) (1, 40, 40, 128) 156288 ['tf_conv_15[0][0]']
tf_conv_25 (TFConv) (1, 20, 20, 256) 295168 ['tfc3_2[0][0]']
tfc3_3 (TFC3) (1, 20, 20, 256) 295680 ['tf_conv_25[0][0]']
tfsppf (TFSPPF) (1, 20, 20, 256) 164224 ['tfc3_3[0][0]']
tf_conv_33 (TFConv) (1, 20, 20, 128) 32896 ['tfsppf[0][0]']
tf_upsample (TFUpsample) (1, 40, 40, 128) 0 ['tf_conv_33[0][0]']
tf_concat (TFConcat) (1, 40, 40, 256) 0 ['tf_upsample[0][0]',
'tfc3_2[0][0]']
tfc3_4 (TFC3) (1, 40, 40, 128) 90496 ['tf_concat[0][0]']
tf_conv_39 (TFConv) (1, 40, 40, 64) 8256 ['tfc3_4[0][0]']
tf_upsample_1 (TFUpsample) (1, 80, 80, 64) 0 ['tf_conv_39[0][0]']
tf_concat_1 (TFConcat) (1, 80, 80, 128) 0 ['tf_upsample_1[0][0]',
'tfc3_1[0][0]']
tfc3_5 (TFC3) (1, 80, 80, 64) 22720 ['tf_concat_1[0][0]']
tf_conv_45 (TFConv) (1, 40, 40, 64) 36928 ['tfc3_5[0][0]']
tf_concat_2 (TFConcat) (1, 40, 40, 128) 0 ['tf_conv_45[0][0]',
'tf_conv_39[0][0]']
tfc3_6 (TFC3) (1, 40, 40, 128) 74112 ['tf_concat_2[0][0]']
tf_conv_51 (TFConv) (1, 20, 20, 128) 147584 ['tfc3_6[0][0]']
tf_concat_3 (TFConcat) (1, 20, 20, 256) 0 ['tf_conv_51[0][0]',
'tf_conv_33[0][0]']
tfc3_7 (TFC3) (1, 20, 20, 256) 295680 ['tf_concat_3[0][0]']
tf_detect (TFDetect) ((1, 25200, 85), 115005 ['tfc3_5[0][0]',
) 'tfc3_6[0][0]',
'tfc3_7[0][0]']
==================================================================================================
Total params: 1,867,405
Trainable params: 0
Non-trainable params: 1,867,405
__________________________________________________________________________________________________
2022-10-01 16:20:59.099566: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-10-01 16:20:59.099656: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
Assets written to: yolov5n_saved_model/assets
TensorFlow SavedModel: export success ✅ 4.1s, saved as yolov5n_saved_model (7.4 MB)
TensorFlow Lite: starting export with tensorflow 2.10.0...
Found untraced functions such as tf_conv_2_layer_call_fn, tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_4_layer_call_fn while saving (showing 5 of 268). These functions will not be directly callable after loading.
Assets written to: /tmp/tmp3a887r7n/assets
2022-10-01 16:21:25.634156: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-10-01 16:21:25.634201: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-10-01 16:21:25.634801: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /tmp/tmp3a887r7n
2022-10-01 16:21:25.665209: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-10-01 16:21:25.665259: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /tmp/tmp3a887r7n
2022-10-01 16:21:25.783391: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-10-01 16:21:25.809995: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-10-01 16:21:26.039735: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /tmp/tmp3a887r7n
2022-10-01 16:21:26.161768: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 526967 microseconds.
2022-10-01 16:21:26.560965: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-10-01 16:21:26.892092: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1989] Estimated count of arithmetic ops: 5.393 G ops, equivalently 2.697 G MACs
Estimated count of arithmetic ops: 5.393 G ops, equivalently 2.697 G MACs
TensorFlow Lite: export success ✅ 26.9s, saved as yolov5n-fp16.tflite (3.7 MB)
Export complete (31.3s)
Results saved to /home/aska/work/yolov5
Detect: python detect.py --weights yolov5n-fp16.tflite
Validate: python val.py --weights yolov5n-fp16.tflite
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5n-fp16.tflite')
Visualize: https://netron.app

To convert to an int8 TFLite model, add the `--int8` option.
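For example (as far as I can tell, the int8 calibration samples images from the dataset given by `--data`, so that dataset should be available):
$ python export.py --weights yolov5n.pt --img 640 --include tflite --int8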
1.2 Running the models (on the reTerminal)
I tried running five models: yolov5n.onnx, yolov5n-fp16.tflite, yolov5n-int8.tflite, yolov5n.pt, and yolov5s.pt.
aska@mars:~/work/yolov5 $ python detect.py --weights yolov5n.onnx --img 640 --source data/images/bus.jpg
/home/aska/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
detect: weights=['yolov5n.onnx'], source=data/images/bus.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 v6.2-180-g82bec4c Python-3.9.2 torch-1.12.1 CPU
Loading yolov5n.onnx for ONNX Runtime inference...
image 1/1 /home/aska/work/yolov5/data/images/bus.jpg: 640x640 3 persons, 1 bus, 532.9ms
Speed: 10.2ms pre-process, 532.9ms inference, 6.5ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp

To run inference with the other models, specify each model with `--weights`, as in the commands below.
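For example (same test image; the weight files are the ones produced by the conversions above):
$ python detect.py --weights yolov5n-fp16.tflite --img 640 --source data/images/bus.jpg
$ python detect.py --weights yolov5n.pt --img 640 --source data/images/bus.jpg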
For reference, the average inference time over five runs for each model is shown below.
| Model | Average time (ms) |
| --- | --- |
| yolov5n.onnx | 550.4 |
| yolov5n-fp16.tflite | 881.2 |
| yolov5n-int8.tflite | 638.9 |
| yolov5n.pt | 540.6 |
| yolov5s.pt | 954.5 |
I had expected the int8 TFLite model to be fast, but surprisingly the PyTorch model yolov5n.pt had the shortest processing time.
yolov5n.pt usually finished in roughly 460 ms, but a single run of about 850 ms pushed the average up.
Still, the large run-to-run variance in processing time is a bit concerning.
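To get a feel for that variance outside detect.py, a minimal benchmark sketch like the following can be used (shown for the fp16 TFLite model; it times only the interpreter call on a dummy input, so the numbers are not directly comparable with detect.py's figures):

```python
import time
import numpy as np
import tensorflow as tf

# Time repeated invocations of the TFLite interpreter on one dummy input
interpreter = tf.lite.Interpreter(model_path="yolov5n-fp16.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
img = np.random.rand(*inp["shape"]).astype(inp["dtype"])

times = []
for _ in range(5):
    t0 = time.perf_counter()
    interpreter.set_tensor(inp["index"], img)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
    times.append((time.perf_counter() - t0) * 1000)

print(f"mean {np.mean(times):.1f} ms, min {min(times):.1f} ms, max {max(times):.1f} ms")
```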
2. Changing the model input size
At this point, detecting objects in a single image still takes more than 500 ms, which is quite slow.
So I tried reducing the size of the images fed to the model to cut down the amount of computation.
The default input size is 640x640, so I measured the processing speed at 320x320, which has a quarter of the pixels.
To create a model with the smaller input size, add the `--img 320` option when converting to ONNX or TFLite.
The PyTorch model apparently needs no conversion step.
At inference time, run with the `--img 320` option, as in the commands below.
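For example, for the ONNX model (the TFLite export works the same way):
$ python export.py --weights yolov5n.pt --img 320 --include onnx
$ python detect.py --weights yolov5n.onnx --img 320 --source data/images/bus.jpg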
The average inference time over five runs for each model:
| Model | Average time (ms) |
| --- | --- |
| yolov5n.onnx | 149.9 |
| yolov5n-fp16.tflite | 220.3 |
| yolov5n-int8.tflite | 155.2 |
| yolov5n.pt | 158.5 |
| yolov5s.pt | 299.5 |
With the 320x320 input size, the ONNX model had the shortest processing time.
At that level it handles 5-6 frames per second, which should be usable when the targets are not moving too fast.
Note that these results are for the Raspberry Pi CM4 (reTerminal); a Jetson Nano, which has a GPU and can also use NVIDIA's TensorRT, would likely give different results.
Also, if the detection targets are limited, building a custom model based on yolov5n.pt should yield further gains in both processing time and detection accuracy.
Time to think up some applications.
---
References:
[1] GitHub - ultralytics/yolov5, https://github.com/ultralytics/yolov5