Trying out recent object detectors (YOLOv5, Part 1)

I recently had a chance to play around a bit with YOLOv5, YOLOX, Detectron2 and others, so I'll start by writing up some notes on YOLOv5.

1. Installation [1]
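YOLOv3 [1], which I used before, runs on a machine learning framework called Darknet, so using it meant starting by building the source code yourself.
YOLOv5, on the other hand, is based on PyTorch, which makes setup very easy.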

In an environment with Python >= 3.7.0 and PyTorch >= 1.7, the following steps are all it takes to install it!!

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
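To confirm the Python >= 3.7.0 and PyTorch >= 1.7 requirement before installing, a small check like the following can help. This is a minimal sketch of my own (parse_version and meets are hypothetical helper names, not part of YOLOv5). It parses versions into integer tuples, because comparing them as plain strings goes wrong for versions such as 1.10.

```python
import re
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "1.10.2+cu113" into (1, 10, 2).

    Only the leading dotted-number part is used, so local build tags
    like "+cu113" or pre-release tags like "a0" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets(version: str, minimum: Tuple[int, ...]) -> bool:
    # Tuples compare element by element, so (1, 10, 2) >= (1, 7) is True,
    # whereas the string comparison "1.10.2" >= "1.7" would be False.
    return parse_version(version) >= minimum


if __name__ == "__main__":
    py = ".".join(str(n) for n in sys.version_info[:3])
    print("Python", py, "OK" if meets(py, (3, 7, 0)) else "too old")
    try:
        import torch  # optional: only checked if already installed
        ver = str(torch.__version__)
        print("PyTorch", ver, "OK" if meets(ver, (1, 7)) else "too old")
    except ImportError:
        print("PyTorch is not installed")
```

With the environment used in this article, "1.10.2+cu113" parses to (1, 10, 2) and passes the PyTorch >= 1.7 check.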

2. Inference [1]
2.1 Getting it running
The easiest way to try it out is with detect.py in the yolov5 repository.
Main usage of detect.py:

Usage - sources:
    $ python path/to/detect.py --weights yolov5s.pt --source 0              # webcam
                                                             img.jpg        # image
                                                             vid.mp4        # video
                                                             path/          # directory
                                                             path/*.jpg     # glob
                                                             'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                                             'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
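As the list shows, a single --source option accepts quite different kinds of input. The sketch below is a hypothetical helper (classify_source is not a YOLOv5 function, and the suffix sets are an illustrative subset) showing one plausible way to tell these inputs apart; detect.py's actual rules differ in the details.

```python
from pathlib import PurePosixPath

# Illustrative subsets only; YOLOv5 keeps its own, longer format lists.
IMG_SUFFIXES = {"bmp", "jpeg", "jpg", "png", "webp"}
VID_SUFFIXES = {"avi", "mkv", "mov", "mp4"}


def classify_source(source: str) -> str:
    """Roughly classify a --source value (hypothetical helper)."""
    s = source.strip().lower()
    if s.isnumeric():                          # "0" -> camera index 0
        return "webcam"
    if s.startswith(("rtsp://", "rtmp://")):   # streaming protocols
        return "stream"
    # Suffix of the last path component, ignoring any URL query string
    suffix = PurePosixPath(s.split("?")[0]).suffix.lstrip(".")
    if s.startswith(("http://", "https://")):
        # A URL with a media suffix is a downloadable file; anything else
        # (e.g. a YouTube link) is treated as a stream.
        if suffix in IMG_SUFFIXES | VID_SUFFIXES:
            return "remote file"
        return "stream"
    if "*" in s:                               # wildcard pattern
        return "glob"
    if suffix in IMG_SUFFIXES:
        return "image"
    if suffix in VID_SUFFIXES:
        return "video"
    return "directory"                         # anything else: assume a folder
```

For example, the YouTube URL above has no file extension, so this sketch classifies it as a stream, while a URL ending in .jpg is treated as a file to fetch.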

Usage - formats:
    $ python path/to/detect.py --weights yolov5s.pt                 # PyTorch
                                         yolov5s.torchscript        # TorchScript
                                         yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                         yolov5s.xml                # OpenVINO
                                         yolov5s.engine             # TensorRT
                                         yolov5s.mlmodel            # CoreML (MacOS-only)
                                         yolov5s_saved_model        # TensorFlow SavedModel
                                         yolov5s.pb                 # TensorFlow GraphDef
                                         yolov5s.tflite             # TensorFlow Lite
                                         yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
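Judging from this list, detect.py picks the inference backend from the name of the weights file. That idea can be sketched as a simple suffix lookup; BACKENDS and backend_for are hypothetical names, and this is not YOLOv5's actual implementation.

```python
# (suffix, backend) pairs taken from the usage text above. Order matters:
# "_edgetpu.tflite" must be checked before the more general ".tflite".
BACKENDS = [
    (".pt", "PyTorch"),
    (".torchscript", "TorchScript"),
    (".onnx", "ONNX Runtime or OpenCV DNN"),
    (".xml", "OpenVINO"),
    (".engine", "TensorRT"),
    (".mlmodel", "CoreML"),
    ("_saved_model", "TensorFlow SavedModel"),
    (".pb", "TensorFlow GraphDef"),
    ("_edgetpu.tflite", "TensorFlow Edge TPU"),
    (".tflite", "TensorFlow Lite"),
]


def backend_for(weights: str) -> str:
    """Return the backend name for a weights path (hypothetical helper)."""
    name = weights.rstrip("/")  # a SavedModel is a directory, so allow "x_saved_model/"
    for suffix, backend in BACKENDS:
        if name.endswith(suffix):
            return backend
    raise ValueError(f"unknown weights format: {weights}")
```

Keeping the rules in an ordered list makes the precedence explicit: an Edge TPU model name also ends in .tflite, so the more specific suffix has to come first.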

Let's actually try it on data/images/bus.jpg.

$ python detect.py --source data/images/bus.jpg
detect: weights=yolov5s.pt, source=data/images/bus.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-69-g7830e91 torch 1.10.2+cu113 CUDA:0 (NVIDIA GeForce RTX 3060, 12054MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.5 GFLOPs
image 1/1 /home/aska/DNN/YOLO/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, Done. (0.023s)
Speed: 0.3ms pre-process, 23.2ms inference, 6.2ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp

The results are saved under runs/detect/exp.
[Detection result image]
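The log reports 640x480 for bus.jpg rather than 640x640. As far as I can tell, this is because YOLOv5 resizes the image while keeping its aspect ratio (letterboxing) and pads the shorter side only up to a multiple of the model stride. Below is a minimal sketch of that shape calculation, assuming bus.jpg is 1080x810 (height x width), a stride of 32 and the default size 640; letterbox_shape is a hypothetical helper, not YOLOv5's own letterbox function.

```python
from typing import Tuple


def letterbox_shape(h: int, w: int, new_size: int = 640, stride: int = 32) -> Tuple[int, int]:
    """Return (height, width) of the network input after letterboxing.

    The image is scaled so that its longer side becomes new_size while
    keeping the aspect ratio; the shorter side is then padded up to a
    multiple of stride (assumes new_size itself is a multiple of stride).
    """
    r = min(new_size / h, new_size / w)        # common scale factor
    new_h, new_w = round(h * r), round(w * r)  # resized, not yet padded
    pad_h = (new_size - new_h) % stride        # minimal padding to a stride multiple
    pad_w = (new_size - new_w) % stride
    return new_h + pad_h, new_w + pad_w


print(letterbox_shape(1080, 810))  # bus.jpg (assumed size) -> (640, 480)
print(letterbox_shape(720, 1280))  # a 720x1280 image -> (384, 640)
```

The same calculation gives 384x640 for a 720x1280 image such as zidane.jpg, which matches the shape (1, 3, 384, 640) printed by the Torch Hub run in 2.2.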

There are many other options as well; checking them with --help gives:

$ python detect.py --help
usage: detect.py [-h] [--weights WEIGHTS [WEIGHTS ...]] [--source SOURCE] [--data DATA] [--imgsz IMGSZ [IMGSZ ...]]
                 [--conf-thres CONF_THRES] [--iou-thres IOU_THRES] [--max-det MAX_DET] [--device DEVICE] [--view-img] [--save-txt]
                 [--save-conf] [--save-crop] [--nosave] [--classes CLASSES [CLASSES ...]] [--agnostic-nms] [--augment] [--visualize]
                 [--update] [--project PROJECT] [--name NAME] [--exist-ok] [--line-thickness LINE_THICKNESS] [--hide-labels]
                 [--hide-conf] [--half] [--dnn]

optional arguments:
  -h, --help            show this help message and exit
  --weights WEIGHTS [WEIGHTS ...]
                        model path(s)
  --source SOURCE       file/dir/URL/glob, 0 for webcam
  --data DATA           (optional) dataset.yaml path
  --imgsz IMGSZ [IMGSZ ...], --img IMGSZ [IMGSZ ...], --img-size IMGSZ [IMGSZ ...]
                        inference size h,w
  --conf-thres CONF_THRES
                        confidence threshold
  --iou-thres IOU_THRES
                        NMS IoU threshold
  --max-det MAX_DET     maximum detections per image
  --device DEVICE       cuda device, i.e. 0 or 0,1,2,3 or cpu
  --view-img            show results
  --save-txt            save results to *.txt
  --save-conf           save confidences in --save-txt labels
  --save-crop           save cropped prediction boxes
  --nosave              do not save images/videos
  --classes CLASSES [CLASSES ...]
                        filter by class: --classes 0, or --classes 0 2 3
  --agnostic-nms        class-agnostic NMS
  --augment             augmented inference
  --visualize           visualize features
  --update              update all models
  --project PROJECT     save results to project/name
  --name NAME           save results to project/name
  --exist-ok            existing project/name ok, do not increment
  --line-thickness LINE_THICKNESS
                        bounding box thickness (pixels)
  --hide-labels         hide labels
  --hide-conf           hide confidences
  --half                use FP16 half-precision inference
  --dnn                 use OpenCV DNN for ONNX inference
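Several of these options (--conf-thres, --iou-thres, --classes, --agnostic-nms, --max-det) control the post-processing step, non-maximum suppression (NMS). The pure-Python sketch below shows what each one does; it is a simplification, since YOLOv5 itself works on tensors with torchvision's NMS and computes confidence from objectness and class scores. The function and type names are my own.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Det = Tuple[Box, float, int]             # (box, confidence, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: Sequence[Det], conf_thres: float = 0.25, iou_thres: float = 0.45,
        classes: Optional[Sequence[int]] = None, agnostic: bool = False,
        max_det: int = 1000) -> List[Det]:
    # 1. Drop low-confidence boxes (--conf-thres) and unwanted classes (--classes)
    cands = [d for d in dets
             if d[1] > conf_thres and (classes is None or d[2] in classes)]
    # 2. Greedy NMS: visit boxes from most to least confident
    cands.sort(key=lambda d: d[1], reverse=True)
    kept: List[Det] = []
    for d in cands:
        if len(kept) >= max_det:  # --max-det
            break
        # Suppress d if it overlaps a kept box by more than iou_thres
        # (--iou-thres). Without --agnostic-nms, only boxes of the same
        # class can suppress each other.
        suppressed = any(
            iou(d[0], k[0]) > iou_thres and (agnostic or d[2] == k[2])
            for k in kept)
        if not suppressed:
            kept.append(d)
    return kept
```

Raising --iou-thres keeps more overlapping boxes (useful for crowded scenes), while raising --conf-thres removes weak detections before NMS even runs.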

2.2 Using it from a program
To use YOLOv5 from your own program, torch.hub makes it easy to integrate.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

# Images
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
results.save()

Saving the code above as test_detect.py and running it gives the following:

$ python test_detect.py
Downloading: "https://github.com/ultralytics/yolov5/archive/master.zip" to /home/aska/.cache/torch/hub/master.zip
YOLOv5 🚀 2022-4-17 torch 1.10.2+cu113 CUDA:0 (NVIDIA GeForce RTX 3060, 12054MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.5 GFLOPs
Adding AutoShape... 
image 1/1: 720x1280 2 persons, 2 ties
Speed: 1174.3ms pre-process, 144.6ms inference, 193.4ms NMS per image at shape (1, 3, 384, 640)
Saved 1 image to runs/detect/exp2

[Detection result image]
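The summaries in the logs, such as "2 persons, 2 ties" and "4 persons, 1 bus", are per-class counts with an "s" added when a class appears more than once. A sketch that reproduces this format from (class id, name) pairs, like the class and name columns of results.pandas().xyxy[0]; summarize is a hypothetical helper, and sorting by class id is my own choice that happens to match the logs.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize(detections: Iterable[Tuple[int, str]]) -> str:
    """Build a summary like "2 persons, 2 ties" from (class_id, name) pairs."""
    counts = Counter(detections)
    # Sort by class id, and naively append "s" when the count exceeds one
    return ", ".join(f"{n} {name}{'s' * (n > 1)}"
                     for (_, name), n in sorted(counts.items()))
```

With a Torch Hub result, it could be called as `df = results.pandas().xyxy[0]` followed by `summarize(zip(df["class"], df["name"]))`. Note that with this naive rule, two buses would come out as "2 buss".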

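2.3 Options that improve accuracy at inference time
(1) TTA (Test-Time Augmentation) [2]
TTA applies augmentation, normally used during training (transforming the data in various ways to increase its effective amount and improve accuracy), at inference time as well: the model is run on several transformed copies of the input and the predictions are merged. This improves accuracy at the cost of slower inference.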

detect.py

$ python detect.py --source data/images/bus.jpg --augment

Torch Hub

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

# Images
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img, augment=True)
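The key step in TTA is mapping the boxes predicted on each transformed copy back to the original image's coordinates before merging them; the merged boxes then go through NMS as usual. YOLOv5 does this internally when --augment / augment=True is set (using left-right flips and several input scales, per [2]). The pure-Python sketch below only illustrates the coordinate mapping; to_original and merge_tta are hypothetical names.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def to_original(box: Box, scale: float, flip_lr: bool, orig_width: float) -> Box:
    """Map a box predicted on an augmented image back to the original image.

    scale:      factor the original image was resized by (e.g. 0.5)
    flip_lr:    whether the augmented image was mirrored left-right
    orig_width: width of the original (un-augmented) image in pixels
    """
    x1, y1, x2, y2 = (v / scale for v in box)  # undo the resize
    if flip_lr:                                # undo the mirror: x -> W - x
        x1, x2 = orig_width - x2, orig_width - x1
    return (x1, y1, x2, y2)


def merge_tta(preds: Iterable[Tuple[List[Box], float, bool]],
              orig_width: float) -> List[Box]:
    """Collect boxes from every augmented pass, in original coordinates."""
    merged: List[Box] = []
    for boxes, scale, flip_lr in preds:
        merged.extend(to_original(b, scale, flip_lr, orig_width) for b in boxes)
    return merged
```

For an original box (10, 20, 50, 60) in a 100-pixel-wide image, the mirrored pass predicts (50, 20, 90, 60), and to_original maps it back to (10.0, 20.0, 50.0, 60.0).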

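(2) Model Ensembling [3]
Ensembling improves accuracy by running inference with several models and combining their results. With detect.py, you simply pass multiple weight files to --weights.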

detect.py

$ python detect.py --source data/images/bus.jpg --weights models/yolov5s.pt models/yolov5m.pt 
detect: weights=['models/yolov5s.pt', 'models/yolov5m.pt'], source=data/images/bus.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-69-g7830e91 torch 1.10.2+cu113 CUDA:0 (NVIDIA GeForce RTX 3060, 12054MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.5 GFLOPs
Fusing layers... 
YOLOv5m summary: 290 layers, 21172173 parameters, 0 gradients, 49.0 GFLOPs
Ensemble created with ['models/yolov5s.pt', 'models/yolov5m.pt']

image 1/1 /home/aska/DNN/YOLO/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, Done. (0.020s)
Speed: 0.3ms pre-process, 20.5ms inference, 0.9ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp3
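As far as I can tell from the source, YOLOv5's ensemble pools the boxes predicted by all models and runs NMS once on the combined set. The sketch below mimics that idea in pure Python with made-up per-model outputs; ensemble_nms is a hypothetical name, and the confidence threshold is left out for brevity.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, float, int]             # (box, confidence, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(per_model: Sequence[List[Det]], iou_thres: float = 0.45) -> List[Det]:
    """Pool detections from several models, then apply class-aware NMS once."""
    pooled = [d for dets in per_model for d in dets]  # concatenate all outputs
    pooled.sort(key=lambda d: d[1], reverse=True)     # most confident first
    kept: List[Det] = []
    for d in pooled:
        # Keep d unless a kept box of the same class overlaps it too much
        if all(d[2] != k[2] or iou(d[0], k[0]) <= iou_thres for k in kept):
            kept.append(d)
    return kept
```

When two models detect the same object, the more confident box survives; a box found by only one model is kept as-is, which is how an ensemble can recover objects a single model missed.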