Trying Barracuda + YOLOv5 in Unity (1)

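While looking into how to try object detection in Unity, I found that Barracuda can handle models in ONNX format [1].
So I decided to try object detection with Barracuda + YOLOv5. There were several points to watch out for along the way, so I am summarizing them briefly here.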

Trying Barracuda + YOLOv5 in Unity (1)
Trying Barracuda + YOLOv5 in Unity (2)

[Development environment]

Unity Editor 2021.3.19f1
Barracuda 3.0.0
YOLOv5 v7.0
onnx 1.11.0

1. Preparing the ONNX model
Convert the pretrained YOLOv5 model "yolov5n.pt" to ONNX format, and examine the model's inputs and outputs.

1.1 ONNX conversion [2]
To convert the pretrained YOLOv5 model to ONNX format, just follow the procedure on the YOLOv5 site.

$ python export.py --weights yolov5n.pt --imgsz 320 --include onnx

1.2 Examining the model inputs and outputs [3]
Examining the inputs and outputs of "yolov5n.onnx" gave the following.

Model : yolov5n.onnx
    NodeArg(name='images', type='tensor(float)', shape=[1, 3, 320, 320])
    NodeArg(name='output0', type='tensor(float)', shape=[1, 6300, 85])
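
As a rough check of where the output shape [1, 6300, 85] comes from, the arithmetic can be sketched as below. This assumes the standard YOLOv5 head layout, which is not stated in this article: three detection scales with strides 8, 16, and 32, three anchors per grid cell, and the 80 COCO classes of the stock "yolov5n.pt". The function name `yolov5_output_shape` is hypothetical and used only for illustration.

```python
# Sketch: reproduce the [1, 6300, 85] output shape for a 320x320 input.
# Assumptions (not from the article): strides 8/16/32, 3 anchors per cell,
# and the 80 COCO classes of the stock yolov5n.pt checkpoint.

def yolov5_output_shape(img_size, strides=(8, 16, 32), anchors_per_cell=3, num_classes=80):
    # Each detection scale predicts on an (img_size / stride)^2 grid.
    cells = sum((img_size // s) ** 2 for s in strides)
    num_boxes = cells * anchors_per_cell
    # Per box: x, y, w, h, objectness, then one score per class.
    values_per_box = 4 + 1 + num_classes
    return [1, num_boxes, values_per_box]

shape_320 = yolov5_output_shape(320)
print(shape_320)
```

Under these assumptions, the 6300 rows are candidate boxes and the 85 columns are the box coordinates, the objectness score, and the class scores. Changing --imgsz at export time changes the number of rows accordingly.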

However, when the model is loaded through Barracuda, the shapes change as follows, so care is needed.

barracuda model
    inputs[0].shape : [1, 1, 1, 1, 1, 320, 320, 3]
    outputs.shape : [1, 1, 85, 6300]
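
The shape change can be read as a reordering of axes. The sketch below is inferred from the two sets of shapes shown in this section and assumes that Barracuda keeps tensors in a channels-last (N, H, W, C) layout, with the trailing 320, 320, 3 of the input shape read as H, W, C. The helper names are hypothetical and are not part of the Barracuda API.

```python
# Sketch: map ONNX-side shapes and indices to the Barracuda-side layout.
# Assumption: Barracuda stores tensors channels-last (N, H, W, C), so
#   ONNX input  [N, C, H, W]       -> Barracuda [N, H, W, C]
#   ONNX output [N, boxes, values] -> Barracuda [N, 1, values, boxes]

def onnx_input_to_barracuda(shape_nchw):
    n, c, h, w = shape_nchw
    return [n, h, w, c]

def onnx_output_to_barracuda(shape):
    n, boxes, values = shape
    return [n, 1, values, boxes]

def barracuda_output_index(box, value):
    # Element [0, box, value] on the ONNX side would be read at
    # index (0, 0, value, box) on the Barracuda side.
    return (0, 0, value, box)

print(onnx_input_to_barracuda([1, 3, 320, 320]))
print(onnx_output_to_barracuda([1, 6300, 85]))
```

If this reading is right, post-processing code has to index the output as (value, box) rather than (box, value), which is easy to get backwards when porting from the Python reference code.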

2. Preprocessing [3]
This section explains how to convert a WebCamTexture into the Tensor used as the model input.

Convert the WebCamTexture data to a 320x320 Texture2D.
Convert from a bottom-left origin to a top-left origin, and normalize the pixel values (0-255 -> 0-1.0).

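There are two points to watch out for in preprocessing.
(1) Image resizing
At first I resized with the code below, but it did not work properly with WebCamTexture.
It worked fine when the image was loaded from a file, though....
Or maybe, as a Unity beginner, I just don't understand how to use it...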

    private static Texture2D ResizedTexture(Texture2D texture, int width, int height)
    {
        var resizedTexture = new Texture2D(width, height);
        Graphics.ConvertTexture(texture, resizedTexture);
        resizedTexture.Apply();
        return resizedTexture;
    }

(2) Watch the image origin
Texture2D has its origin at the bottom-left, whereas the Tensor used as the model input has its origin at the top-left.
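
To make the origin conversion concrete, here is a minimal Python sketch of the same idea on a tiny image: rows are written in reverse order while each channel is scaled to 0-1.0. It only illustrates the index arithmetic and is not the Unity code itself. The function name `flip_and_normalize` and the tuple-based pixel format are assumptions made for this example.

```python
# Sketch: flip rows (bottom-left origin -> top-left origin) and
# normalize 0-255 channel values to 0.0-1.0 in a single pass.

def flip_and_normalize(pixels, width, height):
    """pixels: row-major list of (r, g, b) tuples, bottom row first."""
    out = [0.0] * (width * height * 3)
    for src_row in range(height):
        # The bottom row of the source becomes the last row of the output.
        dst_row = height - 1 - src_row
        for col in range(width):
            r, g, b = pixels[src_row * width + col]
            base = (dst_row * width + col) * 3
            out[base + 0] = r / 255.0
            out[base + 1] = g / 255.0
            out[base + 2] = b / 255.0
    return out

# 1x2 image: the bottom row is black, the top row is white.
demo = flip_and_normalize([(0, 0, 0), (255, 255, 255)], width=1, height=2)
print(demo)
```

After the flip, the white top row comes first in the output buffer, which is the order a top-left-origin tensor expects.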

[Code]

using Unity.Barracuda;
using UnityEngine;
using System.IO;

public class PreProcessor : MonoBehaviour
{

    public static Tensor PreProcImage(WebCamTexture webCamTexture, int imageSize)
    {
        // Convert to a Texture2D
        var srcTexture = new Texture2D(webCamTexture.width, webCamTexture.height);
        srcTexture.SetPixels32(webCamTexture.GetPixels32());
        srcTexture.Apply();
        // Resize to imageSize x imageSize (320x320 here)
        var resizedTexture = ResizedTexture(srcTexture, imageSize, imageSize);
        // Convert the origin and normalize
        var tensor = TransformAndNormalize(resizedTexture.GetPixels32(), imageSize, imageSize);
        DestroyImmediate(srcTexture);
        DestroyImmediate(resizedTexture);
        return tensor;
    }

    private static Texture2D ResizedTexture(Texture2D texture, int width, int height)
    {
        // Draw into a temporary RenderTexture of the target size
        var rt = RenderTexture.GetTemporary(width, height);
        Graphics.Blit(texture, rt);
        // Read the pixels back from the RenderTexture
        var preRt = RenderTexture.active;
        RenderTexture.active = rt;
        var resizedTexture = new Texture2D(width, height);
        resizedTexture.ReadPixels(new Rect(0, 0, width, height), 0, 0);
        resizedTexture.Apply();
        RenderTexture.active = preRt;
        RenderTexture.ReleaseTemporary(rt);
        return resizedTexture;
    }

    // yolov5 / detect.py
    //      im /= 255 # 0 - 255 to 0.0 - 1.0
    // The bottom-left to top-left origin conversion is done here as well.
    private static Tensor TransformAndNormalize(Color32[] buffer, int width, int height)
    {
        var normBuffer = new float[width * height * 3];
        for (int i = 0, line = height - 1; i < height; i++, line--) 
        {
            for (int j = 0; j < width; j++) 
            {
                var rgb = buffer[i * width + j];
                normBuffer[(line * width + j) * 3 + 0] = (float)rgb.r / 255.0f;
                normBuffer[(line * width + j) * 3 + 1] = (float)rgb.g / 255.0f;
                normBuffer[(line * width + j) * 3 + 2] = (float)rgb.b / 255.0f;
            }
        }
        return new Tensor(1, height, width, 3, normBuffer);
    }
}