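The code for open-source pose estimation algorithms such as OpenPose[1] and HRNet (High Resolution Network)[2] is publicly available. This time I tried KAPAO (Keypoints and Poses as Objects)[3], a pose estimation method that is reported to be both fast and accurate.
I ran into a few problems while setting it up, so I'm leaving some notes here.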
1. Setup[3]
Basically you only need to follow the Setup section on GitHub, but, perhaps because of my PC environment, I got the following error.
[PC environment]
(kapao) aska@moonlight:~/kapao$ python demos/image.py --bbox
/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Using device: cuda:0
Traceback (most recent call last):
  File "demos/image.py", line 69, in <module>
    model = attempt_load(args.weights, map_location=device)
  File "/home/aska/kapao/models/experimental.py", line 96, in attempt_load
    model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 692, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 692, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
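The warning says "The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.", so the installed PyTorch build apparently does not match the CUDA environment of this PC: it contains no kernels for the RTX 3060's compute capability (sm_86).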
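A quick way to confirm this diagnosis is to compare the GPU's compute capability with the architectures the installed PyTorch binary was built for. The sketch below is my own helper (`has_compatible_kernel` and `report` are hypothetical names, not part of KAPAO or PyTorch). `report()` uses `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`, and the check applies a simplified version of CUDA's binary compatibility rule: a kernel built for sm_XY runs only on GPUs with the same major version X and a minor version of at least Y. PTX entries (compute_XY), which the driver may JIT-compile, are ignored, so treat the result as a rough check.

```python
import re


def has_compatible_kernel(capability, arch_list):
    """Return True if arch_list has an sm_XY kernel usable on a GPU whose
    compute capability is (major, minor).

    Simplified rule (assumption, see lead-in): a cubin built for sm_XY runs
    only on GPUs with the same major version X and a minor version >= Y.
    PTX entries such as compute_70 are deliberately ignored.
    """
    major, minor = capability
    for arch in arch_list:
        m = re.fullmatch(r"sm_(\d+)(\d)", arch)
        if m is None:
            continue  # skip PTX entries and suffixed ones like sm_90a
        if int(m.group(1)) == major and int(m.group(2)) <= minor:
            return True
    return False


def report():
    """Print what the installed PyTorch reports for GPU 0 (requires torch)."""
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
    print("device capability:", cap, "kernels in build:", archs)
    print("compatible:", has_compatible_kernel(cap, archs))
```

Calling `report()` inside the kapao environment prints the same information as the warning above. With the arch list from that warning (sm_37, sm_50, sm_60, sm_70), `has_compatible_kernel((8, 6), ...)` returns False, which matches the "no kernel image is available" error.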
So I uninstalled the installed PyTorch (1.9.1) and Torchvision (0.10.1) and installed PyTorch (1.12.0+cu113) and Torchvision (0.13.0+cu113), which match the PC's CUDA version.
Next, this error popped up!!
(kapao) aska@moonlight:~/kapao$ python demos/image.py --bbox
Using device: cuda:0
image 1/1 /home/aska/kapao/res/crowdpose_100024.jpg: Traceback (most recent call last):
  File "demos/image.py", line 80, in <module>
    out = model(img, augment=True, kp_flip=data['kp_flip'], scales=data['scales'], flips=data['flips'])[0]
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aska/kapao/models/yolo.py", line 137, in forward
    return self.forward_augment(x, kp_flip, s=scales, f=flips)  # augmented inference, None
  File "/home/aska/kapao/models/yolo.py", line 148, in forward_augment
    yi, train_out_i = self.forward_once(xi)  # forward
  File "/home/aska/kapao/models/yolo.py", line 173, in forward_once
    x = m(x)  # run
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/upsampling.py", line 154, in forward
    recompute_scale_factor=self.recompute_scale_factor)
  File "/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Upsample' object has no attribute 'recompute_scale_factor'
After some searching, it looked like the following fix[4] would work.
Fix:
Edit upsampling.py in the ~/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/nn/modules directory.
Before:
def forward(self, input: Tensor) -> Tensor:
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
                         recompute_scale_factor=self.recompute_scale_factor)
After:
def forward(self, input: Tensor) -> Tensor:
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
With this change, it at least runs.
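Editing a file inside site-packages works, but the change is lost whenever PyTorch is reinstalled. A less invasive option, sketched below, is to add the missing attribute to the loaded model instead. This assumes the error comes only from `Upsample` modules pickled into the checkpoint by an older PyTorch, which lack the `recompute_scale_factor` attribute that PyTorch 1.12's `forward()` reads. `patch_upsample_modules` is a hypothetical helper, not part of KAPAO or PyTorch; it sets the attribute to `None`, which is `F.interpolate`'s default and so should behave like the edit above that drops the argument.

```python
def patch_upsample_modules(model):
    """Add recompute_scale_factor=None to Upsample modules that lack it.

    Hypothetical helper. Modules are matched by class name so this runs
    without importing torch; isinstance(m, torch.nn.Upsample) is the
    stricter check when torch is available. Returns the number patched.
    """
    patched = 0
    for m in model.modules():
        if type(m).__name__ == "Upsample" and not hasattr(m, "recompute_scale_factor"):
            # None is F.interpolate's default, i.e. the same as omitting it
            m.recompute_scale_factor = None
            patched += 1
    return patched
```

The idea would be to call `patch_upsample_modules(model)` right after `model = attempt_load(args.weights, map_location=device)` in each demo script, leaving the installed PyTorch untouched.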
2. Trying it out
Following the Inference Demos section on GitHub, I first tested it on a still image.
(kapao) aska@moonlight:~/kapao$ python demos/image.py --bbox --pose --face --no-kp-dets
Using device: cuda:0
/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
image 1/1 /home/aska/kapao/res/crowdpose_100024.jpg:
[Result]
(Output image: crowdpose_100024_kapao_l_coco_bbox_pose_face.png)
Next, I also tested it on a video.
(kapao) aska@moonlight:~/kapao$ python demos/video.py --face --gif
Using device: cuda:0
/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Running inference:  99%|████████████████████████████████████████████████████████████▋| 190/191 [00:03<00:00, 60.21it/s]
Saving GIF...
191it [00:07, 27.19it/s]
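[Result]
Instead of the demo video, I tried it on the "Kitsune Dance" (fox dance) from YouTube, which has been trending lately.
[Source video: YouTube, 【朗報】きつねダンス『ついに“アレ”がつきました!』 ("Good news: Fox Dance 'Finally, *that* has been added!'")]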
3. Webcam support
To try it on a live camera feed, I modified video.py.
[Code]
import sys
from pathlib import Path

FILE = Path(__file__).absolute()
sys.path.append(FILE.parents[1].as_posix())  # add kapao/ to path

import argparse
import csv

import cv2
import imageio
import torch
import yaml
from tqdm import tqdm

from models.experimental import attempt_load
from utils.datasets import LoadWebcam
from utils.general import check_img_size
from utils.torch_utils import select_device, time_sync
from val import run_nms, post_process_batch


def main(args):
    with open(args.data) as f:
        data = yaml.safe_load(f)  # load data dict

    # add inference settings to data dict
    data['imgsz'] = args.imgsz
    data['conf_thres'] = args.conf_thres
    data['iou_thres'] = args.iou_thres
    data['use_kp_dets'] = not args.no_kp_dets
    data['conf_thres_kp'] = args.conf_thres_kp
    data['iou_thres_kp'] = args.iou_thres_kp
    data['conf_thres_kp_person'] = args.conf_thres_kp_person
    data['overwrite_tol'] = args.overwrite_tol
    data['scales'] = args.scales
    data['flips'] = [None if f == -1 else f for f in args.flips]
    data['count_fused'] = False

    device = select_device(args.device, batch_size=1)
    print('Using device: {}'.format(device))

    model = attempt_load(args.weights, map_location=device)  # load FP32 model
    half = args.half and device.type != 'cpu'
    if half:  # half precision only supported on CUDA
        model.half()
    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(args.imgsz, s=stride)  # check image size

    # camera 0; each frame is resized/padded to imgsz for the model
    dataset = LoadWebcam(pipe='0', img_size=imgsz, stride=stride)

    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once

    cap = dataset.cap
    fps = cap.get(cv2.CAP_PROP_FPS)
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    print(f'CAM : {w}x{h}, {fps}')

    # csv_writer is used in the loop; open it only when --csv is given
    # ('webcam_poses.csv' is a placeholder file name)
    csv_file = open('webcam_poses.csv', 'w', newline='') if args.csv else None
    csv_writer = csv.writer(csv_file) if csv_file else None

    gif_frames = []
    t0 = time_sync()
    for i, (path, img, im0, _) in enumerate(dataset):
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img = img / 255.0  # 0 - 255 to 0.0 - 1.0
        if len(img.shape) == 3:
            img = img[None]  # expand for batch dim

        out = model(img, augment=True, kp_flip=data['kp_flip'], scales=data['scales'], flips=data['flips'])[0]
        person_dets, kp_dets = run_nms(data, out)
        bboxes, poses, _, _, _ = post_process_batch(data, img, [], [[im0.shape[:2]]], person_dets, kp_dets)

        im0_copy = im0.copy()

        # DRAW POSES
        csv_row = []
        for j, (bbox, pose) in enumerate(zip(bboxes, poses)):
            x1, y1, x2, y2 = bbox
            cv2.rectangle(im0_copy, (int(x1), int(y1)), (int(x2), int(y2)), args.color, thickness=1)
            if args.csv:
                for x, y, c in pose:
                    csv_row.extend([x, y, c])
            if args.face:
                for x, y, c in pose[data['kp_face']]:
                    if not args.kp_obj or c:
                        cv2.circle(im0_copy, (int(x), int(y)), args.kp_size, args.color, args.kp_thick)
            for seg in data['segments'].values():
                if not args.kp_obj or (pose[seg[0], -1] and pose[seg[1], -1]):
                    pt1 = (int(pose[seg[0], 0]), int(pose[seg[0], 1]))
                    pt2 = (int(pose[seg[1], 0]), int(pose[seg[1], 1]))
                    cv2.line(im0_copy, pt1, pt2, args.color, args.line_thick)
        im0 = cv2.addWeighted(im0, args.alpha, im0_copy, 1 - args.alpha, gamma=0)

        # time since the previous frame (since start-up for the first frame)
        if i == 0:
            t = time_sync() - t0
        else:
            t = time_sync() - t1

        if not args.gif and args.fps_size:
            cv2.putText(im0, '{:.1f} FPS'.format(1 / t), (5 * args.fps_size, 25 * args.fps_size),
                        cv2.FONT_HERSHEY_SIMPLEX, args.fps_size, (255, 255, 255), thickness=2 * args.fps_size)

        if args.gif:
            # OpenCV frames are BGR; swap to RGB for the GIF writer
            gif_img = cv2.cvtColor(cv2.resize(im0, dsize=tuple(args.gif_size)), cv2.COLOR_RGB2BGR)
            if args.fps_size:
                cv2.putText(gif_img, '{:.1f} FPS'.format(1 / t), (5 * args.fps_size, 25 * args.fps_size),
                            cv2.FONT_HERSHEY_SIMPLEX, args.fps_size, (255, 255, 255), thickness=2 * args.fps_size)
            gif_frames.append(gif_img)
        else:
            cv2.imshow('', im0)

        if args.csv:
            csv_writer.writerow(csv_row)

        t1 = time_sync()

        # a single waitKey per frame, so an ESC press is not swallowed
        key = cv2.waitKey(1)
        if key == 27:  # ESC
            break

    cv2.destroyAllWindows()
    cap.release()

    if csv_file is not None:
        csv_file.close()

    # write the collected frames ('webcam_poses.gif' is a placeholder file name)
    if args.gif and gif_frames:
        print('Saving GIF...')
        with imageio.get_writer('webcam_poses.gif', mode='I', fps=fps if fps > 0 else 30) as writer:
            for frame in tqdm(gif_frames):
                writer.append_data(frame)


def options():
    parser = argparse.ArgumentParser()
    # video options
    parser.add_argument('--color', type=int, nargs='+', default=[255, 255, 255], help='pose color')
    parser.add_argument('--face', action='store_true', help='plot face keypoints')
    parser.add_argument('--display', action='store_true', help='display inference results')
    parser.add_argument('--fps-size', type=int, default=1)
    parser.add_argument('--gif', action='store_true', help='create gif')
    parser.add_argument('--gif-size', type=int, nargs='+', default=[480, 270])
    parser.add_argument('--kp-size', type=int, default=2, help='keypoint circle size')
    parser.add_argument('--kp-thick', type=int, default=2, help='keypoint circle thickness')
    parser.add_argument('--line-thick', type=int, default=3, help='line thickness')
    parser.add_argument('--alpha', type=float, default=0.4, help='pose alpha')
    parser.add_argument('--kp-obj', action='store_true', help='plot keypoint objects only')
    parser.add_argument('--csv', action='store_true', help='write results to csv file')
    # model options
    parser.add_argument('--data', type=str, default='data/coco-kp.yaml')
    parser.add_argument('--imgsz', type=int, default=1024)
    parser.add_argument('--weights', default='kapao_s_coco.pt')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or cpu')
    parser.add_argument('--half', action='store_true')
    parser.add_argument('--conf-thres', type=float, default=0.5, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--no-kp-dets', action='store_true', help='do not use keypoint objects')
    parser.add_argument('--conf-thres-kp', type=float, default=0.5)
    parser.add_argument('--conf-thres-kp-person', type=float, default=0.2)
    parser.add_argument('--iou-thres-kp', type=float, default=0.45)
    parser.add_argument('--overwrite-tol', type=int, default=50)
    parser.add_argument('--scales', type=float, nargs='+', default=[1])
    parser.add_argument('--flips', type=int, nargs='+', default=[-1])
    args = parser.parse_args()
    return args


if __name__ == '__main__':
    args = options()
    main(args)
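With `--csv`, the script writes one row per frame containing x, y, and confidence for every keypoint of every detected person, concatenated with no separator between people. The sketch below reads such a file back into per-person keypoint lists. `read_pose_rows` is a hypothetical helper; it assumes 17 keypoints per person (the COCO keypoint set used with data/coco-kp.yaml) and that `csv_writer` is a `csv.writer` opened on the output file.

```python
import csv


def read_pose_rows(path, num_keypoints=17):
    """Parse the per-frame CSV rows written by the webcam script.

    Returns a list of frames; each frame is a list of people; each person is
    a list of (x, y, conf) tuples. A frame with no detections is an empty list.
    num_keypoints=17 assumes the COCO keypoint layout.
    """
    per_person = num_keypoints * 3
    frames = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            values = [float(v) for v in row]
            if len(values) % per_person:
                raise ValueError(f"row length {len(values)} is not a multiple of {per_person}")
            people = []
            for start in range(0, len(values), per_person):
                chunk = values[start:start + per_person]
                # regroup the flat x, y, conf triplets into tuples
                people.append([tuple(chunk[k:k + 3]) for k in range(0, per_person, 3)])
            frames.append(people)
    return frames
```

Because people are simply concatenated, the row length is the only way to tell how many were detected, which is why the helper checks that each row is a multiple of `num_keypoints * 3`.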
However, while trying various things, I found that the following error sometimes occurs.
The same error is mentioned in the comments of this article[5], but no workaround is given there.
(kapao) aska@moonlight:~/kapao$ python demo_poses.py
Using device: cuda:0
/home/aska/anaconda3/envs/kapao/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
CAM : 640x480, 30.0
webcam 0: webcam 1: webcam 2: webcam 3: webcam 4: : Traceback (most recent call last):
  File "demo_poses.py", line 189, in <module>
    main(args)
  File "demo_poses.py", line 82, in main
    bboxes, poses, _, _, _ = post_process_batch(data, img, [], [[im0.shape[:2]]], person_dets, kp_dets)
  File "/home/aska/kapao/val.py", line 108, in post_process_batch
    kpd[:, :4] = scale_coords(imgs[si].shape[1:], kpd[:, :4], shape)
RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation.
So I looked into the cause.
It turns out the problem lies in the value passed to scale_coords at line 108 of val.py. Specifically, the error occurs when kpd has size [1, 40], that is, when only a single keypoint object is detected.
webcam 69: torch.Size([3, 768, 1024]) torch.Size([5, 40])
webcam 70: torch.Size([3, 768, 1024]) torch.Size([4, 40])
webcam 71: torch.Size([3, 768, 1024]) torch.Size([1, 40])
Traceback (most recent call last):
  File "demo_poses.py", line 189, in <module>
    main(args)
  File "demo_poses.py", line 82, in main
    bboxes, poses, _, _, _ = post_process_batch(data, img, [], [[im0.shape[:2]]], person_dets, kp_dets)
  File "/home/aska/kapao/val.py", line 112, in post_process_batch
    kpd[:, :4] = scale_coords(imgs[si].shape[1:], kpd[:, :4], shape)
RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation.
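As a temporary workaround, I confirmed that modifying val.py as follows avoids the error. Note that it simply skips this keypoint-object branch for frames in which only one keypoint object is detected.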
Before:
if data['use_kp_dets'] and nkp:
    mask = scores > data['conf_thres_kp_person']
    poses_mask = poses[mask]

    if len(poses_mask):
        ### DEBUG print(imgs[si].shape, kpd.shape)
        kpd[:, :4] = scale_coords(imgs[si].shape[1:], kpd[:, :4], shape)
        kpd = kpd[:, :6].cpu()
After:
if data['use_kp_dets'] and nkp:
    mask = scores > data['conf_thres_kp_person']
    poses_mask = poses[mask]

    if len(poses_mask) and kpd.shape[0] > 1:
        ### DEBUG print(imgs[si].shape, kpd.shape)
        kpd[:, :4] = scale_coords(imgs[si].shape[1:], kpd[:, :4], shape)
        kpd = kpd[:, :6].cpu()
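I haven't compared processing speed or other aspects against other algorithms, but it works quite well (the video demo above ran inference at about 60 it/s on the RTX 3060).
I plan to experiment with it more from here.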
----
References:
[1] GitHub - CMU-Perceptual-Computing-Lab/openpose
[2] GitHub - HRNet/HRNet-Human-Pose-Estimation
[3] GitHub - wmcnally/kapao
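[4] YOLOv5で物体検出して座標・幅・高さをCSV出力する (Detecting objects with YOLOv5 and exporting coordinates, width, and height to CSV), Python
[5] Kapaoで、人物検出と姿勢推定を行う (Person detection and pose estimation with KAPAO)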