Creating Images with the Power of AI (Part 1)

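Since the spring of this year (2022), high-quality image generation AIs such as "DALL·E 2" [1], "Midjourney" [2], and "Stable Diffusion" [3] have been attracting attention.
On September 1, 2022, there was also a sensational article reporting that a picture made with the image generation AI "Midjourney" had taken first place at an art competition in the United States.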

www.itmedia.co.jp

So I tried the "AUTOMATIC1111 Stable Diffusion web UI" [4], which can be operated easily from a UI rather than from the command line.

1. Installation
This section shows the steps for installing the "AUTOMATIC1111 Stable Diffusion web UI" on Ubuntu 20.04.

[Test environment]

CPU : Core i7-7700
RAM : 16GB
OS : Ubuntu 20.04 64bit
GPU : RTX 3060

1.1 Creating a virtual environment
Create a virtual environment with Anaconda.

$ conda create -n stable-diffusion python=3.9
$ conda activate stable-diffusion
$ pip install --upgrade pip
$ pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
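
As a quick sanity check after the install (not part of the original steps), the following sketch reports whether PyTorch can see the GPU. The helper name `describe_torch` is hypothetical; it relies only on `torch.__version__`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()`, and it returns a placeholder result if PyTorch is not installed.

```python
import importlib.util


def describe_torch():
    """Return a small dict describing the installed PyTorch, if any."""
    # Hypothetical helper for illustration; not part of the web UI.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False, "device": None}

    import torch

    cuda = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": cuda,
        # Only query the device name when a CUDA device is actually visible.
        "device": torch.cuda.get_device_name(0) if cuda else None,
    }


if __name__ == "__main__":
    print(describe_torch())
```

If "cuda" comes back False inside the stable-diffusion environment, the GPU driver or the CUDA build of the wheel would be the first things to look at.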

1.2 Downloading the pretrained models
Download the pretrained "Stable Diffusion" model from its page on Hugging Face [5]. You need to create an account to do this.
Next, download the pretrained "GFPGAN" [6] model from its download page.

1.3 Downloading and setting up the "AUTOMATIC1111 Stable Diffusion web UI"
Clone the code and set it up (as of 2022/9/14).
Note: the code is updated daily, so the steps may differ by the time you read this.

$ git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
$ cd stable-diffusion-webui
$ pip install -r requirements.txt
$ mv ~/<download-folder>/sd-v1-4.ckpt model.ckpt
$ mv ~/<download-folder>/GFPGANv1.3.pth .
$ python launch.py
Python 3.9.13 (main, Aug 25 2022, 23:26:10) 
[GCC 11.2.0]
Commit hash: 6153d9d9e9d51708e8f96eb8aaecf168adfcf4b7
Installing requirements for Web UI
Launching Web UI with arguments: 
Loading model [7460a6fa] from /home/aska/DNN/stable-diffusion-webui/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
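
If the launch stops at the "Loading model" step instead of producing a log like the one above, one thing worth checking is that both downloaded files ended up in the repository folder. The sketch below does that; the function name `missing_model_files` is hypothetical, and the two file names are simply the ones used in the mv commands of this section.

```python
from pathlib import Path

# File names taken from the mv commands in this section.
EXPECTED_FILES = ("model.ckpt", "GFPGANv1.3.pth")


def missing_model_files(repo_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from inside stable-diffusion-webui.
    missing = missing_model_files(".")
    print("missing:", missing if missing else "none")
```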

1.4 Launching the web UI
Open the displayed URL (http://127.0.0.1:7860) in a browser.

[Web UI screen]
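
If the page does not load in the browser, a small script can tell whether anything is answering at that address at all. This is only a sketch using the Python standard library: `web_ui_is_up` is a hypothetical helper, and the default URL is the one printed in the launch log. Proxies are bypassed on purpose because the target is localhost.

```python
import urllib.error
import urllib.request


def web_ui_is_up(url="http://127.0.0.1:7860", timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    # An empty ProxyHandler ignores any proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return False


if __name__ == "__main__":
    print("web UI reachable:", web_ui_is_up())
```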

From the next time on, the following is enough to start it.

$ python webui.py
Loading model [7460a6fa] from /home/aska/DNN/stable-diffusion-webui/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

2. Checking that it works
Enter a prompt (a "spell") and try generating an image.

① Prompt: "Pink rabbit playing on the moon, painted in Picasso style"