緑茶思考ブログ

【CS231n】Understanding and Visualizing Convolutional Neural Networks

I am studying neural networks (NNs) and CNNs using Stanford's CS231n course materials.

Visualizing what ConvNets learn

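In response to the criticism that the features learned by neural networks are not interpretable,
several approaches for understanding and visualizing CNNs have been proposed.
This post walks through them.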

Visualizing the activations and first-layer weights

Layer activations

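The most straightforward technique is to display the network's activations during the forward pass.
For ReLU networks:
- Early in training, the activations look relatively blobby and dense
- As training progresses, they become sparser and more localized

An activation map that is zero for almost all inputs is a bad sign
- It can indicate dead filters, and can be a symptom of a learning rate that is too high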

f:id:yusuke_ujitoko:20170121230248p:plain

(Quoted from CS231n)
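
The dead-filter check described in this section can be sketched in a few lines of NumPy. This is only a minimal sketch under my own assumptions: the activations are post-ReLU values stored as an array of shape (N, C, H, W), and the function name `dead_filter_fraction` and the 0.99 threshold are hypothetical choices, not something taken from CS231n.

```python
import numpy as np

def dead_filter_fraction(activations, tol=0.0):
    """Return, per channel, the fraction of activation values that are <= tol.

    activations: array of shape (N, C, H, W) holding post-ReLU outputs
    for N inputs (this layout is an assumption of the sketch).
    """
    acts = np.asarray(activations)
    zero_mask = acts <= tol
    # Average over inputs and spatial positions, keeping the channel axis.
    return zero_mask.mean(axis=(0, 2, 3))

# Toy data standing in for real activations: channel 1 is forced to zero.
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(8, 3, 4, 4)), 0.0)
acts[:, 1] = 0.0

frac = dead_filter_fraction(acts)
dead = np.flatnonzero(frac > 0.99)  # hypothetical threshold for "dead"
print(dead)
```

In practice the activations would be collected over many different inputs, since a map that is zero for a single image is not necessarily dead.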

Conv/FC Filters

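A second technique is to display the weights themselves.
- In a well-trained network, the filters show smooth patterns with little noise
- Noisy patterns can indicate that the network has not been trained long enough, or that the regularization strength is too low and the network has overfit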

f:id:yusuke_ujitoko:20170121230321p:plain

(Quoted from CS231n)
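
To actually look at the weights, the filters have to be rescaled and laid out as one image. Below is a rough sketch of how that could be done, assuming the filters are stored as an array of shape (F, C, H, W) with 1 or 3 channels. The helper name `filters_to_grid` and the per-filter min-max scaling are my own choices for illustration.

```python
import numpy as np

def filters_to_grid(weights, pad=1):
    """Tile conv filters of shape (F, C, H, W) into one image with values in [0, 1].

    Returns an array of shape (rows*(H+pad)+pad, cols*(W+pad)+pad, C).
    """
    w = np.asarray(weights, dtype=np.float64)
    f, c, h, wd = w.shape
    # Rescale each filter independently to [0, 1] so its pattern is visible.
    flat = w.reshape(f, -1)
    lo = flat.min(axis=1).reshape(f, 1, 1, 1)
    hi = flat.max(axis=1).reshape(f, 1, 1, 1)
    w = (w - lo) / np.maximum(hi - lo, 1e-12)
    # Use a roughly square layout.
    cols = int(np.ceil(np.sqrt(f)))
    rows = int(np.ceil(f / cols))
    grid = np.zeros((rows * (h + pad) + pad, cols * (wd + pad) + pad, c))
    for i in range(f):
        r, col = divmod(i, cols)
        top = pad + r * (h + pad)
        left = pad + col * (wd + pad)
        # Move channels last so the tile has image layout (H, W, C).
        grid[top:top + h, left:left + wd] = w[i].transpose(1, 2, 0)
    return grid

# Random weights standing in for a trained first layer.
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 3, 5, 5))
grid = filters_to_grid(weights)
print(grid.shape)
```

The resulting array can be passed to an image viewer such as matplotlib's `imshow`. With random weights like the toy data here, the tiles look like pure noise, which is the kind of pattern this section warns about.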

Retrieving images that maximally activate a neuron

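Another technique is to feed a large set of images through the network and retrieve the ones that most strongly activate a given neuron.
- Looking at these images shows what the neuron is looking for in its receptive field
- An example appears in Rich feature hierarchies for accurate object detection and semantic segmentation by Ross Girshick et al.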

f:id:yusuke_ujitoko:20170121230537p:plain

(Quoted from CS231n)

A drawback of this approach
- An individual ReLU neuron does not necessarily carry any semantic meaning by itself
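
The retrieval step itself is just a sort over activations. Here is a minimal sketch, assuming the activation of one chosen neuron has already been recorded for every image in the dataset. The name `top_k_images` and the toy values are hypothetical.

```python
import numpy as np

def top_k_images(neuron_activations, k=5):
    """Return indices of the k images with the highest activation, strongest first."""
    a = np.asarray(neuron_activations)
    k = min(k, a.shape[0])
    # Sort descending by negating; a stable sort keeps ties in dataset order.
    order = np.argsort(-a, kind="stable")
    return order[:k]

# One activation value per image, for a single neuron (toy numbers).
acts = np.array([0.1, 2.5, 0.0, 1.7, 3.2, 0.4])
top = top_k_images(acts, k=3)
print(top)
```

The returned indices are then used to look up the images (or the crops matching the neuron's receptive field) for display.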

Embedding the codes with t-SNE

A ConvNet can be interpreted as gradually transforming an image into a representation in which the classes are separable by a linear classifier.
- We can get a rough idea of the topology of this space by embedding images into two dimensions so that pairwise distances in the low-dimensional representation are approximately equal to those in the high-dimensional representation

There are many embedding methods.
- t-SNE is one of the best known

f:id:yusuke_ujitoko:20170121230621p:plain

(Quoted from CS231n)
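
To make the idea of "embedding the codes into two dimensions" concrete, here is a small sketch. Note that it does not implement t-SNE: it swaps in PCA, a much simpler linear embedding, only to show the shape of the workflow (codes in, 2-D points out). The toy codes, their 50 dimensions, and the name `pca_embed_2d` are all assumptions of mine.

```python
import numpy as np

def pca_embed_2d(codes):
    """Project codes of shape (N, D) onto their top two principal directions."""
    x = np.asarray(codes, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T

# Toy "codes": two well-separated clusters standing in for two classes.
rng = np.random.default_rng(0)
cluster_a = rng.normal(size=(20, 50))
cluster_b = rng.normal(size=(20, 50)) + 5.0
codes = np.vstack([cluster_a, cluster_b])

emb = pca_embed_2d(codes)
print(emb.shape)
```

With real CNN codes (for example, the activations of the last hidden layer), a nonlinear method such as t-SNE usually preserves neighborhood structure better than a linear projection like this one.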

Occluding parts of the image

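When a ConvNet classifies an image as "dog", we cannot tell which of the following happened:
- it actually picked up on the dog in the image, or
- it relied on contextual cues from the background
One way to investigate
- Occlude part of the image, and plot the probability of the class of interest as a function of the position of the occluder
- See Matthew Zeiler's Visualizing and Understanding Convolutional Networks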

f:id:yusuke_ujitoko:20170121230655p:plain

(Quoted from CS231n)
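
The occlusion experiment can be sketched as a sliding-window loop. This is only an illustration under my own assumptions: `predict_prob` is a hypothetical callable that returns the probability of the class of interest for one image, and a toy function stands in for a real ConvNet here. The patch size, stride, and fill value are arbitrary.

```python
import numpy as np

def occlusion_map(image, predict_prob, patch=4, stride=4, fill=0.5):
    """Slide a square occluder over an (H, W, C) image and record the class probability.

    predict_prob: callable mapping an image to the probability of the
    class of interest (hypothetical stand-in for a trained ConvNet).
    """
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            top, left = i * stride, j * stride
            occluded[top:top + patch, left:left + patch] = fill
            heat[i, j] = predict_prob(occluded)
    return heat

# Toy stand-in model: the "probability" is the mean brightness of a fixed region.
def toy_predict_prob(img):
    return float(img[4:8, 4:8].mean())

image = np.zeros((12, 12, 3))
image[4:8, 4:8] = 1.0  # the "object" the toy model depends on
heat = occlusion_map(image, toy_predict_prob, patch=4, stride=4, fill=0.0)
print(heat)
```

Positions where the probability drops sharply are the regions the classifier relies on. If the drop happens over the dog rather than over the background, the network is looking at the dog.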