A memo on how to one-hot encode training labels (also called one-of-K encoding) when you don't want to use sklearn.
Suppose the labels to be converted look like the following.
    import numpy as np

    num_classes = 10
    t = np.random.randint(10, size=(10, 1))
    print(t)
    '''
    [[4]
     [4]
     [3]
     [3]
     [4]
     [1]
     [9]
     [4]
     [7]
     [0]]
    '''
The basic method
    # Flatten the (N, 1) label array to shape (N,)
    t_vec = t.reshape(-1)
    # Start from an all-zero (N, num_classes) integer matrix
    t_oh = np.zeros((len(t_vec), num_classes), dtype=int)
    # For each row i, set column t_vec[i] to 1
    t_oh[np.arange(len(t_vec)), t_vec] = 1
    print(t_oh)
    '''
    Output for the t shown above:
    [[0 0 0 0 1 0 0 0 0 0]
     [0 0 0 0 1 0 0 0 0 0]
     [0 0 0 1 0 0 0 0 0 0]
     [0 0 0 1 0 0 0 0 0 0]
     [0 0 0 0 1 0 0 0 0 0]
     [0 1 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 1]
     [0 0 0 0 1 0 0 0 0 0]
     [0 0 0 0 0 0 0 1 0 0]
     [1 0 0 0 0 0 0 0 0 0]]
    '''
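The assignment in the basic method relies on NumPy integer-array indexing: the row indices and the column indices are paired element by element, so exactly one entry per row is set. Below is a minimal sketch with fixed labels. The values in `labels` and the smaller `num_classes` are made up for illustration so that the result is reproducible. It also recovers the labels with `argmax` as a sanity check.

```python
import numpy as np

num_classes = 4
labels = np.array([2, 0, 3])  # hypothetical fixed labels, chosen for illustration

one_hot = np.zeros((len(labels), num_classes), dtype=int)
# The index pairs (0, 2), (1, 0), (2, 3) are set to 1: one entry per row.
one_hot[np.arange(len(labels)), labels] = 1

# Each row should contain exactly one 1, and argmax should give back the label.
row_sums = one_hot.sum(axis=1)
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(row_sums)
print(recovered)
```

If `recovered` equals `labels`, the encoding round-trips correctly.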
The identity matrix method
    # one-hot encoding
    # Flatten the labels to shape (N,)
    targets = np.array(t).reshape(-1)
    # Row k of the identity matrix is the one-hot vector for class k
    t_oh = np.eye(num_classes, dtype=int)[targets]
    print(t_oh)
    '''
    [[0 0 0 0 1 0 0 0 0 0]
     [0 0 0 0 1 0 0 0 0 0]
     [0 0 0 1 0 0 0 0 0 0]
     [0 0 0 1 0 0 0 0 0 0]
     [0 0 0 0 1 0 0 0 0 0]
     [0 1 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 1]
     [0 0 0 0 1 0 0 0 0 0]
     [0 0 0 0 0 0 0 1 0 0]
     [1 0 0 0 0 0 0 0 0 0]]
    '''
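One caveat with the identity matrix method is that NumPy treats negative indices as counting from the end, so a label of -1 would silently map to the last class instead of raising an error. The following is a rough sketch of a wrapper that checks the label range first. The function name `to_one_hot` and its error message are hypothetical, not part of the original memo, and it assumes a non-empty array of integer labels.

```python
import numpy as np


def to_one_hot(labels, num_classes):
    """Hypothetical helper: one-hot encode integer labels via the identity matrix."""
    # Accept shapes such as (N,) or (N, 1) and flatten to (N,).
    labels = np.asarray(labels).reshape(-1)
    # Negative indices would silently wrap around in NumPy indexing,
    # so reject anything outside [0, num_classes) explicitly.
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in [0, num_classes)")
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=int)[labels]


encoded = to_one_hot([[1], [0], [2]], num_classes=3)
print(encoded)
```

The same range check could be placed in front of the basic method as well, since its indexed assignment wraps negative labels in the same way.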