One-hot encoding (one-of-K encoding) in Python

A memo on how to one-hot encode target labels (also called one-of-K encoding) when you would rather not use sklearn.

Suppose the target labels to be converted look like the following.

import numpy as np

num_classes = 10
# Column vector of 10 random integer labels in [0, num_classes)
t = np.random.randint(num_classes, size=(10, 1))
print(t)

'''
[[4]
 [4]
 [3]
 [3]
 [4]
 [1]
 [9]
 [4]
 [7]
 [0]]
'''

Basic method

t_vec = t.reshape(-1)  # flatten (N, 1) -> (N,)
t_oh = np.zeros((len(t_vec), num_classes), dtype=int)
# For each row i, set column t_vec[i] to 1
t_oh[np.arange(len(t_vec)), t_vec] = 1

print(t_oh)

'''
[[0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 0 0]
 [1 0 0 0 0 0 0 0 0 0]]
'''
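
The index assignment above pairs each row index with the label in that row, so exactly one cell per row is set to 1. As a rough sketch, the same steps can be wrapped in a small function. The name `one_hot_basic` and the three-class example labels are assumptions made for illustration and are not part of the original memo.

```python
import numpy as np

def one_hot_basic(labels, num_classes):
    """Hypothetical helper: one-hot encode integer labels via index assignment."""
    labels = np.asarray(labels).reshape(-1)  # flatten (N, 1) -> (N,)
    encoded = np.zeros((labels.size, num_classes), dtype=int)
    # Pair row i with column labels[i] and set that single cell to 1.
    encoded[np.arange(labels.size), labels] = 1
    return encoded

# Illustrative labels only (not the data from the memo).
example = one_hot_basic([[2], [0], [1]], num_classes=3)
print(example)
```

Note that a label outside `[0, num_classes)` raises an `IndexError` if it is too large, while a negative label silently indexes from the end, so it may be worth validating the labels first.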

Method using the identity matrix

# one-hot encoding: row k of the identity matrix is the one-hot vector for class k
targets = np.array(t).reshape(-1)
t_oh = np.eye(num_classes, dtype=int)[targets]
print(t_oh)

'''
[[0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 0 0]
 [1 0 0 0 0 0 0 0 0 0]]
'''
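
As a sanity check, the two approaches should produce identical arrays, and taking `argmax` along each row should recover the original labels. The sketch below assumes a seeded generator (`np.random.default_rng(0)`, seed chosen arbitrarily) so that the run is reproducible. The variable names `oh_basic`, `oh_eye`, `same`, and `recovered` are introduced here for illustration.

```python
import numpy as np

num_classes = 10
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
t = rng.integers(num_classes, size=(10, 1))
labels = t.reshape(-1)

# Basic method: index assignment into a zero matrix.
oh_basic = np.zeros((labels.size, num_classes), dtype=int)
oh_basic[np.arange(labels.size), labels] = 1

# Identity-matrix method: select row `label` of the identity for each label.
oh_eye = np.eye(num_classes, dtype=int)[labels]

# Both encodings should match, and argmax should invert the encoding.
same = np.array_equal(oh_basic, oh_eye)
recovered = oh_eye.argmax(axis=1)
print(same)
print(np.array_equal(recovered, labels))
```

The identity-matrix version is shorter, but it first allocates a `num_classes` x `num_classes` matrix, which may matter when the number of classes is large.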