Yes, you should definitely normalize the data. Consider the following example:
from fancyimpute import SoftImpute
import numpy as np

# Data with mean 100 and std 0.5, i.e. far from zero.
v = np.random.normal(100, 0.5, (5, 3))
v[2, 1:3] = np.nan
v[0, 0] = np.nan
v[3, 0] = np.nan
SoftImpute().complete(v)
The result is
array([[ 81.78428587,  99.69638878, 100.67626769],
       [ 99.82026281, 100.09077899,  99.50273223],
       [ 99.70946085,  70.98619873,  69.57668189],
       [ 81.82898539,  99.66269922, 100.95263318],
       [ 99.14285815, 100.10809651,  99.73870089]])
Note that the imputed values at the positions where I put nan (e.g. 81.78 and 70.99) are completely off, even though the true values come from a distribution centered at 100. This happens because SoftImpute soft-thresholds the singular values of the matrix, shrinking its reconstruction toward zero, which badly biases the recovered entries when the data are not centered. However, if you instead run
from fancyimpute import SoftImpute
import numpy as np

# Same setup, but the data are drawn from a standard normal (mean 0, std 1).
v = np.random.normal(0, 1, (5, 3))
v[2, 1:3] = np.nan
v[0, 0] = np.nan
v[3, 0] = np.nan
SoftImpute().complete(v)
(the same code as before; the only difference is that v is already normalized), you get the following reasonable result:
array([[ 0.07705556, -0.53449412, -0.20081351],
       [ 0.9709198 , -1.19890962, -0.25176222],
       [ 0.41839224, -0.11786451,  0.03231515],
       [ 0.21374759, -0.66986997,  0.78565414],
       [ 0.30004524,  1.28055845,  0.58625942]])
Thus, when you use SoftImpute, don't forget to normalize your data first: make the mean of every column 0 and its standard deviation 1.
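For completeness, here is one way to do that normalization by hand; this is just a sketch using np.nanmean and np.nanstd so that the missing entries are ignored when computing the column statistics (note that newer fancyimpute releases renamed complete() to fit_transform(), so adjust the call to your installed version):

from fancyimpute import SoftImpute
import numpy as np

v = np.random.normal(100, 0.5, (5, 3))
v[2, 1:3] = np.nan
v[0, 0] = np.nan
v[3, 0] = np.nan

# Column-wise mean and std computed over the observed entries only.
mu = np.nanmean(v, axis=0)
sigma = np.nanstd(v, axis=0)

# Normalize, impute, then map the completed matrix back to the original scale.
v_normalized = (v - mu) / sigma
v_filled = SoftImpute().complete(v_normalized)
v_imputed = v_filled * sigma + mu

fancyimpute also ships a BiScaler class that performs a more elaborate row-and-column normalization for exactly this purpose, if you prefer not to do the scaling yourself.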