I have an array of shape (27, 36) with values between 0 and 255 and one missing value (NaN). I am trying to impute the missing value using the NIPALS algorithm. After searching, I found that scikit-learn has PLSRegression().

PLSRegression takes two matrices or vectors, X and Y, for fitting and prediction:

from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=2)
pls2.fit(X, Y)

Y_pred = pls2.predict(X)

There are two problems. First, neither matrix may contain NaN values (an error is thrown if either of them does). Second, I have only one matrix!

So how can I get PLSRegression to impute the missing data? Or, if that is not possible, what algorithm should I follow to solve this problem (still using NIPALS, of course)? Using rpy2 is fine.
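To make clear what I am after, here is a rough sketch of the kind of single-matrix approach I have in mind: NIPALS-style PCA that skips the missing entries while estimating scores and loadings, then fills the gaps from the low-rank reconstruction. This is only my own assumption of how it could look in plain NumPy; the function name `nipals_impute` and its parameters are hypothetical and not part of scikit-learn.

```python
import numpy as np

def nipals_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Hypothetical helper: fill NaNs using a NIPALS-style PCA reconstruction."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                      # True where a value is observed
    w = mask.astype(float)
    col_means = np.nanmean(X, axis=0)        # column means over observed values
    # Centered residual matrix; missing cells are set to 0 so they drop out of sums
    R = np.where(mask, X - col_means, 0.0)
    recon = np.zeros_like(R)

    for _comp in range(n_components):
        if not R.any():                      # nothing left to explain
            break
        # Start the score vector from the column with the largest sum of squares
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _it in range(max_iter):
            # Loadings: per-column regression on t, using observed rows only
            p = (R.T @ t) / np.maximum(w.T @ (t ** 2), 1e-12)
            p /= np.linalg.norm(p)
            # Scores: per-row regression on p, using observed columns only
            t_new = (R @ p) / np.maximum(w @ (p ** 2), 1e-12)
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        component = np.outer(t, p)
        recon += component
        R = np.where(mask, R - component, 0.0)   # deflate observed cells only

    filled = X.copy()
    filled[~mask] = (recon + col_means)[~mask]   # replace only the missing cells
    return filled
```

Calling `nipals_impute(arr, n_components=2)` on my (27, 36) array would leave the observed values untouched and replace only the NaN. I do not know whether this matches what a proper NIPALS implementation (for example in R) would return, which is part of what I am asking.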
