Just looking for some brief advice to put me back on the right track. I have been working on a solution to a problem where I have a sparse input matrix (~25% of entries filled, the rest are 0s). I build it as a sparse.coo_matrix and then convert it to a dense array:
sparse_matrix = sparse.coo_matrix((value, (rater, blurb))).toarray()
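To make the construction concrete, here is a minimal sketch with made-up toy triplets (the arrays below are illustrative only, not my real data set). It shows that every (rater, blurb) pair without a rating ends up as an explicit 0 once the matrix is converted with toarray():

```python
import numpy as np
from scipy import sparse

# Hypothetical toy triplets, not the real data set
value = np.array([6.0, 5.5, 8.0])   # ratings
rater = np.array([0, 0, 1])         # row indices
blurb = np.array([0, 2, 1])         # column indices

# The shape is inferred from the largest row and column index
dense = sparse.coo_matrix((value, (rater, blurb))).toarray()

# Unrated (rater, blurb) pairs are filled with 0 in the dense array
print(dense)
```

So in the dense array a missing rating and a literal 0 are indistinguishable.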
After some work on building this array from my data set and messing around with some other options, I currently have my NMF model fitter function defined as follows:
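import numpy as np
from sklearn.decomposition import NMF

def nmf_model(matrix):
    # Factor matrix ~ W @ H, then return the dense reconstruction
    model = NMF(init='nndsvd', random_state=0)
    W = model.fit_transform(matrix)
    H = model.components_
    result = np.dot(W, H)
    return result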
Now, the issue is that my output doesn't seem to account for the 0 values correctly. Every entry that was 0 gets bumped to a small positive value (mostly below 1), and the reconstructed values for my known entries deviate from the actual ratings quite a bit (all data are ratings between 1 and 10). Can anyone spot what I am doing wrong? From the scikit-learn documentation, I assumed that using the nndsvd initialization would help account for the empty values correctly. Sample output:
#Row / Column / New Value
35 18 6.50746917334 #Actual Value is 6
35 19 0.580996641675 #From here down, all values are "estimates" from my function
35 20 1.26498699492
35 21 0.00194119935464
35 22 0.559623469753
35 23 0.109736902936
35 24 0.181657421405
35 25 0.0137801897011
35 26 0.251979684515
35 27 0.613055371646
35 28 6.17494590041 #Actual value is 5.5
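To put numbers on the two symptoms, a sketch along these lines separates the reconstruction error on the known entries from the values assigned to the empty ones. The toy matrix, n_components=2, and max_iter=1000 are assumptions chosen only to keep the example small; they are not the settings from my real run:

```python
import numpy as np
from sklearn.decomposition import NMF

# Made-up toy ratings; 0 stands for "not rated"
ratings = np.array([
    [6.0, 0.0, 5.5, 0.0],
    [0.0, 8.0, 0.0, 3.0],
    [7.0, 0.0, 6.0, 0.0],
    [0.0, 9.0, 0.0, 2.0],
])

# n_components and max_iter are illustrative choices
model = NMF(n_components=2, init='nndsvd', random_state=0, max_iter=1000)
W = model.fit_transform(ratings)
H = model.components_
approx = W @ H

# Boolean mask of the entries that actually hold a rating
known = ratings > 0

# Root-mean-square error restricted to the known ratings
rmse_known = np.sqrt(np.mean((approx[known] - ratings[known]) ** 2))

# Largest value the reconstruction assigns to an originally empty entry
max_unknown = approx[~known].max()

print("RMSE on known entries:", rmse_known)
print("Largest value on empty entries:", max_unknown)
```

The same mask could be applied to the output of nmf_model on the real matrix to see how far the known ratings drift overall, rather than eyeballing individual rows.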
Appreciate any advice any more experienced ML coders can offer!