Just looking for some brief advice to put me back on the right track. I have been working on a problem where my input matrix is sparse (about 25% of the entries are filled in; the rest are 0's). I build it as a sparse.coo_matrix and then convert it to a dense array:

sparse_matrix = sparse.coo_matrix((value, (rater, blurb))).toarray()
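To make the effect of that line concrete, here is a minimal sketch on hypothetical toy data (the three ratings and the 3x4 shape are made up for illustration, not taken from the real data set). It shows that `toarray()` fills every unrated cell with 0.0, so a missing rating and a literal zero look the same afterwards unless a separate mask of observed entries is kept.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: three known ratings in a 3x4 rater-by-blurb grid.
rater = np.array([0, 1, 2])
blurb = np.array([0, 2, 3])
value = np.array([6.0, 5.5, 8.0])

# Same construction as in the question, with an explicit shape for clarity.
dense = sparse.coo_matrix((value, (rater, blurb)), shape=(3, 4)).toarray()

# Cells that were never rated are stored as 0.0 in the dense array,
# so keep a boolean mask of the positions that actually hold a rating.
observed = np.zeros(dense.shape, dtype=bool)
observed[rater, blurb] = True
```

Any model fitted to `dense` alone sees the unrated cells as ratings of 0; `observed` is what would let a later step tell the two cases apart.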

After some work on building this array from my data set and messing around with some other options, I currently have my NMF model fitter function defined as follows:

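import numpy as np
from sklearn.decomposition import NMF

def nmf_model(matrix):
    # NNDSVD initialization; n_components is not set, so the library default is used
    model = NMF(init='nndsvd', random_state=0)

    W = model.fit_transform(matrix)  # row (rater) factors
    H = model.components_            # column (blurb) factors
    result = np.dot(W, H)            # reconstructed matrix

    return result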

Now, the issue is that my output doesn't seem to be accounting for the 0 values correctly. Any value that was a 0 gets bumped to some value less than 1, and my known values fluctuate quite a bit from the actual ones (all data are ratings between 1 and 10). Can anyone spot what I am doing wrong? From the scikit-learn documentation, I assumed that using the nndsvd initialization would help account for the empty values correctly. Sample output:

#Row / Column / New Value
35 18 6.50746917334 #Actual Value is 6
35 19 0.580996641675 #Here down are all "estimates" of my function
35 20 1.26498699492
35 21 0.00194119935464
35 22 0.559623469753
35 23 0.109736902936
35 24 0.181657421405
35 25 0.0137801897011
35 26 0.251979684515
35 27 0.613055371646
35 28 6.17494590041 #Actual value is 5.5

Appreciate any advice any more experienced ML coders can offer!

brainiac
  • I believe you are confusing matrix factorization with matrix completion. – David Maust Jan 17 '16 at 05:22
  • I was fairly certain from what I was looking at that you could do completion with sklearn NMF. Am I misinterpreting something? What would you recommend instead? – brainiac Jan 17 '16 at 17:04
  • I've never seen scikit-learn's NMF used this way. I have used vowpal wabbit's low rank quadratic feature for a similar problem. Also, found this post that might be helpful http://stackoverflow.com/questions/22767695/python-non-negative-matrix-factorization-that-handles-both-zeros-and-missing-dat – David Maust Jan 17 '16 at 18:55
  • Thanks for the suggestions @DavidMaust. Looks like that link might be somewhat helpful. I basically just want my model to ignore the 0's in a row and average across the rest of that row and column to determine the probable rating that particular element will receive. I'll keep looking into it anyway. – brainiac Jan 17 '16 at 22:50
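
One possible way to get the "ignore the 0's" behaviour discussed in the comments is to weight the reconstruction error by a mask of observed entries. The following is a minimal sketch, not scikit-learn's NMF and not necessarily what the linked post does: it uses plain NumPy with masked multiplicative updates, and the function name `masked_nmf`, its parameters, and the toy ratings are all hypothetical.

```python
import numpy as np

def masked_nmf(X, mask, n_components=2, n_iter=500, eps=1e-9, seed=0):
    """Factor X ~ W @ H, fitting only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Random non-negative starting point (an assumption; not NNDSVD).
    W = rng.random((n_rows, n_components)) + eps
    H = rng.random((n_components, n_cols)) + eps
    M = mask.astype(float)
    MX = M * X  # unobserved entries contribute nothing to the updates
    for _ in range(n_iter):
        # Multiplicative updates for the masked squared error
        # sum(M * (X - W @ H) ** 2); factors stay non-negative.
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

# Hypothetical toy ratings; 0 marks "not rated", as in the question.
ratings = np.array([[6.0, 0.0, 5.5],
                    [0.0, 7.0, 0.0],
                    [8.0, 0.0, 9.0]])
observed = ratings > 0

W, H = masked_nmf(ratings, observed, n_components=2)
estimate = W @ H  # entries where observed is False are the predictions
```

Because the zeros never enter the error term, the factors are not pulled toward 0 at unrated positions. How good the filled-in values are still depends on the rank, the number of iterations, and how many ratings each row and column has, so this is a starting point rather than a tuned recommender.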
