
Here is my code:

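import numpy as np
import pandas as pd
import scipy.sparse as sp

# One-hot encode movie_id as integers, then keep one row per user (1 = interaction)
data = pd.get_dummies(data['movie_id'], dtype=int).groupby(data['user_id']).max()
df = pd.DataFrame(data)
# Turn the 0s (no interaction) into -1
replace = df.replace(0, np.nan)
t = replace.fillna(-1)
# Build a CSR matrix from the underlying NumPy array
sparse = sp.csr_matrix(t.values)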

My data consists of two columns, movie_id and user_id:

user_id   movie_id
5         1000
6         1007

I want to convert this data to a sparse matrix. I first created an interaction matrix whose rows are user_id values and whose columns are movie_id values, encoding a positive interaction as +1 and a negative interaction as -1. Then I converted it to a sparse matrix with SciPy. My result looks like this:

(0,0) -1
(0,1) -1
(0,2) 1

But what I actually want is this:

(1000,0) -1
(1000,1) 1
(1007,0) -1

Any help would be appreciated.
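For reference, a minimal sketch of why the output shows positions: `t.values` is a plain NumPy array, so SciPy never sees the user and movie IDs, which live only in `t.index` and `t.columns`. The sketch assumes pandas 0.23 or later (for the `dtype` argument of `get_dummies`) and uses the two sample rows above; it rebuilds the +1/-1 interaction matrix and then maps each stored entry back to its labels. Names not in the question (`coo`, `triples`) are illustrative.

```python
import pandas as pd
import scipy.sparse as sp

# Sample rows from the question: one row per (user, movie) interaction
data = pd.DataFrame({"user_id": [5, 6], "movie_id": [1000, 1007]})

# Users as rows, movies as columns: +1 for an interaction, -1 otherwise
t = (
    pd.get_dummies(data["movie_id"], dtype=int)
    .groupby(data["user_id"])
    .max()
    .replace(0, -1)
)

# The CSR matrix only stores positional (row, col) indices
sparse = sp.csr_matrix(t.values)
coo = sparse.tocoo()

# Map positions back to the labels kept in t.index and t.columns
triples = [
    (int(t.index[r]), int(t.columns[c]), int(v))
    for r, c, v in zip(coo.row, coo.col, coo.data)
]
for user, movie, value in triples:
    print((user, movie), value)
```

The tuple order can be swapped if movie IDs should come first.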

Akhil Alexander

1 Answer


If you have both the row and the column index (in your case movie_id and user_id, respectively), it is advisable to construct the matrix in COO (coordinate) format.

You can convert it into a sparse format like so:

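import scipy.sparse
# df is the original two-column frame; data needs one value per (movie_id, user_id) pair
sparse_mat = scipy.sparse.coo_matrix(([1] * len(df), (df.movie_id, df.user_id)))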

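Importantly, note that the constructor infers the shape of the sparse matrix from the coordinates you pass: without an explicit shape argument it becomes (max(movie_id) + 1, max(user_id) + 1), so each movie ID and user ID maps directly to its own row or column.
Furthermore, you can convert this matrix to any other sparse format you need, for example CSR with `sparse_mat.tocsr()`.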
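A self-contained sketch of this approach on the question's two sample rows. It assumes every row of the raw two-column frame is a positive interaction (+1), since the sample has no rating column; the `shape=(2000, 10)` value in the second call is an illustrative choice, not a requirement.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Raw two-column frame from the question (not the dummy-encoded matrix)
df = pd.DataFrame({"user_id": [5, 6], "movie_id": [1000, 1007]})

# data must be 1-D, with exactly one value per (row, col) coordinate pair
values = np.ones(len(df), dtype=int)

# Rows indexed by movie_id, columns by user_id; shape is inferred as
# (max(movie_id) + 1, max(user_id) + 1)
sparse_mat = scipy.sparse.coo_matrix((values, (df.movie_id, df.user_id)))
print(sparse_mat.shape)

# Passing shape explicitly fixes the dimensions instead of inferring them
padded = scipy.sparse.coo_matrix(
    (values, (df.movie_id, df.user_id)), shape=(2000, 10)
)
print(padded.shape)

# Convert to CSR, e.g. for fast row slicing; the IDs are now real indices
csr = sparse_mat.tocsr()
print(csr[1000, 5], csr[1007, 6])
```

Because the raw IDs are used as indices, IDs such as 1000 yield a 1008 x 7 matrix whose rows are mostly empty; that is the trade-off of this direct mapping.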

dennlinger
  • Unfortunately, while building the interaction matrix the DataFrame attributes are lost, so this error appears: 'DataFrame' object has no attribute 'movie_id'. How can I create the interaction matrix while keeping the attribute? – Akhil Alexander Jul 09 '18 at 09:33
  • Well, this only follows the DataFrame structure you provided. Basically, you simply pass the column of your DataFrame that contains the `movie_id` values. – dennlinger Jul 09 '18 at 09:41
  • After running data=pd.get_dummies(data['movie_id']).groupby(data['user_id']).apply(max), my interaction matrix no longer contains those two columns, and when I add the columns from the first DataFrame (data), this error pops up: "row, column, and data array must all be the same length" – Akhil Alexander Jul 09 '18 at 11:07
  • I can't really tell what's going wrong here without seeing the data. I also believe this is not related to the question at all, so I would suggest starting another thread for the separate problem, or providing a little more insight into your data. Even a minimal reproducible example would be enough, but since we have no idea what your data looks like, we cannot help... – dennlinger Jul 09 '18 at 11:21
  • It's just dummy data to test the LightFM recommender. Is it okay to send my data and my code to your email so that you can have a look? – Akhil Alexander Jul 09 '18 at 11:26
  • I created a chat room for this question [here](https://chat.stackoverflow.com/rooms/info/174650/discussion-on-converting-pandas-dataframe-to-sparse-matrix) to avoid cluttering the comments. – dennlinger Jul 09 '18 at 11:45
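The "row, column, and data array must all be the same length" error from the comments is SciPy rejecting a data array whose length differs from the coordinate arrays. One way this can happen is taking the data from a matrix with one row per user while taking the coordinates from a frame with one row per interaction. A minimal reproduction with illustrative values; the exact message differs between SciPy versions, so only the `ValueError` itself is relied on:

```python
import numpy as np
import scipy.sparse

rows = np.array([1000, 1007])
cols = np.array([5, 6])

# Three values but only two (row, col) pairs: the lengths do not match
bad_values = np.array([1, -1, 1])
raised = False
try:
    scipy.sparse.coo_matrix((bad_values, (rows, cols)))
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With one value per coordinate pair, construction succeeds
ok = scipy.sparse.coo_matrix((np.array([1, -1]), (rows, cols)))
print(ok.nnz)
```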