
I have a DataFrame read from the MovieLens dataset. It has the following format:

   user_id  item_id  rating  timestamp
0      196      242       3  881250949
1      186      302       3  891717742
2       22      377       1  878887116
3      244       51       2  880606923
4      166      346       1  886397596

I would like to convert it to a numpy.ndarray. Here is my working code:

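import numpy
# Dense user-by-item matrix; ids are 1-based, so subtract 1 to get the index
MyCF.train_data_matrix = numpy.zeros((n_users, n_items))
for line in MyCF.train_data.itertuples():
    # line[0] is the row index; line[1], line[2], line[3] are user_id, item_id, rating
    MyCF.train_data_matrix[line[1] - 1, line[2] - 1] = line[3]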

But it is too slow when the DataFrame is very large. Is there an efficient function in pandas to convert my pandas.DataFrame to a numpy.array? The array should have this format:

matrix[user_id][item_id]=rating
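
For illustration, one way the row-by-row loop might be replaced is with a single NumPy fancy-indexing assignment. This is only a sketch under assumptions that the question does not state: the ids are taken to be 1-based integers, `n_users` and `n_items` are derived here from the largest ids (in the question they are already defined), and the standalone names `train_data` and `train_data_matrix` stand in for the `MyCF` attributes.

```python
import numpy as np
import pandas as pd

# Toy data copied from the sample rows shown in the question
train_data = pd.DataFrame({
    "user_id": [196, 186, 22, 244, 166],
    "item_id": [242, 302, 377, 51, 346],
    "rating": [3, 3, 1, 2, 1],
    "timestamp": [881250949, 891717742, 878887116, 880606923, 886397596],
})

# Assumption: ids are 1-based, so the matrix size is taken from the largest id
n_users = int(train_data["user_id"].max())
n_items = int(train_data["item_id"].max())

# Convert the 1-based ids to 0-based row and column indices
rows = train_data["user_id"].to_numpy() - 1
cols = train_data["item_id"].to_numpy() - 1

# Fill every (user, item) cell in one vectorized assignment
train_data_matrix = np.zeros((n_users, n_items))
train_data_matrix[rows, cols] = train_data["rating"].to_numpy()
```

After this runs, `train_data_matrix[user_id - 1, item_id - 1]` holds the rating, which matches the indexing used by the loop. For a very large and mostly empty matrix, a sparse representation may be worth considering instead of a dense array.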
Hailin FU
  • [`as_matrix`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html) – PidgeyUsedGust Mar 24 '17 at 10:07
  • Do you rather want a dictionary, judging by the line `matrix[user_id][item_id]=rating`? Why not keep the DataFrame structure? – Colonel Beauvel Mar 24 '17 at 10:08
  • I need a matrix to calculate user similarity. With a matrix structure and numpy or sklearn, a lot of work can be done quickly. Thanks for your reply. – Hailin FU Mar 24 '17 at 13:33

0 Answers