I'm using Movie Lens Dataset in Python Pandas. I need to print the matrix of u.data
a tab separated file in foll. way
NULL MovieID1 MovieID2 MovieID3
UserID1 Rating Rating Rating
UserID2 Rating Rating Rating
I've already been through following links
- One - Dataset is much huge put it in series
- Two - Transpose of Row not mentioned
- Three - Tried with reindex so as to get NaN values in one column
- Four -
df.iloc
anddf.ix
didn't work either
I need the output so as it shows me rating and NaN (when not rated) for movies w.r.t. users.
NULL MovieID1 MovieID2 MovieID3
UserID1 Rating Rating NaN
UserID2 Rating NaN Rating
P.S. I won't mind having solutions with numpy, crab, recsys, csv or any other python package
EDIT 1 - Sorted the data and exported but got an additional field
df2 = df.sort_values(['UserID','MovieID'])
print type(df2)
df2.to_csv("sorted.csv")
print df2
The file produces foll. sorted.csv file
,UserID,MovieID,Rating,TimeStamp
32236,1,1,5,874965758
23171,1,2,3,876893171
83307,1,3,4,878542960
62631,1,4,3,876893119
47638,1,5,3,889751712
5533,1,6,5,887431973
70539,1,7,4,875071561
31650,1,8,1,875072484
20175,1,9,5,878543541
13542,1,10,3,875693118
EDIT 2 - As asked in Comments
Here's the format of Data in u.data file which acts as input
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817