I am trying to open this dataset in an IPython notebook: https://www.kaggle.com/dalpozz/creditcardfraud
I tried:
data = pd.read_csv("...Desktop/creditcard.csv")
And got:
CParserError: Error tokenizing data. C error: out of memory.
Then I tried the solution pointed out by Noobie here: Error tokenizing data. C error: out of memory pandas python, large file csv
Now the data loads. However, it is not split into columns; it looks like a two-column matrix:
entry 0,0: blank;
entry 0,1: All the headers are here;
entry 1,0: 0
entry 1,1: A whole line of unseparated data here
entry 2,0: 1
entry 2,1: A whole line of unseparated data here
...
What can I do to properly format the data?
My implementation:
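import pandas as pd

# Read the CSV in chunks of 2000 rows to avoid running out of memory
mylist = []
for chunk in pd.read_csv('.../Desktop/creditcard.csv', sep=',', chunksize=2000):
    mylist.append(chunk)
# Combine the chunks into a single DataFrame
data = pd.concat(mylist, axis=0)
del mylist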
A few lines of the data:
1st line: Time,"V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
2nd line:
0,-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62,"0"
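For comparison, here is a minimal sketch of the result I would expect, using an abbreviated copy of the two lines above. Only five of the columns are kept, so the sample string is an assumption of this sketch and not the real file, and `looks_unsplit` is a hypothetical helper name I made up for the check. It may help narrow down whether the file on disk differs from what is shown here.

```python
import io

import pandas as pd

# Abbreviated sample (assumption): only a few of the columns from the
# header and first data row shown above, with the same quoting style.
sample = (
    'Time,"V1","V2","Amount","Class"\n'
    '0,-1.3598071336738,-0.0727811733098497,149.62,"0"\n'
)

# Parse the in-memory sample the same way as the file (comma separator).
parsed = pd.read_csv(io.StringIO(sample), sep=',')


def looks_unsplit(df):
    """Hypothetical helper: True when every line landed in a single column."""
    return df.shape[1] == 1


print(parsed.shape)            # rows, columns of the parsed sample
print(list(parsed.columns))    # one name per header field if the split worked
print(looks_unsplit(parsed))
```

Running the same `looks_unsplit(data)` check on the DataFrame built from the real file would confirm whether each line is being read as one unseparated field.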