Bad format of DataFrame using pandas.read_csv

Question

I am trying to open this dataset: https://www.kaggle.com/dalpozz/creditcardfraud

I am using an IPython notebook. I tried:

data = pd.read_csv("...Desktop/creditcard.csv")

And got:

CParserError: Error tokenizing data. C error: out of memory.

Then I tried the solution pointed out by Noobie here: Error tokenizing data. C error: out of memory pandas python, large file csv

Now the data loads, but the resulting DataFrame looks like a matrix with everything crammed into one column:

entry 0,0: blank;
entry 0,1: All the headers are here;
entry 1,0: 0
entry 1,1: A whole line of unseparated data here
entry 2,0: 1
entry 2,1: A whole line of unseparated data here
...

What can I do to properly format the data?

My implementation:

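import pandas as pd

mylist = []

# Read the file in chunks of 2000 rows to avoid the out-of-memory error
for chunk in pd.read_csv('.../Desktop/creditcard.csv', sep=',', chunksize=2000):
    mylist.append(chunk)

# Combine the chunks into a single DataFrame
data = pd.concat(mylist, axis=0)
del mylist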

The first few lines of the data:
1st line: Time,"V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
2nd line:
0,-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62,"0"
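As a sanity check, here is a minimal sketch that runs the same chunked read on an abbreviated copy of the two sample lines above. The in-memory `sample` string is an assumption made for illustration: it keeps only the first two V columns plus Amount and Class, and it stands in for the real file. Checking `data.shape` and `data.columns` is one way to tell whether the fields were actually separated.

```python
import io

import pandas as pd

# Abbreviated copy of the two sample lines from the question
# (only Time, V1, V2, Amount and Class are kept; this is not the real file).
sample = (
    'Time,"V1","V2","Amount","Class"\n'
    '0,-1.3598071336738,-0.0727811733098497,149.62,"0"\n'
)

# Same chunked-read pattern as in the question, applied to the sample text.
chunks = []
for chunk in pd.read_csv(io.StringIO(sample), sep=",", chunksize=2000):
    chunks.append(chunk)
data = pd.concat(chunks, axis=0)

# One column per field means the separator was recognised;
# a single column means each line was read as one field.
print(data.shape)
print(list(data.columns))
```

On this sample the fields come out as five separate columns. If the same check on the real file reports a single column, the lines on disk may not match the sample shown here (for example, each line could be wrapped in quotes), but that is only a guess.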

What's the separator for the csv? If I understand the sample you provided, it's not splitting the data correctly. Specify `sep` in `pd.read_csv`. — 3novak, Mar 17 '17 at 12:42
Please edit your post to include a snippet of your data, since Kaggle requires an account to download the CSV file, and post the actual solution you tried, as we need to see the implementation. — Parfait, Mar 17 '17 at 12:50
Hi, added my implementation. Can't seem to understand how to upload an image here... — Ricardo Alberto, Mar 17 '17 at 13:01
No image needed. Just copy and paste the first few lines of the data — languitar, Mar 17 '17 at 13:23
I tried this: http://stackoverflow.com/questions/31682798/python-pandas-read-csv-split-column-into-multiple-new-columns-using-comma-to-sep. But still not working :( — Ricardo Alberto, Mar 17 '17 at 16:26
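For readers following the linked approach (splitting one column into several on commas), a rough sketch might look like the following. The one-column `raw` frame is hypothetical: it only mimics the symptom described in the question, using an abbreviated copy of the sample lines, and the real file may behave differently.

```python
import pandas as pd

# Hypothetical one-column frame mimicking the symptom in the question:
# the header and each row arrive as a single unseparated string.
header = 'Time,"V1","V2","Amount","Class"'
rows = ['0,-1.3598071336738,-0.0727811733098497,149.62,"0"']
raw = pd.DataFrame({header: rows})

# Split the single column on commas into one column per field.
split = raw[header].str.split(",", expand=True)

# Reuse the header string for the column names, dropping the literal quotes.
split.columns = [name.strip('"') for name in header.split(",")]

# Strip leftover quotes from the cells and convert each column to numbers.
split = split.apply(lambda col: pd.to_numeric(col.str.strip('"')))

print(split.dtypes)
```

This only repairs the frame after loading. It does not explain why `read_csv` failed to split the lines in the first place, so checking the raw file contents is probably still worthwhile.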
