Data downloaded from: https://www.kaggle.com/c/titanic/data

To keep the code reproducible, I am trying to read the file directly from its URL instead of from a local copy, but this gives me a parsing error.

train = pd.read_csv("https://www.kaggle.com/c/titanic/download/GQf0y8ebHO0C4JXscPPp%2Fversions%2FXkNkvXwqPPVG0Qt3MtQT%2Ffiles%2Ftrain.csv")

I get this error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2

Here are the first 3 lines of the file:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C

I think it is because the "Name" column has commas separating the first and last names. I tried adding `"` as a separator in `pd.read_csv`, but it didn't work. Any suggestions? Thank you.
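As a quick check of the quoting hypothesis, the three lines quoted above can be parsed from an in-memory string. This is only a sketch: the `sample` variable is introduced here for illustration, and it relies on `pd.read_csv` treating `"` as the quote character by default, so commas inside quoted names should not split the field.

```python
import io

import pandas as pd

# The header and first two data rows exactly as quoted in the question.
sample = (
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",'
    "female,38,1,0,PC 17599,71.2833,C85,C\n"
)

# Default settings: sep="," and quotechar='"'.
train = pd.read_csv(io.StringIO(sample))

print(train.shape)            # (2, 12)
print(train["Name"].iloc[0])  # Braund, Mr. Owen Harris
```

If these rows parse into 12 columns, the quoted commas are probably not the cause, and the content actually returned by the URL is worth inspecting.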

DStauffman
Sandra S
  • The first line of code should suffice; `train` will already be a pandas data frame. I'm slightly confused as to what you are trying to achieve with the second line of code. – Jake Tae Jun 18 '20 at 21:57
  • I took out the second line of code. When I run the code l have, it gives the error I listed. – Sandra S Jun 18 '20 at 22:09
  • I'll refer you to [this thread](https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data). The summary is that you can skip "bad" lines by doing `train = pd.read_csv("https://www.kaggle.com/c/titanic/download/GQf0y8ebHO0C4JXscPPp%2Fversions%2FXkNkvXwqPPVG0Qt3MtQT%2Ffiles%2Ftrain.csv", error_bad_lines=False)`. I'd recommend that you examine the file and see which lines are causing problems. – Jake Tae Jun 18 '20 at 22:30
  • Jake Tae, I used the bad-lines method, and on analyzing the lines it skips, there don't seem to be any problems with them. If I download the file directly to my computer and then read it in, there are no parsing errors. This is a Kaggle contest, so the data is clean. Do you have any other suggestions? – Sandra S Jun 19 '20 at 12:35
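Since the local copy parses cleanly, one possibility worth ruling out is that the URL returns something other than the CSV, for example an HTML sign-in page served to requests that are not logged in. That is an assumption, not something confirmed in this thread. The sketch below uses two hypothetical helpers, `peek` and `looks_like_html`, and made-up example strings, to inspect the start of a response before handing it to `pd.read_csv`.

```python
import urllib.request


def looks_like_html(text):
    """Return True if the text starts like an HTML document rather than a CSV."""
    head = text.lstrip()[:100].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def peek(url, n_bytes=500):
    """Fetch the first bytes of a URL as text (performs a network request)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read(n_bytes).decode("utf-8", errors="replace")


# Illustrative inputs only; the HTML snippet is invented for this example.
csv_head = "PassengerId,Survived,Pclass,Name,Sex,Age\n1,0,3,..."
html_head = '<!DOCTYPE html>\n<html lang="en">\n<head><title>Sign in</title>'

print(looks_like_html(csv_head))   # False
print(looks_like_html(html_head))  # True
```

Calling `looks_like_html(peek(url))` on the download link would show whether pandas is being handed a web page. An HTML page would also be consistent with the "Expected 1 fields in line 6, saw 2" message, since its first lines would contain no commas.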
