Data downloaded from: https://www.kaggle.com/c/titanic/data
To make the code reproducible, I am trying to read the file directly from its URL, but it gives me a parsing error:
import pandas as pd
train = pd.read_csv("https://www.kaggle.com/c/titanic/download/GQf0y8ebHO0C4JXscPPp%2Fversions%2FXkNkvXwqPPVG0Qt3MtQT%2Ffiles%2Ftrain.csv")
I am getting this error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
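This message suggests that the C parser took the first line as a header with a single field and then reached a line with two fields at line 6. That pattern could fit content that is not the CSV at all, such as an HTML sign-in page. What the Kaggle link actually returns to an unauthenticated request is an assumption here, not something confirmed above. The sketch below parses a made-up, comma-free payload to reproduce the same kind of error. It also includes a hypothetical helper, looks_like_html, that inspects the start of the text before any parsing happens.

```python
import io

import pandas as pd

# Hypothetical stand-in for a non-CSV response: no commas until line 6.
fake_payload = "\n".join([
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    "<title>Sign in</title>",
    "</head>",
    "<p>Please sign in, then retry</p>",
])


def looks_like_html(text: str) -> bool:
    """Hypothetical helper: True if the text starts like an HTML document."""
    head = text.lstrip()[:100].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


error_message = None
try:
    # Line 1 becomes a one-column header; line 6 has two comma-separated fields.
    pd.read_csv(io.StringIO(fake_payload))
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
print(looks_like_html(fake_payload))
```

If the downloaded text does look like HTML, the problem would be the download step rather than how the CSV is parsed.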
Here are the first 3 lines of the file:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
I think the cause is the "Name" column, which contains a comma separating the last name from the first name.
I tried passing " as the separator in pd.read_csv, but it didn't work.
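You can test the "Name" column hypothesis by parsing the three sample lines above from an in-memory string. pd.read_csv already uses quotechar='"' by default, so a comma inside a quoted field such as "Braund, Mr. Owen Harris" should stay in one field. Passing '"' as sep changes the delimiter instead, so it is not the setting that governs quoting. This is a minimal sketch, assuming the sample lines are copied verbatim:

```python
import io

import pandas as pd

sample = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
'''

# Default settings (sep=",", quotechar='"') keep each quoted Name as one field.
df = pd.read_csv(io.StringIO(sample))
print(df.shape)           # (2, 12)
print(df.loc[0, "Name"])  # Braund, Mr. Owen Harris
```

If this parses cleanly, the quoted commas are probably not the culprit. The error more likely comes from what the URL returns.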
Any suggestions?
Thank you.