I'm pretty new to Python, but I've never had trouble with this particular step before.
I'm trying to load the Boston Bluebikes data into a pandas DataFrame. The data looks fine in Excel, but reading it with pandas throws a bunch of errors and produces weird-looking data.
Line of code:
jan = pd.read_csv('https://github.com/xixiant/BlueBikes/blob/master/201901-bluebikes-tripdata10.csv', engine='python', header=0, encoding='utf8', error_bad_lines=False)
Some weird data from jan.head():
0 html lang="en">
1 head>
2 meta charset="utf-8">
3 link rel="dns-prefetch" href="https://githu...
4 link rel="dns-prefetch" href="https://avata...
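The HTML tags in that output suggest the URL is returning a GitHub web page rather than the CSV itself. GitHub `/blob/` URLs point at the file viewer, and the file contents are served from `raw.githubusercontent.com`. Below is a minimal sketch of rewriting one form into the other. The helper name `to_raw_url` is made up for illustration, and it assumes the usual `github.com/<user>/<repo>/blob/<branch>/<path>` layout.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url: str) -> str:
    """Rewrite a github.com '/<user>/<repo>/blob/<branch>/<path>' URL
    into its raw.githubusercontent.com equivalent (hypothetical helper)."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expect: user, repo, the literal "blob", branch, then at least one path segment.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _, *rest = segments  # drop the "blob" segment
    raw_path = "/" + "/".join([user, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


blob_url = "https://github.com/xixiant/BlueBikes/blob/master/201901-bluebikes-tripdata10.csv"
raw_url = to_raw_url(blob_url)
print(raw_url)
```

If that guess is right, passing `raw_url` to `pd.read_csv` with default arguments should be enough to try first. Separately, `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0 in favour of `on_bad_lines`, so the original call may fail on newer versions regardless of the URL.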
Here's the various data I've used: https://github.com/xixiant/BlueBikes
What I've tried so far:
1) Read through the documentation on pandas.read_csv and messed with all the parameters that immediately made sense (engine, header, error_bad_lines, encoding)
2) Saved the CSV as UTF-8
3) Removed all the text in the CSV
4) Used Sublime Text to encode the file as UTF-8
5) Copied the values into a Google Sheet and downloaded a copy as a CSV
I guess if I were to keep going down this path, I'd see whether there are other ways of reading in CSVs that don't rely on pandas, but I really feel like I should be able to overcome this using pandas.
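For reference, the non-pandas route would probably look something like the sketch below, which uses the standard library's `csv` module. The column names and values in `sample` are invented for illustration and are not taken from the Bluebikes files.

```python
import csv
import io

# Made-up sample standing in for a CSV file's contents (hypothetical columns).
sample = "trip_id,duration_sec\n1,371\n2,264\n"

# DictReader uses the first row as field names and yields one dict per data row.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

print(reader.fieldnames)
print(rows[0])
```

Note that every value comes back as a string here, so numeric columns would still need converting by hand, which is part of why sticking with pandas seems preferable.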
These are the links that seemed most promising regarding my particular question:
Python Pandas Error tokenizing data
https://www.shanelynn.ie/pandas-csv-error-error-tokenizing-data-c-error-eof-inside-string-starting-at-line/
I wouldn't be surprised if I'm doing something completely ridiculous, but man.. really? Am I just that off base? Any advice at all would be super appreciated.