I'm pretty new to Python, but I've never had trouble with this particular step before.

I'm trying to load the Boston Bluebikes data into a pandas DataFrame. The data looks fine in Excel, but reading it with pandas throws a bunch of errors and produces weird-looking data.

Line of code:

    jan = pd.read_csv('https://github.com/xixiant/BlueBikes/blob/master/201901-bluebikes-tripdata10.csv', engine='python', header=0, encoding='utf8', error_bad_lines=False)

Some weird data from jan.head():

    0  html lang="en">
    1  head>
    2  meta charset="utf-8">
    3  link rel="dns-prefetch" href="https://githu...
    4  link rel="dns-prefetch" href="https://avata...

Here's the various data I've used: https://github.com/xixiant/BlueBikes

What I've tried so far:

1) Read through the documentation on pandas.read_csv and messed with all the parameters that immediately made sense (engine, header, error_bad_lines, encoding)
2) Saved the csv as UTF-8
3) Removed all the text in the csv
4) Used Sublime Text to encode it as UTF-8
5) Copied the values into a Google Sheet and downloaded a copy as a csv

I guess if I were to keep going down this path, I'd see if there were other methods of reading in csvs that don't rely on pandas, but I really feel like I should be able to overcome this using pandas.

These are the links that seemed most promising regarding my particular question:

  • Python Pandas Error tokenizing data
  • https://www.shanelynn.ie/pandas-csv-error-error-tokenizing-data-c-error-eof-inside-string-starting-at-line/

I wouldn't be surprised if I'm doing something completely ridiculous, but man.. really? Am I just that off base? Any advice at all would be super appreciated.

  • It's not clear to me where you're uploading this data. I went to their website, downloaded some trip data, loaded it into pandas, and it looks good to me. There's a step I'm missing though. – mechanical_meat Feb 17 '20 at 00:29
  • I've got it here: https://colab.research.google.com/drive/1r_i9BOsXIDTkl-wPC_4guzgPcK4EIaHF Yikes, I wonder if maybe my issue is something about the way I'm sharing it through github. – xixiant Feb 17 '20 at 03:47
  • Well, the link should be to the raw file instead: https://raw.githubusercontent.com/xixiant/BlueBikes/master/201901-bluebikes-tripdata10.csv – mechanical_meat Feb 17 '20 at 03:50
  • That is SO helpful. Man! I spent a ton of time today troubleshooting the absolute wrong problem. I also found this question that works nicely too: https://stackoverflow.com/questions/48350226/methods-for-using-git-with-google-colab – xixiant Feb 17 '20 at 04:30
  • It's nice to know there are two ways - adding 'raw' is good to know. – xixiant Feb 17 '20 at 04:30
  • Also, it is EXHAUSTING being new to things. It is so tempting to go fly through this exercise in Excel. :) Thank you again for your reply and excellent advice! – xixiant Feb 17 '20 at 04:31
  • You're welcome. Best of luck with the rest of your project! – mechanical_meat Feb 17 '20 at 04:34
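
To spell out the fix from the comments: the github.com/.../blob/... address is the HTML page that displays the file, which is why jan.head() showed html, head and meta rows. The raw.githubusercontent.com address serves the CSV itself. Below is a minimal sketch of converting one to the other. The helper names github_blob_to_raw and read_github_csv are made up for illustration, and the conversion assumes the usual user/repo/blob/branch/path layout of a GitHub file page.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' page URL into its raw-file URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Assumed shape: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url  # not a blob page; leave it unchanged
    user, repo = segments[0], segments[1]
    rest = "/".join(segments[3:])  # branch plus file path
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"


def read_github_csv(url: str):
    """Load a CSV hosted on GitHub into a DataFrame (needs network access)."""
    import pandas as pd  # imported here so the URL helper works without pandas
    return pd.read_csv(github_blob_to_raw(url))


blob_url = "https://github.com/xixiant/BlueBikes/blob/master/201901-bluebikes-tripdata10.csv"
print(github_blob_to_raw(blob_url))
```

The printed address matches the raw link given in the comment above, so jan = read_github_csv(blob_url) should be equivalent to calling pd.read_csv on that raw link directly. With the right URL, the extra engine, encoding and error_bad_lines arguments are probably not needed.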
