You have several options, but your question is really two separate questions:
1. How to get a file (.csv in this case) from a remote location.
2. How to load a CSV into a "df", which is a pandas data frame.
For #2, you simply import pandas and call df = pandas.read_csv(). See the documentation! If the CSV file were in the current directory, you would do pandas.read_csv('myfile.csv').
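As a minimal sketch of the local case, the snippet below writes a tiny CSV and reads it back. The file contents and the temporary directory are made up for illustration; only read_csv itself comes from the answer above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny CSV to a temporary directory so the example is self-contained.
# The file name matches the one used above; its contents are invented.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "myfile.csv"
csv_path.write_text("name,age\nalice,30\nbob,25\n")

# read_csv accepts a path and returns a DataFrame.
df = pd.read_csv(csv_path)
print(df)
```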
For #1, the CSV is on a server somewhere. In this case, it happens to be on Bitbucket's servers, accessed through their website. You can fetch it and save it locally, then read it; you can fetch it to a temporary location, read it into pandas, and discard it; or you can even read the file's contents into Python as a string. Having a lot of options doesn't mean they are all useful, though; I am just listing them for completeness. Looking at the documentation, pandas already has remote fetching built into the read_csv() function: if the path you pass in is a URL with a valid scheme, pandas fetches the file for you. Quoting the pandas documentation:
"Valid URL schemes include http, ftp, s3, gs, and file".
If you want to save it locally, you can use pandas once again, using the .to_csv() method of a data frame.
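A rough sketch of fetching and saving in one step might look like the following. The helper name fetch_and_save_csv and its parameters are hypothetical, not part of pandas; the only pandas calls used are read_csv and DataFrame.to_csv, which is the pandas method for writing a frame out as CSV.

```python
import pandas as pd


def fetch_and_save_csv(url, dest_path):
    """Read a CSV from `url` into a DataFrame and write a local copy to `dest_path`."""
    # read_csv fetches the URL itself when the scheme is one pandas supports.
    df = pd.read_csv(url)
    # index=False leaves the row index out of the saved file.
    df.to_csv(dest_path, index=False)
    return df
```

You would call it as df = fetch_and_save_csv(some_raw_url, 'local_copy.csv'), where both arguments are placeholders for your own URL and file name.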
FOR BITBUCKET SPECIFICALLY:
You need to link to the 'raw' file on Bitbucket. The link you use to view the file in your web browser is not a direct link to the file by default; it is a web page that offers a view of that file. Get the raw file link, then pass that to pandas.
Code example:
Assume we want this file (a random CSV I found on Bitbucket):
https://bitbucket.org/pedrorijo91/nodejstutorial/src/db4c991864e65c4d72e98a1dc94e33606e3adde9/node_modules/levelmeup/data/horse_js.csv?at=master
What you need is a link to the raw file! Clicking on ... and pressing 'open raw', we get:
https://bitbucket.org/pedrorijo91/nodejstutorial/raw/db4c991864e65c4d72e98a1dc94e33606e3adde9/node_modules/levelmeup/data/horse_js.csv
Let's look at this in detail. The link is the same up to the project name:
https://bitbucket.org/pedrorijo91/nodejstutorial/
After that, the raw file is under raw/ instead of src/.
Then comes the same commit identifier (random-looking, but the same letters and numbers in both links):
db4c991864e65c4d72e98a1dc94e33606e3adde9/
Finally, it's the same directory structure:
node_modules/levelmeup/data/horse_js.csv
The first link has src/ after the project name and ends with ?at=master, a query string that is parsed by the web server to render the page. The second link, the actual link to the raw file, has raw/ in that position and ends with .csv.
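The difference between the two links can be sketched as a small conversion function. This is only an illustration: bitbucket_view_to_raw is a hypothetical helper, and it assumes the URL path has the shape /owner/repo/src/ref/path seen in the example above. Copying the link from 'open raw' remains the reliable route.

```python
from urllib.parse import urlsplit, urlunsplit


def bitbucket_view_to_raw(view_url):
    """Turn a Bitbucket 'src' page URL into the matching 'raw' URL.

    Assumes the path looks like /<owner>/<repo>/src/<ref>/<file path>.
    """
    parts = urlsplit(view_url)
    segments = parts.path.split("/")
    # segments is ['', owner, repo, 'src', ref, ...] for a browser view link.
    if len(segments) < 5 or segments[3] != "src":
        raise ValueError("not a Bitbucket src URL: " + view_url)
    segments[3] = "raw"
    # Drop the query string (such as ?at=master) and any fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))
```

Applied to the first link above, this should produce the second one, which is the form read_csv needs.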
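import pandas as pd
# Use the raw link, not the link to the browser view of the file.
RAW_Bitbucket_URL = 'https://bitbucket.org/pedrorijo91/nodejstutorial/raw/db4c991864e65c4d72e98a1dc94e33606e3adde9/node_modules/levelmeup/data/horse_js.csv'
# read_csv fetches the URL directly and returns a data frame.
df = pd.read_csv(RAW_Bitbucket_URL)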
The code above runs successfully for me.