28

I'm trying to read a CSV file that's on GitHub with Python using pandas. I have looked all over the web and tried some solutions that I found on this website, but they do not work. What am I doing wrong?

I have tried this:

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)

print(df.head(5))
taga
  • Set the URL to the 'raw' view: `https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv` – Chris Adams Mar 19 '19 at 11:50
  • Take a look at https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url - that might help you out! – Tjofoed Mar 19 '19 at 11:54
  • @ChrisA This is good, thanks! Can you tell me how you got the `raw` view? I see that your link does not have `github.com` and `blob`. – taga Mar 19 '19 at 11:58
  • Yes, go to your original link. Above the main window, to the right, there are 3 buttons: `raw, blame, history`. Click raw. – Chris Adams Mar 19 '19 at 12:04
  • Thanks, my friend – taga Mar 19 '19 at 12:05
  • @taga Another way to get to the raw view is by adding "?raw=True" at the end of the web address of the file. (The same goes for images on GitHub.) – DannyVanpoucke Feb 02 '23 at 12:41

5 Answers

40

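You should provide the URL to the raw content. The `blob` URL returns GitHub's HTML page for the file rather than the CSV itself, so pandas fails to parse it as CSV. Try this: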

import pandas as pd

url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))

Output:

               alpha-2           ...            intermediate-region-code
name                             ...                                    
Afghanistan         AF           ...                                 NaN
Åland Islands       AX           ...                                 NaN
Albania             AL           ...                                 NaN
Algeria             DZ           ...                                 NaN
American Samoa      AS           ...                                 NaN
Alderven
  • Or you can simply add "?raw=true" at the end of the GitHub URL. See my answer below for how the code looks. – Krishnakanth Allika Aug 08 '20 at 07:43
  • Doing this with a public GitLab repo, I got an HTTP error (`HTTP Error 403: Forbidden`). Is there a way to do the same with raw links on GitLab? – Ger Oct 04 '20 at 10:04
26

Add ?raw=true at the end of the GitHub URL to get the raw file link.

In your case,

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url,index_col=0)
print(df.head(5))

Output:

               alpha-2 alpha-3  country-code     iso_3166-2   region  \
name                                                                   
Afghanistan         AF     AFG             4  ISO 3166-2:AF     Asia   
Åland Islands       AX     ALA           248  ISO 3166-2:AX   Europe   
Albania             AL     ALB             8  ISO 3166-2:AL   Europe   
Algeria             DZ     DZA            12  ISO 3166-2:DZ   Africa   
American Samoa      AS     ASM            16  ISO 3166-2:AS  Oceania   

                     sub-region intermediate-region  region-code  \
name                                                               
Afghanistan       Southern Asia                 NaN        142.0   
Åland Islands   Northern Europe                 NaN        150.0   
Albania         Southern Europe                 NaN        150.0   
Algeria         Northern Africa                 NaN          2.0   
American Samoa        Polynesia                 NaN          9.0   

                sub-region-code  intermediate-region-code  
name                                                       
Afghanistan                34.0                       NaN  
Åland Islands             154.0                       NaN  
Albania                    39.0                       NaN  
Algeria                    15.0                       NaN  
American Samoa             61.0                       NaN 

Note: This works only with GitHub links and not with GitLab or Bitbucket links.
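
If you build these links in code, the same idea can be sketched with the standard library. This is only an illustration: the helper name `add_raw_param` is made up for this example, and the host check simply reflects the note above that the trick is GitHub-specific.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_raw_param(url: str) -> str:
    """Return the GitHub file URL with raw=true set in its query string.

    Hypothetical helper for illustration; it does not contact the network.
    """
    parts = urlsplit(url)
    # The ?raw=true trick is GitHub-specific, so reject other hosts.
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    query = dict(parse_qsl(parts.query))
    query["raw"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


url = add_raw_param(
    "https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv"
)
print(url)
```

The resulting string can be passed to `pd.read_csv` exactly as in the code above.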

  • May I ask why, when I read the file and print it, it only shows lines 0-20, then skips to 3000 and goes to the end? Note: there are 5000 lines. –  Aug 02 '20 at 17:52
  • @AirStalk3r You need to provide more information. Maybe post a new question with details. – Krishnakanth Allika Aug 08 '20 at 07:45
6

You can copy/paste the url and change 2 things:

  1. Remove "blob"
  2. Replace github.com with raw.githubusercontent.com

For instance this link:

https://github.com/mwaskom/seaborn-data/blob/master/iris.csv

Works this way:

import pandas as pd

pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
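
If you have many links to convert, the two steps above can be automated. Here is a minimal sketch; the function name `github_blob_to_raw` is invented for this example, and it assumes the URL has the usual `<user>/<repo>/blob/<branch>/<path>` layout.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com 'blob' URL into its raw.githubusercontent.com form.

    Hypothetical helper for illustration; it only edits the string.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Assumed layout: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Step 1: drop the "blob" segment. Step 2: swap the host.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


raw_url = github_blob_to_raw("https://github.com/mwaskom/seaborn-data/blob/master/iris.csv")
print(raw_url)
# prints https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv
```

The returned string is what you would hand to `pd.read_csv`.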
Nicolas Gervais
0

I recommend either using pandas, as you tried and as others here have explained, or, depending on the application, the Python CSV handler CommaSeperatedPython, which is a minimalistic wrapper for the native csv library.

The library returns the contents of a file as a 2-dimensional string array. It is at a very early stage, though, so if you want to do large-scale data analysis, I would suggest pandas.
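
To give an idea of what a 2-dimensional string array looks like, here is a rough sketch using only Python's built-in `csv` module. It is not CommaSeperatedPython's API; the function name `csv_text_to_rows` and the sample text are made up for the example.

```python
import csv
import io


def csv_text_to_rows(text: str) -> list[list[str]]:
    """Parse CSV text into a list of rows, each row a list of strings."""
    return list(csv.reader(io.StringIO(text)))


# Made-up sample; in practice the text would come from the raw GitHub URL,
# e.g. urllib.request.urlopen(raw_url).read().decode("utf-8").
sample = "name,alpha-2\nAfghanistan,AF\nAlbania,AL\n"
rows = csv_text_to_rows(sample)
print(rows[0])  # header row
print(rows[1])  # first data row
```

Every value stays a string here, so any type conversion is up to you, which is one reason pandas is more convenient for analysis.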

PixelRayn
0

First convert the GitHub CSV file link to its raw form in order to access the data. The link in the comment below explains how to do this.

import pandas as pd

url_data = (r'https://raw.githubusercontent.com/oderofrancis/rona/main/Countries-Continents.csv')

data_csv = pd.read_csv(url_data)

data_csv.head()
  • How to get the raw link for a CSV file on GitHub: https://projectosyo.wixsite.com/datadoubleconfirm/single-post/2019/04/15/reading-csv-data-from-github-python – Francis Odero Apr 26 '21 at 18:41