I am new to Python and am trying to download GDP per capita data for countries. I am trying to read the data from this website: https://worldpopulationreview.com/countries/by-gdp

I tried to read the data, but I got a "No tables found" error. I can see that the data is in r.text, but pandas cannot read the table. How can I solve this problem and read the data?

MWE

import pandas as pd
import requests

url = "https://worldpopulationreview.com/countries/by-gdp"

r = requests.get(url)
raw_html = r.text  # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))
dallascow

1 Answer

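The data is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> and is only rendered into a table by the browser, so the HTML that requests receives contains no <table> for pd.read_html to find. You have to adjust your script a bit and parse that JSON instead: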

pd.json_normalize(
    json.loads(
        BeautifulSoup(
            requests.get(url).text, 'html.parser'  # name the parser explicitly
        ).select_one('#__NEXT_DATA__').text)['props']['pageProps']['data']
)

Example

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"


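# Parse the JSON payload from the __NEXT_DATA__ script tag and flatten it
df = pd.json_normalize(
    json.loads(
        BeautifulSoup(
            requests.get(url).text, 'html.parser'  # name the parser explicitly
        ).select_one('#__NEXT_DATA__').text)['props']['pageProps']['data']
)
# Keep only the columns of interest
df[['continent', 'country', 'pop', 'imfGDP', 'unGDP', 'gdpPerCapita']]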

Output

         continent                   country          pop       imfGDP           unGDP  gdpPerCapita
0    North America             United States       338290  2.08938e+13  18624475000000       61762.9
1             Asia                     China  1.42589e+06  1.48626e+13  11218281029298       10423.4
...            ...                       ...          ...          ...             ...           ...
210           Asia                     Syria      22125.2            0     22163075121       1001.71
211  North America  Turks and Caicos Islands       45.703            0       917550492       20076.4
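
To see why pd.read_html finds nothing while the data is still in r.text, the check and the extraction can be split into small steps. This is only a sketch: the helper names `has_html_table` and `next_data_to_frame` are made up for illustration, and `sample_html` is a tiny stand-in page with invented rows (not real GDP figures) so that it runs without network access. It assumes the payload sits under props -> pageProps -> data, as on the page above.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup


def has_html_table(html: str) -> bool:
    """Return True if the raw HTML contains at least one <table> element."""
    return BeautifulSoup(html, "html.parser").find("table") is not None


def next_data_to_frame(html: str, path=("props", "pageProps", "data")) -> pd.DataFrame:
    """Flatten the JSON stored in <script id="__NEXT_DATA__"> into a DataFrame.

    `path` is the assumed key sequence leading to the list of records.
    """
    tag = BeautifulSoup(html, "html.parser").select_one("#__NEXT_DATA__")
    if tag is None:
        raise ValueError("no element with id '__NEXT_DATA__' found")
    node = json.loads(tag.string)
    for key in path:
        node = node[key]
    return pd.json_normalize(node)


# Stand-in page with invented values, shaped like the real one (no <table>).
sample_html = """
<html><body>
<div id="root"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"data": [
  {"country": "Aland", "gdpPerCapita": 100.5},
  {"country": "Borduria", "gdpPerCapita": 200.0}
]}}}
</script>
</body></html>
"""

print(has_html_table(sample_html))  # no <table>, so pd.read_html would fail here
sample_df = next_data_to_frame(sample_html)
print(sample_df)
```

With the real page you would pass requests.get(url).text instead of `sample_html`. If the site changes its layout, the key path may need adjusting.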
HedgeHog
  • Thanks a lot for the answer. However, I can't read the page directly because of an `SSLCertVerificationError`. To work around that, I downloaded the whole web page and am trying to read the data from the local copy. I have a local HTML file, `GDP Ranked by Country 2022.html`, along with its associated folder. How can I read from the local HTML file? – dallascow Dec 13 '22 at 16:20
  • I have not run into that error, so it would be a good candidate for [asking a new question](https://stackoverflow.com/questions/ask) with exactly this focus. You may want to read https://stackoverflow.com/questions/10667960/python-requests-throwing-sslerror, or use `BeautifulSoup(open('MY FILE PATH AND FILE ','r').read())`. – HedgeHog Dec 13 '22 at 16:27
  • The `open` file method worked fine. Thanks a lot. If you are from Texas, stay safe; the weather is very bad here. Have a great day! – dallascow Dec 13 '22 at 16:32
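
For anyone following the local-file route from the comments, here is a minimal sketch of reading a saved copy of the page. The function name `read_saved_page` is hypothetical, and the sketch writes a small stand-in file with invented values into a temporary directory so that it is self-contained. It assumes the saved file is UTF-8 and still contains the __NEXT_DATA__ script tag.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup


def read_saved_page(path) -> pd.DataFrame:
    """Load a locally saved HTML page and flatten its __NEXT_DATA__ payload."""
    # Assumption: the file is UTF-8 encoded; adjust if your saved copy differs.
    html = Path(path).read_text(encoding="utf-8")
    tag = BeautifulSoup(html, "html.parser").select_one("#__NEXT_DATA__")
    if tag is None:
        raise ValueError(f"no '__NEXT_DATA__' script tag in {path}")
    payload = json.loads(tag.string)
    return pd.json_normalize(payload["props"]["pageProps"]["data"])


# Stand-in for the saved page; the row below is invented, not real data.
stand_in_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"data": '
    '[{"country": "Testland", "gdpPerCapita": 123.4}]}}}'
    "</script></body></html>"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_file = Path(tmp_dir) / "GDP Ranked by Country 2022.html"
    saved_file.write_text(stand_in_html, encoding="utf-8")
    local_df = read_saved_page(saved_file)

print(local_df)
```

In practice you would call `read_saved_page` with the path to your own downloaded file. Reading the file through `Path.read_text` also closes it automatically, which a bare `open(...).read()` does not guarantee.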