
Can I create multiple dataframes in a loop?

I have a long list of web-scraped information that I want to turn into multiple dataframes, one per indicator. I'm not sure if this is possible.

Below is my original scraping code:

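import json
import requests
import pandas as pd

# a list (not a set) keeps the indicators in this order, so result_json[0] is the Gini data
indicator = ['SI.POV.GINI?date=2000:2020','SL.UEM.TOTL.ZS?date=2000:2020','NE.IMP.GNFS.ZS?date=2000:2020','NE.EXP.GNFS.ZS?date=2000:2020']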

url_list = []
for i in indicator:
    url = "http://api.worldbank.org/v2/countries/all/indicators/%s&format=json&per_page=5000" % i
    url_list.append(url)

result_list = []
for i in url_list:
    response = requests.get(i)
    print(response)
    result_list.append(response.content)

result_json = []
for i in range(len(result_list)):
    result_json.append(json.loads(result_list[i]))

result_json

If not, I've also tried doing it manually, but I'm getting an error:

gini_df = pd.DataFrame.from_dict(result_json[0])
gini_df

AttributeError: 'list' object has no attribute 'keys'

Yel

1 Answer


Creating multiple dataframes in a loop is straightforward. You can append your dataframes to a list or store them in a dictionary under specific keys. Here's some example code:

import numpy as np
import pandas as pd

df_list = []
for i in range(10):
    df = pd.DataFrame(np.random.rand(3,3), columns=['a', 'b', 'c'])
    df_list.append(df)

print(df_list[5])

          a         b         c
0  0.361910  0.521254  0.763633
1  0.030419  0.098978  0.929679
2  0.304616  0.563361  0.326490
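
The dictionary variant mentioned above could look something like the following. This is only a minimal sketch: the key names are hypothetical labels picked for illustration, and the data is random, as in the list example.

```python
import numpy as np
import pandas as pd

# Hypothetical keys, chosen only for illustration
names = ['gini', 'unemployment', 'imports']

df_dict = {}
for name in names:
    # one random 3x3 DataFrame per key, same shape as in the list example
    df_dict[name] = pd.DataFrame(np.random.rand(3, 3), columns=['a', 'b', 'c'])

# look a DataFrame up by name instead of by position
print(df_dict['gini'].shape)  # (3, 3)
```

A dictionary is convenient when each dataframe has a natural name, since you do not have to remember which position it has in a list.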

For your task, the issue is parsing and flattening a hierarchical data structure into a format pandas can understand. For example:

import json
import requests
import pandas as pd

indicator = {'SI.POV.GINI?date=2000:2020'}
url_list = []
for i in indicator:
    url = "http://api.worldbank.org/v2/countries/all/indicators/%s&format=json&per_page=5000" % i
    url_list.append(url)

result_list = []
for i in url_list:
    response = requests.get(i)
    print(response)
    result_list.append(response.content)

result_json = []
for i in range(len(result_list)):
    result_json.append(json.loads(result_list[i]))

columns = ['indicator_id', 'indicator_value','country_id', 'country_value','countryiso3code', 'date', 'value', 'unit', 'obs_status', 'decimal']
data = {}
for i in columns:
    data[i] = []

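for i in result_json:
    # i[0] is paging metadata; i[1] is the list of records
    for record in i[1]:
        for k in columns:
            if k in record:
                # top-level field such as 'date' or 'obs_status'
                data[k].append(record[k])
            else:
                # nested field: 'indicator_id' -> record['indicator']['id']
                outer, _, inner = k.partition('_')
                try:
                    data[k].append(record[outer][inner])
                except (KeyError, TypeError):
                    data[k].append('')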
df = pd.DataFrame(data)
print(df)

Note that my example code only runs on one of your indicators. To handle every indicator, build the DataFrame inside a loop over the indicators and append each df to a list, as in the first example.
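
As a rough sketch of that per-indicator loop, the snippet below builds one DataFrame per indicator and stores it in a dictionary. A few assumptions to flag: it swaps the manual flattening loop for `pd.json_normalize` (available in pandas 1.0 and later), and it uses small hand-written payloads instead of live API responses so it runs offline. The payloads, country names and numbers are made up; only the `[metadata, records]` shape follows the code above.

```python
import pandas as pd

# Hypothetical payloads that mimic the [metadata, records] shape of the
# parsed responses in result_json; all values here are invented.
sample_payloads = {
    'SI.POV.GINI': [
        {'page': 1, 'pages': 1},
        [
            {'indicator': {'id': 'SI.POV.GINI', 'value': 'Gini index'},
             'country': {'id': 'XX', 'value': 'Exampleland'},
             'countryiso3code': 'XXX', 'date': '2020', 'value': 30.0},
        ],
    ],
    'SL.UEM.TOTL.ZS': [
        {'page': 1, 'pages': 1},
        [
            {'indicator': {'id': 'SL.UEM.TOTL.ZS', 'value': 'Unemployment'},
             'country': {'id': 'XX', 'value': 'Exampleland'},
             'countryiso3code': 'XXX', 'date': '2020', 'value': 5.0},
            {'indicator': {'id': 'SL.UEM.TOTL.ZS', 'value': 'Unemployment'},
             'country': {'id': 'XX', 'value': 'Exampleland'},
             'countryiso3code': 'XXX', 'date': '2019', 'value': 4.5},
        ],
    ],
}

df_by_indicator = {}
for indicator_id, payload in sample_payloads.items():
    records = payload[1]  # element 0 is metadata, element 1 holds the rows
    # flatten nested dicts; sep='_' yields column names like 'indicator_id'
    df_by_indicator[indicator_id] = pd.json_normalize(records, sep='_')

print(df_by_indicator['SL.UEM.TOTL.ZS'])
```

With real data you would loop over the parsed responses instead of `sample_payloads`. Passing only the second element of each response is also what avoids the `'list' object has no attribute 'keys'` error, since the full response is a list rather than a dict.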

Matt L.
  • thanks! I've just edited my question. will your comment still apply? – Yel Jun 12 '20 at 14:04
  • Your error is unrelated to your question. You are getting an error because the JSON data is not in a format the from_dict method can understand. – Matt L. Jun 12 '20 at 14:11
  • yeah, actually i'm also shocked with the error because when I ran the model for each indicator, it worked fine. the json output was converted to a DataFrame even if the format is a list. – Yel Jun 12 '20 at 14:14
  • i tried the gini_df = pd.DataFrame(result_json[0]) but the result was **'list' object has no attribute 'keys'** – Yel Jun 12 '20 at 14:16