I have this dataframe:

import pandas as pd

d = {'city': ['Barcelona', 'Madrid', 'Rome', 'Torino', 'London', 'Liverpool', 'Manchester', 'Paris'],
     'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
     'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
     'amount': [8, 7, 6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(d)

I want to obtain this for each country:

españa = {'city': ['Barcelona', 'Madrid'],
          'revenue': [1, 2],
          'amount': [8, 7]}
ES = pd.DataFrame(españa)

In the end I want to have 4 dataframes named ES, IT, UK and FR.

I have tried this so far:

a = set(df.loc[:]["country"])
for country in a:
    country = df.loc[(df["country"]== country),['date','sum']]

But that only gave me one dataframe with one value.

user3483203
Javier Lopez Tomas

3 Answers

You can use a dictionary comprehension with groupby:

res = {k: v.drop(columns='country') for k, v in df.groupby('country')}

print(res)

{'ES':    amount       city  revenue
       0       8  Barcelona        1
       1       7     Madrid        2,
 'FR':    amount   city  revenue
       7       1  Paris        8,
 'IT':    amount    city  revenue
       2       6    Rome        3
       3       5  Torino        4,
 'UK':    amount        city  revenue
       4       4      London        5
       5       3   Liverpool        6
       6       2  Manchester        7}
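
As a hedged sketch of how such a dictionary might then be used: the snippet below assumes the sample data from the question and a pandas version that supports `drop(columns=...)`. The names `frames` and `es` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

# One sub-dataframe per country code, without the now-redundant 'country' column.
frames = {code: group.drop(columns='country')
          for code, group in df.groupby('country')}

# Look a frame up by its code instead of by a separate variable name.
es = frames['ES']
print(es)
print(sorted(frames))
```

Keeping the frames in a dictionary avoids creating variable names dynamically, and the country codes remain available as keys for later loops.
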
jpp
`country` is the loop variable, and it is overwritten on every iteration.

In order to generate 4 different dataframes, try using a generator function.

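def country_df_generator(data):
    # Yield one sub-dataframe per distinct country code.
    for country in data['country'].unique():
        yield data.loc[data['country'] == country, ['city', 'revenue', 'amount']]

# data is the full dataframe (df in the question).
countries = country_df_generator(data)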
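
Because a generator produces nothing until it is iterated, one way to consume it is to yield (code, frame) pairs and pass them to `dict`. This is only a sketch: it assumes the wanted columns are those of the question's sample dataframe, and `iter_country_frames` and `by_country` are hypothetical names, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})


def iter_country_frames(data):
    """Lazily yield (country_code, sub-dataframe) pairs."""
    for code in data['country'].unique():
        mask = data['country'] == code
        yield code, data.loc[mask, ['city', 'revenue', 'amount']]


# Nothing is computed until the generator is consumed, here by dict().
by_country = dict(iter_country_frames(df))
print(by_country['UK'])
```
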

Quentin
  • I have tried your solution and it does not work. I can run the code, but no dataframe (or variable) at all is obtained. If I print countries I get . The type of countries is generator. – Javier Lopez Tomas Jul 17 '18 at 09:08
  • Yes, it returns a generator object. If you iterate over the generator, it will produce the desired objects. `countries = list(country_df_generator(data))` will give you a tangible list, since that is what you prefer. – Quentin Jul 17 '18 at 17:09
The loop gave you all four data frames, but you threw the first three into the garbage.

You iterate through a with the variable country, but then destroy that value in the next statement, country = .... Then you return to the top of the loop, reset country to the next two-letter abbreviation, and continue this conflict through all four nations.
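
The overwriting can be seen without pandas at all. In this minimal illustration the strings merely stand in for the filtered dataframes (they are placeholders, not anything from the question), and only the last assignment survives the loop:

```python
codes = ['ES', 'IT', 'UK', 'FR']

for country in codes:
    # Rebinding the loop variable does not create a variable named ES, IT, ...
    # It only points the single name `country` at a new object.
    country = 'rows for ' + country

# After the loop, only the value from the final iteration is still reachable.
print(country)  # prints "rows for FR"
```
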

If you need four data frames, you need to keep each one in a separate place. For instance:

a = set(df["country"])
df_dict = {}

for country in a:
    # Store each filtered frame under its country code instead of overwriting it.
    df_dict[country] = df.loc[df["country"] == country, ['city', 'revenue', 'amount']]

Now you have a dictionary with four data frames, each one indexed by its country code. Does that help?

Prune