
I have several tables that look like this:

ID  YY  ZZ
 2  97  826
 2  78  489
 4  47  751
 4  110 322
 6  67  554
 6  88  714

code:

import pandas as pd

raw = {'ID': [2, 2, 4, 4, 6, 6],
       'YY': [97, 78, 47, 110, 67, 88],
       'ZZ': [826, 489, 751, 322, 554, 714]}
df = pd.DataFrame(raw)

For each of these dfs, I have to perform a number of operations.

  1. group by ID,
  2. extract the length and the average of the column ZZ,
  3. put the results in a new df.

The new df looks like this:

Cities  length  mean
Paris     0      0
Madrid    0      0
Berlin    0      0
Warsaw    0      0
London    0      0

code:

raw2 = {'Cities': ['Paris', 'Madrid', 'Berlin', 'Warsaw', 'London'],
        'length': 0,
        'mean': 0}
df2 = pd.DataFrame(raw2)

I pulled out the average and the size of the individual groups:

df_grouped = df.groupby('ID').ZZ.size()
df_grouped2 = df.groupby('ID').ZZ.mean()
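
For the sample data above, both grouped Series are indexed by ID and only contain the IDs that actually appear in df; for example df_grouped is:

    ID
    2    2
    4    2
    6    2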

The problem occurs when trying to transfer the results to the new table: the grouped results do not cover all of the cities, and each value has to be matched to the appropriate key.

I tried to use a dictionary:

   dic_cities = {"Paris":df_grouped.loc[2],
             "Madrid":df_grouped.loc[4],
             "Warsaw":df_grouped.loc[6],
             "Berlin":df_grouped.loc[8],
             "London":df_grouped.loc[10]}

Unfortunately, I'm receiving KeyError: 8

I have 19 df's from which I have to extract this data and the final tables have to look like this:

    Cities  length  mean
    Paris   2       657.5
    Madrid  2       536.5
    Berlin  0       0.0
    Warsaw  2       634.0
    London  0       0.0

Does anyone know how to deal with this using groupby and a dictionary, or know a better way to do it?


2 Answers


See this:

import pandas as pd

# setup raw data
raw = {'ID': [2, 2, 4, 4,  6, 6,], 'YY': [97,78,47,110,67,88], 'ZZ':[826,489,751,322,554,714]}
df = pd.DataFrame(raw)

# get mean values
mean_values = df.groupby('ID').mean()
# drop column
mean_values = mean_values.drop(['YY'], axis=1)
# get occurrence number
occurrence = df.groupby('ID').size()
# save data
result = pd.concat([occurrence, mean_values], axis=1, sort=False)
# rename columns
result.rename(columns={0:'length', 'ZZ':'mean'}, inplace=True)

# city data
raw2 = {'Cities': ['Paris', 'Madrid', 'Berlin', 'Warsaw', 'London'], 'length': 0, 'mean': 0}
df2 = pd.DataFrame(raw2)

# rename indexes
df2 = df2.rename(index={0: 2, 1: 4, 2: 8, 3: 6, 4: 10})

# merge data
df2['length'] = result['length']
df2['mean'] = result['mean']

Output:

    Cities  length   mean
2    Paris     2.0  657.5
4   Madrid     2.0  536.5
8   Berlin     NaN    NaN
6   Warsaw     2.0  634.0
10  London     NaN    NaN
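
If you want zeros instead of NaN for the missing cities, as in the desired output in the question, one option is to fill them afterwards:

# optional: replace NaN with 0 and turn the counts back into integers
df2[['length', 'mean']] = df2[['length', 'mean']].fillna(0)
df2['length'] = df2['length'].astype(int)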
– Zaraki Kenpachi

First, you should index df2 on 'Cities':

raw2 = {'Cities': ['Paris', 'Madrid', 'Berlin', 'Warsaw', 'London'],
        'length': 0,
        'mean': 0}
df2 = pd.DataFrame(raw2).set_index('Cities')

Then you should reverse your dictionary:

dic_cities = {2: "Paris",
             4: "Madrid",
             6: "Warsaw",
             8: "Berlin",
             10: "London"}

Once this is done, the processing is as simple as a groupby:

import numpy as np

for i, sub in df.groupby('ID'):
    df2.loc[dic_cities[i]] = sub.ZZ.agg([len, np.mean]).tolist()

Which gives for df2:

        length   mean
Cities               
Paris      2.0  657.5
Madrid     2.0  536.5
Berlin     0.0    0.0
Warsaw     2.0  634.0
London     0.0    0.0
– Serge Ballesta
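
For reference, the same table can also be built without the explicit loop, as a single aggregation (a minimal sketch reusing dic_cities and the indexed df2 from above; df2_alt is just an illustrative name):

summary = df.groupby('ID').ZZ.agg(['size', 'mean'])                     # length and mean per ID
summary = summary.rename(columns={'size': 'length'}, index=dic_cities)  # IDs -> city names
df2_alt = summary.reindex(df2.index, fill_value=0)                      # cities with no rows get 0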
  • Thank you very much, it works perfectly. But I have one more question: how can I round the mean() values to 2 decimal places, and how can I convert them to integers? – Kepasere Jun 04 '19 at 07:46
  • `astype` is the way to change a column to integer: `df2['length'] = df2['length'].astype(int)`. But rounding a floating value to exactly 2 decimal places is hopeless; just read [Is floating point math broken?](https://stackoverflow.com/q/588004/3545273) to understand why. It is just a display question and depends on how you later process the dataframe. – Serge Ballesta Jun 04 '19 at 07:53
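
For completeness, the conversions discussed in these comments could look like this (a sketch based on the df2 produced above; note that round only changes the stored values, display formatting is a separate concern):

df2['length'] = df2['length'].astype(int)   # counts as integers
df2['mean'] = df2['mean'].round(2)          # round the stored means to 2 decimal places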