0

I have a dataset containing the indicators of different cities: so the same indicator is repeated several times for different cities. The dataset is something like this:

df
    city    indicator                   Value
0   Tokio   Solid Waste Recycled        1.162000e+01
1   Boston  Solid Waste Recycled        3.912000e+01
2   London  Solid Waste Recycled        0.000000e+00
3   Tokio   Own-Source Revenues         1.420000e+00
4   Boston  Own-Source Revenues         0.000000e+00
5   London  Own-Source Revenues         3.247000e+01
6   Tokio   Green Area                  4.303100e+02
7   Boston  Green Area                  7.166350e+01
8   London  Green Area                  1.997610e+01
9   Tokio   City Land Area              9.910000e+01
10  Boston  City Land Area              4.200000e+01
11  London  City Land Area              8.956000e+01

From the original dataframe I would like to create a second dataframe like the following:

 df1
            Solid Waste Recycled  Own-Source Revenues  Green Area    City Land Area
  Tokio     1.162000e+01          1.420000e+00         4.303100e+02  9.910000e+01
  Boston    3.912000e+01          0.000000e+00         7.166350e+01  4.200000e+01
  London    0.000000e+00          3.247000e+01         1.997610e+01 
emax
  • 6,965
  • 19
  • 74
  • 141

1 Answers1

0

Maybe there is a better solution, but you can groupby and then apply a function on each grouped dataframe to create a new one:

grouped = df.groupby('City')
res = defaultdict(list)
for k, k_df in grouped:
    res['City'].append(k)
    k_df.apply(lambda row: res[row['Indicator']].append(row['Value']), axis=1)
pd.DataFrame(res)

Notice this will work only if all the values appear for all cities. If you want to support missing values you add Nones for missing values of each city. This requires to collect all possible values and check they were inserted:

grouped = df.groupby('City')
res = defaultdict(list)
new_columns = set(df['Indicator']) #all possible values
for k, k_df in grouped:
    res['City'].append(k)
    k_df.apply(lambda row: res[row['Indicator']].append(row['Value']), axis=1)
    for col in new_columns:
        if len(res[col]) < len(res['City']): # check if values is missing
            res[col].append(None)
AndreyF
  • 1,798
  • 1
  • 14
  • 25