How can I reshape my df and split one categorical column into several?

Question

I have separate CSV data for each weather station, each climate scenario, each RCP and each variable. After appending the data, a sample looks like this:

Example:

import pandas as pd

# Long format: one row per canton / date / station / RCP / variable
df_before = pd.DataFrame({'canton': ['AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'BE', 'BE', 'BE'],
                          'time': ['1981-01-01', '1981-01-01', '1981-01-01', '1981-01-01', '1981-01-01',
                                   '1981-01-01', '1981-01-01', '1981-01-01', '1981-01-01', '1981-01-02', '1981-01-02', '1981-01-02'],
                          'value': [3, 5, 1, 2, 4, 5, 6, 7, 2, 1, 2, 3],
                          'stn': ['BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BER', 'BER', 'BER'],
                          'RCPs': ['RCP26', 'RCP26', 'RCP26', 'RCP45', 'RCP45', 'RCP45', 'RCP85', 'RCP85', 'RCP85', 'RCP26', 'RCP26', 'RCP26'],
                          'var': ['tas', 'tasmin', 'tasmax', 'tas', 'tasmin', 'tasmax', 'tas', 'tasmin', 'tasmax', 'tas', 'tasmin', 'tasmax']
                          })

print(df_before)

However, I would like to split the value column into one column per variable in var (tas, tasmin, tasmax, pr), without losing the RCP and station information:

df_after = pd.DataFrame({'canton': ['AG', 'AG', 'AG', 'BE'],
                         'time': ['1981-01-01', '1981-01-01', '1981-01-01', '1981-01-02'],
                         'stn': ['BUS', 'BUS', 'BUS', 'BER'],
                         'RCPs': ['RCP26', 'RCP45', 'RCP85', 'RCP26'],
                         'tas': [3, 2, 6, 1],
                         'tasmin': [5, 4, 7, 2],
                         'tasmax': [1, 5, 2, 3]
                         })

The result should look like df_after.

Please clarify what your desired output is. What should these additional columns look like? Ideally, please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with a small sample of your input dataframe and desired output from the transformation you try to explain with clarifying details. Also see [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391). You can edit the question. — AlexK, Jun 11 '22 at 20:05
Please explain the logic further. You are starting with 3 values and somehow ending up with 9 values. How are you getting these new values? — AlexK, Jun 13 '22 at 18:30
Hi, so my data are climate predictions. I have these predictions for different emission scenarios (RCPs), for different cantons/stations and for different weather variables (tas - mean temperature, tasmax, tasmin, and pr - precipitation). I want to reshape my df so that the values for my 4 different weather variables become separate variables. I realize that my example above is not completely representative. I hope these explanations make it clear; otherwise, when I find the time I will draft a better example ;-) — anbanael, Jun 14 '22 at 12:06
I am sensing you are looking to [pivot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html) your dataframe. But to provide you with a full solution, yes, please make a more clear example showing how input columns and values should map to output columns and values. — AlexK, Jun 14 '22 at 20:25
I think you have a typo in the code creating `df_before`: based on how you want the `df_after` to look, the fourth date from the end in the `time` list should be `1981-01-01`. — AlexK, Jun 20 '22 at 19:45
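One way to spot this kind of key typo before pivoting (a sketch, not something from the original thread) is to count the rows that share each canton/time/stn/RCPs combination. Every combination should hold one row per variable, so any group whose count differs from the number of distinct var values points at a mistyped key. The data below reproduces df_before as originally posted, with the suspected typo in the ninth time entry still in place.

```python
import pandas as pd

# df_before as originally posted: the ninth 'time' entry is '1981-01-02',
# although the AG/BUS/RCP85 rows around it are all '1981-01-01'.
df_before = pd.DataFrame({
    'canton': ['AG'] * 9 + ['BE'] * 3,
    'time': ['1981-01-01'] * 8 + ['1981-01-02'] * 4,
    'value': [3, 5, 1, 2, 4, 5, 6, 7, 2, 1, 2, 3],
    'stn': ['BUS'] * 9 + ['BER'] * 3,
    'RCPs': ['RCP26'] * 3 + ['RCP45'] * 3 + ['RCP85'] * 3 + ['RCP26'] * 3,
    'var': ['tas', 'tasmin', 'tasmax'] * 4,
})

keys = ['canton', 'time', 'stn', 'RCPs']
# Rows per key combination; each count should equal the number of variables.
counts = df_before.groupby(keys).size()
n_vars = df_before['var'].nunique()
suspicious = counts[counts != n_vars]
print(suspicious)
```

Here the AG/BUS/RCP85 rows are split across two dates (two rows on 1981-01-01, one on 1981-01-02), which is exactly the inconsistency that would otherwise show up as NaN cells after pivoting.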

score 0 · Answer 1 · answered Jun 20 '22 at 19:43

Like I mentioned in the comments, you can use the pivot() method:

df_after = df_before.pivot(
    index=['canton','time','stn','RCPs'], columns='var', values='value'
).rename_axis(None, axis=1).reset_index()

print(df_after)
    canton        time  stn  RCPs   tas tasmax  tasmin
0       AG  1981-01-01  BUS RCP26     3      1       5
1       AG  1981-01-01  BUS RCP45     2      5       4
2       AG  1981-01-01  BUS RCP85     6      2       7
3       BE  1981-01-02  BER RCP26     1      3       2
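When a values column is given, pivot() performs essentially the same reshape as moving the key columns and var into the row index and then unstacking var. The sketch below (using the corrected data, with the ninth time entry set to '1981-01-01') writes that out with set_index() and unstack(), which also makes it visible that the new column index carries the name 'var'.

```python
import pandas as pd

# Corrected df_before: all AG rows are on 1981-01-01.
df_before = pd.DataFrame({
    'canton': ['AG'] * 9 + ['BE'] * 3,
    'time': ['1981-01-01'] * 9 + ['1981-01-02'] * 3,
    'value': [3, 5, 1, 2, 4, 5, 6, 7, 2, 1, 2, 3],
    'stn': ['BUS'] * 9 + ['BER'] * 3,
    'RCPs': ['RCP26'] * 3 + ['RCP45'] * 3 + ['RCP85'] * 3 + ['RCP26'] * 3,
    'var': ['tas', 'tasmin', 'tasmax'] * 4,
})

keys = ['canton', 'time', 'stn', 'RCPs']
# Put the keys plus 'var' in the row index, select 'value',
# then move the 'var' level into the columns (one column per variable).
wide = df_before.set_index(keys + ['var'])['value'].unstack('var')
print(wide.columns.name)  # the column index is named 'var' at this point

# Drop that name and turn the four key levels back into ordinary columns.
df_after = wide.rename_axis(None, axis=1).reset_index()
print(df_after)
```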

rename_axis(None, axis=1) removes the name ('var') that pivot() gives to the column index, and reset_index() turns the MultiIndex that pivot() builds from the four index columns back into ordinary dataframe columns.
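pivot() requires every (index, column) combination to be unique and raises a ValueError otherwise. With data appended from many CSV files, an accidentally duplicated row is easy to introduce. A possible fallback, sketched below under that assumption, is pivot_table(), which aggregates duplicates instead of failing. The choice of aggfunc='first' (keep the first of any duplicates) is an assumption; 'mean' or another aggregation may suit your data better, and the duplicated row here is made up purely for illustration.

```python
import pandas as pd

# Corrected df_before (all AG rows on 1981-01-01)
df_before = pd.DataFrame({
    'canton': ['AG'] * 9 + ['BE'] * 3,
    'time': ['1981-01-01'] * 9 + ['1981-01-02'] * 3,
    'value': [3, 5, 1, 2, 4, 5, 6, 7, 2, 1, 2, 3],
    'stn': ['BUS'] * 9 + ['BER'] * 3,
    'RCPs': ['RCP26'] * 3 + ['RCP45'] * 3 + ['RCP85'] * 3 + ['RCP26'] * 3,
    'var': ['tas', 'tasmin', 'tasmax'] * 4,
})
# Hypothetical duplicate: the last row appended a second time.
df_dup = pd.concat([df_before, df_before.tail(1)], ignore_index=True)

keys = ['canton', 'time', 'stn', 'RCPs']
try:
    df_dup.pivot(index=keys, columns='var', values='value')
except ValueError as err:
    # pivot refuses duplicate (index, column) pairs
    print('pivot failed:', err)

# pivot_table aggregates the duplicates instead of raising
df_after = (
    df_dup.pivot_table(index=keys, columns='var', values='value', aggfunc='first')
    .rename_axis(None, axis=1)
    .reset_index()
)
print(df_after)
```

If duplicates are never expected, the ValueError from pivot() is arguably the safer behaviour, since it surfaces the problem rather than silently aggregating it away.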
