reshape pandas data frame: duplicated rows to columns, with textual data

Question

I have a dataframe like this:

INDEX_COL                col1
A                        Random Text 
B                        Some more random text
C                        more stuff
A                        Blah
B                        Blah, Blah
C                        Yet more stuff
A                        erm
B                        yup
C                        whatever

What I need is it reformed into new columns and stacked/grouped by values in col_1. So something like this:

A                               B                              C
Random Text                     Some more random text          more stuff
Blah                            Blah, Blah                     Yet more stuff
erm                             yup                            whatever

I've reviewed How can I pivot a dataframe? but all of the examples work with numerical data and this is a use case that involves textual data, so aggregation appears to be not an option (but it was - see accepted answer below)

I've tried the following:

Pivot - but all the examples I've seen involve numerical values with aggregate functions. This is reshaping non-numerical data

I get that index=INDEX COL, and columns= 'col1', but values? add a numerical column, pivot and then drop the numberical columns created? Feels like trying for forced pivot to do something it was never meant to do.

Unstack - but this seems to convert the df into a new df with a single value index of 'b'

unstack(level=0)

I've even considered slicing the data frame by index into separate dataframes and the concatinating them, but the mismatched indexes result in NaN appearing like a checkerboard. Also this feels like an fugly solution.

I've tried dropping the index_col, with Col1=['A,B,C'] and col2= the random text, but the new integer index comes along and spoils the fun.

Any suggestions or thoughts in which direction I should go with this?

score 2 · Accepted Answer · answered Nov 24 '22 at 19:25

2

You can use agg(list) and then explode the whole dataframe:

output =  df.groupby('INDEX_COL').agg(list).T.apply(pd.Series.explode)

output:

INDEX_COL   A          B                        C
col1    Random Text   Some more random text   more stuff
col1    Blah          Blah, Blah         Yet more stuff
col1    erm              yup               whatever

answered Nov 24 '22 at 19:25

Nuri Taş

3,828
2
4
22

1

Thanks for this. I've chosen this solution as it's the one that is the easiest to read and follow, with chained functions setting out the sequence of actions clearly. What I was missing in my earlier attempts was not knowing that a textual list can be aggregated and that the resulting datatype was Series. – Greg Williams Nov 25 '22 at 08:45

PaulS · Answer 2 · 2022-11-24T19:56:58.723

Another possible solution, using pandas.pivot_table:

(df.pivot_table(columns='INDEX_COL', values='col1', aggfunc=list)
 .pipe(lambda d: d.explode(d.columns.tolist()))
 .reset_index(drop=True))

Output:

INDEX_COL            A                      B               C
0          Random Text  Some more random text      more stuff
1                 Blah             Blah, Blah  Yet more stuff
2                  erm                    yup        whatever

reshape pandas data frame: duplicated rows to columns, with textual data

2 Answers2