How do I combine non-numeric data from multiple rows in a dataframe?

Question

I have a dataframe that contains multiple entries for single conditions. I want to combine the multiple entries into one row so that each condition is described in one row.

The input data looks like this:

df = pd.DataFrame({"well" : ['E05', 'E05', 'E06', 'E06', 'F05', 'F05', 'F06', 'F06'],
                   "compound" : ['ABC', 'DMSO', 'DEF', 'XYZ', 'GHI', 'DMSO', 'JKL', 'XYZ'],
                   "number" : [34, 23, 16, 92, 22, 56, 44, 23]})

Which when printed is this:

    well    condition   number
0   E05     ABC         34
1   E05     UVW         23
2   E06     DEF         16
3   E06     XYZ         92
4   F05     GHI         22
5   F05     UVW         56
6   F06     JKL         44
7   F06     XYZ         23

I would like to generate a dataframe as below:

   well condition_1 number_1    condition_2 number_2
0   E05 ABC         34          UVW         23
1   E06 DEF         16          XYZ         92
2   F05 GHI         22          UVW         56
3   F06 JKL         44          XYZ         23

I've tried using df.groupby(by="well"), but have not figured out a way to generate a dataframe from the resulting groupby object. Functions such as count() or describe() don't work because then I get summary statistics for number (and the compound gets ignored entirely), when I want each reported separately.

My next guess is to create a for loop to cycle through every unique value in the well series and pull the corresponding entries, but I'm wondering if there's a more efficient/pythonic way to do this.

Thanks for any help.

One among many options: `df.assign(n=lambda d: d.groupby('well').cumcount().add(1)).pivot(index='well', columns='n').pipe(lambda d: d.set_axis(d.columns.map(lambda x: f'{x[0]}_{x[1]}'), axis=1)).reset_index()` — mozway, Apr 17 '23 at 20:35

How do I combine non-numeric data from multiple rows in a dataframe?

0 Answers0