How to combine multiple rows (where two column values differ) into one row in Pandas

Question

I have a large survey dataset (over 50K rows) that looks like this:

import numpy as np   # needed for np.nan in the expected output further down
import pandas as pd

df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                            ['a', 'b', 'a', 'b', 'a', 'b'],
                            ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                            ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan'],
                            [1, 2, 3, 4, 5, 6])),
                   columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1

Some samples (sample ID) have two compounds (compound), each on its own row. I want to combine the two rows that share a sample ID into one row, so the expected output is:

df2 = pd.DataFrame(list(zip(['0001', '0002', '0003', '0004'],
                            ['a', 'a', '', 'a'],
                            [1, 3, np.nan, 5],
                            ['b', '', 'b', 'b'],
                            [2, np.nan, 4, 6],
                            ['USA', 'USA', 'USA', 'USA'],
                            ['Jan', 'Jan', 'Jan', 'Jan'])),
                   columns=['sample ID', 'compound1', 'value1', 'compound2', 'value2', 'country', 'month'])
df2

The following works, by slicing out each compound and merging the two slices:

pd.merge(df1.loc[df1.compound == 'a'],
         df1.loc[df1.compound == 'b'],
         how="outer",
         on=['sample ID', 'country', 'month'],
         suffixes=("_no3", "_no2"))

Is there a better approach?

Q/A 10 in the dup link: pivot with columns being the enumeration of `sample ID`. — Quang Hoang, May 05 '23 at 03:08
Thanks, they are different, aren't they? I can slice and then merge instead. But I am wondering about better approaches. — Joe, May 05 '23 at 05:08
Ah, I see how they're different. Actually, you can just pivot with compound as the columns. — Quang Hoang, May 05 '23 at 15:18
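
A minimal sketch of the pivot suggested in the last comment, assuming each (sample ID, compound) pair occurs at most once. The names `keys`, `wide` and `out` are illustrative, and the loop that rebuilds the `compound1`/`value1` style columns is only one possible way to reach the layout of `df2`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'sample ID': ['0001', '0001', '0002', '0003', '0004', '0004'],
    'compound': ['a', 'b', 'a', 'b', 'a', 'b'],
    'country': ['USA'] * 6,
    'month': ['Jan'] * 6,
    'value': [1, 2, 3, 4, 5, 6],
})

# Columns that identify a sample (assumed constant across its rows).
keys = ['sample ID', 'country', 'month']

# One row per sample, one column per compound; a missing compound becomes NaN.
wide = df1.pivot(index=keys, columns='compound', values='value')
compounds = list(wide.columns)  # ['a', 'b'] for this data

out = wide.reset_index()
out.columns.name = None  # drop the leftover 'compound' axis label

# Rebuild the compoundN / valueN pairs used in the expected output.
ordered = ['sample ID']
for i, name in enumerate(compounds, start=1):
    out[f'value{i}'] = out[name]
    # Empty string where the sample has no measurement for this compound.
    out[f'compound{i}'] = np.where(out[name].notna(), name, '')
    ordered += [f'compound{i}', f'value{i}']

out = out[ordered + ['country', 'month']]
print(out)
```

If a sample can contain the same compound more than once, `pivot` raises a `ValueError`; `pivot_table` with an explicit `aggfunc` would be one alternative in that case. Selecting a list of columns as `index` in `pivot` needs pandas 1.1 or later.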
