0

I have a big survey data (over 50K rows) as:

df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                            ['a', 'b', 'a', 'b', 'a', 'b'],
                           ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                           ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan'],
                           [1,2,3,4,5,6])),
                    columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1

enter image description here

Two compounds (compound) are included for some samples (sampleID). I want to combine the two rows (with two compounds for the same sampleID) to one row:

df2 = pd.DataFrame(list(zip(['0001', '0002', '0003', '0004'],
                            ['a', 'a', '', 'a'],
                            [1, 3, np.nan, 5],
                            ['b', '', 'b', 'b'],
                            [2, np.nan, 4, 6],
                            ['USA', 'USA', 'USA', 'USA'],
                            ['Jan', 'Jan', 'Jan', 'Jan'])),
                    columns=['sample ID', 'compound1', 'value1', 'compound2', 'value2','country', 'month'])
df2

enter image description here

The below can work:

pd.merge((df1.loc[df1.compound == 'a']),
         (df1.loc[df1.compound == 'b']),
         how="outer",
         on=['sample ID', 'country', 'month'],
        suffixes=("_no3", "_no2"))

Any better approach?

Joe
  • 163
  • 8

0 Answers0