I have a big survey data (over 50K rows) as:
df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
['a', 'b', 'a', 'b', 'a', 'b'],
['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan'],
[1,2,3,4,5,6])),
columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1
Two compounds (compound
) are included for some samples (sampleID
). I want to combine the two rows (with two compounds for the same sampleID
) to one row:
df2 = pd.DataFrame(list(zip(['0001', '0002', '0003', '0004'],
['a', 'a', '', 'a'],
[1, 3, np.nan, 5],
['b', '', 'b', 'b'],
[2, np.nan, 4, 6],
['USA', 'USA', 'USA', 'USA'],
['Jan', 'Jan', 'Jan', 'Jan'])),
columns=['sample ID', 'compound1', 'value1', 'compound2', 'value2','country', 'month'])
df2
The below can work:
pd.merge((df1.loc[df1.compound == 'a']),
(df1.loc[df1.compound == 'b']),
how="outer",
on=['sample ID', 'country', 'month'],
suffixes=("_no3", "_no2"))
Any better approach?