I have a Pandas dataframe:
test=pd.DataFrame(columns=['GroupID','Sample','SampleMeta','Value'])
test.loc[0,:]='1','S1','S1_meta',1
test.loc[1,:]='1','S1','S1_meta',1
test.loc[2,:]='2','S2','S2_meta',1
I want to (1) group by two columns ('GroupID' and 'Sample'), (2) sum 'Value' per group, and (3) retain only unique values in 'SampleMeta' per group. The desired result ('GroupID' and 'Sample' as index) is shown:
SampleMeta Value
GroupID Sample
1 S1 S1_meta 2
2 S2 S2_meta 1
df.groupby() and the .sum() method get close, but .sum() concatenates identical values in the 'Values' column within a group. As a result, the 'S1_meta' value is duplicated.
g=test.groupby(['GroupID','Sample'])
print g.sum()
SampleMeta Value
GroupID Sample
1 S1 S1_metaS1_meta 2
2 S2 S2_meta 1
Is there a way to achieve the desired result using groupby() and associated methods? Merging the summed 'Value' per group with a separate 'SampleMeta' DataFrame works but there must be a more elegant solution.