
I have an input file that looks something like this:

Input file:

I want to remove the duplicates from the Request_id field, but keep the corresponding E_id values for each Request_id as comma-separated values.

For example:

My Output should look like:

Required Output

I tried to produce this output with `pd.DataFrame.drop_duplicates(subset=['Request_id'], keep='last')`,
but it didn't work as expected. Any suggestions for producing the desired output would be appreciated. Thanks in advance :)

swathi
  • related: http://stackoverflow.com/questions/33699137/groupby-separate-sum-with-commas basically do `df.groupby('Request_id')['E_id'].apply(','.join)` – EdChum Mar 27 '17 at 10:05
  • @EdChum I tried groupby but I am getting an error stating `unbound method groupby() must be called with DataFrame instance as first argument (got Series instance instead)`. I also tried converting with Series.to_frame, but no luck. – swathi Mar 27 '17 at 10:39
  • Your question is unclear, is 'Request_id' the index or is the entire thing a df? The normal form is to post raw data, code to recreate your df, your attempts and the desired output – EdChum Mar 27 '17 at 10:46
  • Actually, it was my mistake: I had kept 'Request_id' in a list and tried using a dataframe, so it expected a Series instance. After I changed it to a DataFrame, it worked. Thanks @EdChum – swathi Mar 27 '17 at 12:20
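
For reference, the groupby approach suggested in the comments can be sketched as follows. The input file from the question is not available, so the sample rows below are hypothetical; only the column names Request_id and E_id come from the thread. Note that groupby is called on a DataFrame instance (`df.groupby(...)`), not on the `pd.DataFrame` class. The values are cast to str on the assumption that E_id may hold numbers, since `','.join` only accepts strings.

```python
import pandas as pd

# Hypothetical sample data; the real input file was not included in the post.
df = pd.DataFrame({
    "Request_id": [1, 1, 2, 3, 3],
    "E_id": ["a", "b", "c", "d", "e"],
})

# Group rows by Request_id and join each group's E_id values with commas.
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby("Request_id", sort=False)["E_id"]
      .apply(lambda s: ",".join(s.astype(str)))
      .reset_index()
)

print(result)
```

By contrast, `drop_duplicates(subset=['Request_id'], keep='last')` keeps only one row per Request_id and discards the E_id values of the other rows, so it cannot produce the comma-separated list on its own.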
