5

I apologize if the title isn't clear, but I had difficulty phrasing the question. It's probably best if I just show what I would like to do.

Some context: I parsed a document for names and stored each name with the page number where it appears. I need to transform the DataFrame so that there is a single row for each name the page number column combines all the pages where the name appears. I figured that this would require GroupBy, but I'm not entirely sure.

My data currently:

data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])

pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

  FIRST_NM LAST_NM PAGE_NUM
0     John   Smith        1
1     John   Smith        7
2     Eric   Adams        9
3     Jane     Doe       14
4     Jane     Doe       16
5     John   Smith       19

Desired dataframe:

  FIRST_NM LAST_NM PAGE_NUM
0     John   Smith   1,7,19
1     Eric   Adams        9
2     Jane     Doe    14,16
blahblahblah
  • 2,299
  • 8
  • 45
  • 60
  • `PAGE_NUM` in desired dataframe is string? – Anand S Kumar Sep 21 '15 at 03:07
  • Does this answer your question? [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby) – Ynjxsjmh May 06 '22 at 15:46

1 Answers1

6

You can do this with groupby and apply:

df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))
Out[23]: 
FIRST_NM  LAST_NM
Eric      Adams           9
Jane      Doe         14,16
John      Smith      1,7,19
dtype: object
Marius
  • 58,213
  • 16
  • 107
  • 105