
I have a dataframe with these values:

filename, keyword, page
A, red, 1
A, red, 2
A, green, 1
B, red, 1
B, green, 1
C, green, 2

How can I transform this to the following format?

filename, keywords, pages
A, [red, green], [1,2]
B, [red, green], [1]
C, [green], [2]

Is there an easy way to do this in Pandas? If a list isn't allowed as a cell value, is there another datatype that I could use that Pandas would allow? Or an alternative to a Pandas dataframe that I could store this in and then save it to a csv?

code_to_joy
  • `groupby(filename).agg(set)`? – Ch3steR Oct 26 '20 at 15:28
  • Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – Ch3steR Oct 26 '20 at 15:29

1 Answer


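You could use `df.groupby("filename")[["keyword", "page"]].agg(set)` (note the double brackets: selecting several columns after a groupby needs a list, and the bare `['keyword','page']` form raises an error in pandas 2.0 and later). That gives: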

               keyword    page
filename
A         {green, red}  {1, 2}
B         {green, red}     {1}
C              {green}     {2}

(PS: updated based on Ch3steR's comment; I was originally using `list` instead of `set`.)
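For completeness, here is a rough end-to-end sketch that also covers the CSV part of the question. It assumes the column names from the question. The helper `unique_in_order` is just an illustrative name, not a pandas function; it is used instead of `set` so the values come out as lists in first-seen order, as in the desired output. The in-memory `io.StringIO` buffer stands in for a real file path. Lists are allowed as cell values, but a CSV stores them as plain text, so one way to get them back is to parse them with `ast.literal_eval` when reading.

```python
import ast
import io

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "filename": ["A", "A", "A", "B", "B", "C"],
        "keyword": ["red", "red", "green", "red", "green", "green"],
        "page": [1, 2, 1, 1, 1, 2],
    }
)


def unique_in_order(values):
    # Hypothetical helper: dict.fromkeys drops duplicates while
    # keeping the order in which values first appear.
    return list(dict.fromkeys(values))


result = (
    df.groupby("filename")[["keyword", "page"]]
    .agg(unique_in_order)
    .rename(columns={"keyword": "keywords", "page": "pages"})
    .reset_index()
)
print(result)

# Write to CSV. A file path works the same way as this buffer.
# Each list is written as its text representation, e.g. "['red', 'green']".
buffer = io.StringIO()
result.to_csv(buffer, index=False)

# Read it back, turning the text representation into real lists again.
buffer.seek(0)
restored = pd.read_csv(
    buffer,
    converters={"keywords": ast.literal_eval, "pages": ast.literal_eval},
)
print(restored)
```

If order and duplicates don't matter, `.agg(set)` as above is shorter; plain `.agg(list)` keeps duplicates.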

Myrt