For the following dataframe,

import pandas as pd
d = {'col1': [33, 33, 33, 34, 34, 34], 'col2': ["hello", "hello1", "hello2", "hello3", "hello4", "hello5"]}
df = pd.DataFrame(data=d)
print(df)

I want to group by col1 and concatenate the contents of col2 into a list for each group. The result should be as follows:

import pandas as pd
d = {'col1': [33, 34], 'col2': [["hello", "hello1", "hello2"], ["hello3", "hello4", "hello5"]]}
df = pd.DataFrame(data=d)
print(df)

Is there an easy way to achieve that?

lczapski
william007
  • It is a dupe :( `print (df.groupby('col1')['col2'].apply(list).reset_index())` – jezrael Feb 01 '18 at 06:28
  • @jezrael thanks, but now the output looks like `0 [[hello, hello1, hello2]] 1 [[hello3, hello4, hello5]]`. Each row has a doubly nested list [[..]], where ideally it should be a single list [..]. How can we get that? – william007 Feb 01 '18 at 06:39
  • What is the expected output? I checked the solution from my comment, converted to a dict with `print (df.groupby('col1')['col2'].apply(list).reset_index().to_dict('list'))`, and it is exactly what you need. Or is something missing? – jezrael Feb 01 '18 at 06:41
  • @jezrael, the code in the first comment gives this output: `d = {'col1': [33, 34], 'col2': [[["hello", "hello1", "hello2"]], [["hello3", "hello4", "hello5"]]]} df = pd.DataFrame(data=d) print(df)` The expected one is given in the question. – william007 Feb 01 '18 at 06:56
  • Do you use the first DataFrame as input? `d = {'col1': [33,33,33,34,34,34], 'col2': ["hello","hello1","hello2","hello3","hello4","hello5"]} df = pd.DataFrame(data=d)` ? – jezrael Feb 01 '18 at 06:56
  • @jezrael you are right! thanks! – william007 Feb 01 '18 at 07:06
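
For reference, the approach suggested in the comments can be sketched as follows. This is a minimal example that assumes the first (ungrouped) DataFrame from the question is the input; the variable names `grouped` and `result` are only illustrative.

```python
import pandas as pd

# Input: the ungrouped DataFrame from the question.
d = {'col1': [33, 33, 33, 34, 34, 34],
     'col2': ["hello", "hello1", "hello2", "hello3", "hello4", "hello5"]}
df = pd.DataFrame(data=d)

# Collect the col2 values of each col1 group into a single Python list,
# then turn the group key back into a regular column.
grouped = df.groupby('col1')['col2'].apply(list).reset_index()
print(grouped)

# Converting to a dict of lists makes it easy to compare the shape
# with the expected result given in the question.
result = grouped.to_dict('list')
print(result)
```

Running the same `groupby` line on the already grouped DataFrame (the second one in the question) wraps each existing list in another list, which would explain the `[[..]]` output reported in the comments.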

0 Answers