Group by one column and concatenate another column into lists

Asked Feb 01 '18 at 06:08

Active Nov 25 '19 at 14:27

Viewed 39 times

For the following dataframe,

import pandas as pd
d = {'col1': [33,33,33,34,34,34], 'col2': ["hello","hello1","hello2","hello3","hello4","hello5"]}
df = pd.DataFrame(data=d)
print(df)

I want to group by col1 and concatenate the values in col2 into a list, so that the result looks like this:

import pandas as pd
d = {'col1': [33,34], 'col2': [["hello","hello1","hello2"],["hello3","hello4","hello5"]]}
df = pd.DataFrame(data=d)
print(df)

Is there an easy way to achieve that?

edited Nov 25 '19 at 14:27

lczapski

asked Feb 01 '18 at 06:08

william007

It is a dupe :( `print (df.groupby('col1')['col2'].apply(list).reset_index())` – jezrael Feb 01 '18 at 06:28
@jezrael thanks, but now it looks like `0 [[hello, hello1, hello2]] 1 [[hello3, hello4, hello5]]`. Each row is wrapped in two lists [[..]] when ideally it should be one [..]. How can I get that? – william007 Feb 01 '18 at 06:39
What is the expected output? I checked the solution from my comment by converting it to a dict, `print (df.groupby('col1')['col2'].apply(list).reset_index().to_dict('list'))`, and this is exactly what you need. Or is something missing? – jezrael Feb 01 '18 at 06:41
@jezrael, the output of the first comment is equivalent to this: `d = {'col1': [33, 34], 'col2': [[["hello", "hello1", "hello2"]], [["hello3", "hello4", "hello5"]]]} df = pd.DataFrame(data=d) print(df)` The expected one is given in the question. – william007 Feb 01 '18 at 06:56
Do you use the first DataFrame as input? `d = {'col1': [33,33,33,34,34,34], 'col2': ["hello","hello1","hello2","hello3","hello4","hello5"]} df = pd.DataFrame(data=d)` ? – jezrael Feb 01 '18 at 06:56
@jezrael you are right! thanks! – william007 Feb 01 '18 at 07:06
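
For reference, the exchange above can be sketched as a single script. It runs the `groupby(...).apply(list)` call from the first comment on the ungrouped DataFrame from the question, then repeats it on the already-grouped result, which appears to be what produced the doubly nested `[[..]]` output. The variable names `grouped` and `nested` are illustrative and not from the thread.

```python
import pandas as pd

# Input from the question: one row per (col1, col2) pair.
d = {
    "col1": [33, 33, 33, 34, 34, 34],
    "col2": ["hello", "hello1", "hello2", "hello3", "hello4", "hello5"],
}
df = pd.DataFrame(data=d)

# Collect the col2 values of each col1 group into one list per group.
grouped = df.groupby("col1")["col2"].apply(list).reset_index()
print(grouped.to_dict("list"))

# Repeating the call on the already-grouped frame wraps each existing list
# in another list, which matches the [[..]] output reported in the comments.
nested = grouped.groupby("col1")["col2"].apply(list).reset_index()
print(nested.to_dict("list"))
```

Starting from the original six-row DataFrame gives one flat list per `col1` value, so no extra unwrapping step should be needed.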

0 Answers