Alright, syntactically I don't know how to do this. I have a dataframe set up like this:

target   type    post
1      intj    "hello world shdjd"
2      entp    "hello world fddf"
16     estj   "hello world dsd"
4      esfp    "hello world sfs"
1      intj    "hello world ddfd"

where there are 16 types that repeat for something like 10,000 rows. The posts are unique.
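
For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. It contains only the five rows shown above, and the name `result` is taken from the attempt further down; the real data has many more rows.

```python
import pandas as pd

# Only the five sample rows from the question; the real frame has ~10,000 rows.
result = pd.DataFrame({
    "target": [1, 2, 16, 4, 1],
    "type": ["intj", "entp", "estj", "esfp", "intj"],
    "post": [
        "hello world shdjd",
        "hello world fddf",
        "hello world dsd",
        "hello world sfs",
        "hello world ddfd",
    ],
})

# Shows how many rows each type has (intj appears twice in this sample).
print(result["type"].value_counts())
```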

I need to concatenate all the posts that have the same type (or target; target is just the type number, 1-16). I looked at "Pandas groupby category, rating, get top value from each category?" and the groupby method, but I don't know how to do this with strings.

I've tried (dataframe is called result):

result = result.reset_index()
# print(result.loc[result.groupby('type').post.agg('idxmax')])
print(result.loc[result.groupby('type').post.str.cat(sep=' ')])

But neither works. How can I concatenate the posts that share the same type?

EXPECTED OUTPUT:

target   type    post
1        intj    "all intj posts concatenated .. "
2        entp    "all entp posts concatenated .. "
3        estj    "all estj posts concatenated .. "
4        esfp    "all esfp posts concatenated .. "
5        infj    "all infj posts concatenated .. "
16       istj    "all istj posts concatenated .. "
desertnaut
blue

2 Answers

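This should do the trick: join the posts within each group, then keep one row per group. (Assigning the result of drop_duplicates() straight back to df['post'] would leave NaN in the repeated rows instead of removing them.)

df['post'] = df.groupby(['target', 'type'])['post'].transform(lambda x: ','.join(x))
df = df.drop_duplicates(subset=['target', 'type'])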
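
As a self-contained sketch of the same idea, `agg` can collapse each group directly, so no de-duplication step is needed. The sample frame and the space separator are assumptions based on the question, not part of the original answer.

```python
import pandas as pd

# Assumed sample data, copied from the rows shown in the question.
df = pd.DataFrame({
    "target": [1, 2, 16, 4, 1],
    "type": ["intj", "entp", "estj", "esfp", "intj"],
    "post": [
        "hello world shdjd",
        "hello world fddf",
        "hello world dsd",
        "hello world sfs",
        "hello world ddfd",
    ],
})

# One row per (target, type); the posts of each group are joined with a space.
# as_index=False keeps target and type as ordinary columns.
out = df.groupby(["target", "type"], as_index=False).agg({"post": " ".join})
print(out)
```

With `as_index=False` the result is already a flat table with the `target`, `type` and `post` columns, sorted by the group keys.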

Carlos P Ceballos

Try this:

print(df.groupby(by=['type', 'target'])['post'].agg(lambda col: ' '.join(col)))

type  target
entp  2                           hello world fddf
esfp  4                            hello world sfs
estj  16                           hello world dsd
intj  1         hello world shdjd hello world ddfd
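
The result above is a Series indexed by (type, target). A minimal sketch of turning it back into a table shaped like the expected output follows; it uses `Series.str.cat` per group, which mirrors the attempt in the question. The sample data and the space separator are assumptions.

```python
import pandas as pd

# Assumed sample data, copied from the rows shown in the question.
df = pd.DataFrame({
    "target": [1, 2, 16, 4, 1],
    "type": ["intj", "entp", "estj", "esfp", "intj"],
    "post": [
        "hello world shdjd",
        "hello world fddf",
        "hello world dsd",
        "hello world sfs",
        "hello world ddfd",
    ],
})

joined = (
    df.groupby(["type", "target"])["post"]
      .agg(lambda col: col.str.cat(sep=" "))      # concatenate the strings of one group
      .reset_index()                              # index levels become columns again
      .sort_values("target", ignore_index=True)   # order rows by the type number
)

# Reorder the columns to match the layout in the question.
joined = joined[["target", "type", "post"]]
print(joined)
```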
NYC Coder