-2

I'm beginner of pandas so I have a question below. There's a lot of answers about groupby rows but I can't find the answer what I want.

anyway my datatable is below.

   COLUMN1    COLUMN2   COLUMN3
0  APPLE      RED       JOHN, JANE
1  BANANA     YELLOW    SMITH
1  BANANA     YELLOW    EMILY
2  GRAPE      VIOLET    JESSICA
2  GRAPE      VIOLET    REIRA
2  GRAPE      VIOLET    EMMA
2  GRAPE      PURPLE    JOE
2  GRAPE      PURPLE    LISA
3  MELON      GREEN     RIO
3  MELON      GREEN     REIRA
.. 

and I want to get this table. (edit : EXCEPT YELLOW)

   COLUMN1    COLUMN2   COLUMN3
0  APPLE      RED       JOHN, JANE
1  BANANA     YELLOW    SMITH
1  BANANA     YELLOW    EMILY
2  GRAPE      VIOLET    JESSICA, REIRA, EMMA
2  GRAPE      PURPLE    JOE, LISA
3  MELON      GREEN     RIO, REIRA
.. 

How can I get this? Please give me a hint or answer then I'll appreciate a lot.. thank you.

now
  • 15
  • 3
  • Does this answer your question? [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby) – wavingtide Nov 18 '22 at 00:29
  • I don't think so. I think that the page you provide me is different of index of result but thank you. – now Nov 18 '22 at 00:32
  • need code of example for answer. when comma(,) is value of dataframe, need code. – Panda Kim Nov 18 '22 at 00:39
  • I don't understand what you want. Yes comma(,) is value of my real datatable. And I want to connect parts of column3 datas with commna(,) Then what you need is datatable code? – now Nov 18 '22 at 00:46
  • yeah, make code of example dataframe. – Panda Kim Nov 18 '22 at 00:54
  • your exmaple is not reproducible. To answer your question, answerer create example himself like below answer. chk post and make code. https://stackoverflow.com/help/minimal-reproducible-example – Panda Kim Nov 18 '22 at 01:05
  • Actually, what's in the text is not my real datable. It's hard to write an example code because it's just an example. I wrote a code similar to the body datatable, but I'm not sure if it's what I want. I don't know how to keep comma(,) in datatable. Anyway, it exists in my actual dataable. – now Nov 18 '22 at 01:05
  • import pandas as pd df = pd.DataFrame({'col1': ['Apple', 'Banana', 'Grape'], 'col2': ['Red', 'Yellow', 'Violet'], 'col3':['John, Jane', 'Smith, Emily', 'Jessica, Reira']}) split = df.col3.str.split(',') split = split.apply(lambda x: pd.Series(x)) split = split.stack().reset_index(level=1, drop=True).to_frame('new_col3') split = df.merge(split, left_index=True, right_index=True, how='left') split = split.drop(['col3'], axis=1) – now Nov 18 '22 at 01:06

2 Answers2

0
import pandas as pd
df = pd.DataFrame({'col1': ['Apple', 'Banana', 'Banana', 'Grape', 'Grape', 'Grape', 'Apple'], 'col2': ['Red', 'Yellow', 'Yellow', 'Violet', 'Violet', 'Purple', 'Red'], 'col3':['John, Jane', 'Smith', 'Emily', 'Jecica', 'Reira', 'Joe', 'Rio']})
df2 = df.groupby(['col1', 'col2'])['col3'].apply(list).reset_index()
df2['col3'] = df2['col3'].apply(lambda x: ', '.join(map(str, x)))
df2

To avoid splitting Yellow of Banana,

import pandas as pd
df = pd.DataFrame({'col1': ['Apple', 'Banana', 'Banana', 'Grape', 'Grape', 'Grape', 'Apple'], 'col2': ['Red', 'Yellow', 'Yellow*', 'Violet', 'Violet', 'Purple', 'Red'], 'col3':['John, Jane', 'Smith', 'Emily', 'Jecica', 'Reira', 'Joe', 'Rio']})
df2 = df.groupby(['col1', 'col2'])['col3'].apply(list).reset_index()
df2['col3'] = df2['col3'].apply(lambda x: ', '.join(map(str, x)))
df2['col2'] = df2['col2'].replace('Yellow*', 'Yellow')
df2
Ricardo
  • 691
  • 3
  • 11
  • Wow, Thank you so much. That's similar with what I want, but I found that order of data is twisted and I want to except to split Banana. How can I do this? – now Nov 18 '22 at 00:55
  • Please note the `df` I used is slightly different from what you provided for simplicity. To avoid splitting, the main idea is to change one `Yellow` to `Yellow*` before `groupby`, and change it back later – Ricardo Nov 18 '22 at 01:10
  • Wow It's worked so thanks I will study so hard so I want to person like you Thanks – now Nov 18 '22 at 01:10
0

must make reproducible example for answer

EXAMPLE

data = [['APPLE', 'RED', 'JOHN, JANE'],
        ['BANANA', 'YELLOW', 'SMITH'],
        ['BANANA', 'YELLOW', 'EMILY'],
        ['GRAPE', 'VIOLET', 'JESSICA'],
        ['GRAPE', 'VIOLET', 'REIRA'],
        ['GRAPE', 'VIOLET', 'EMMA'],
        ['GRAPE', 'PURPLE', 'JOE'],
        ['GRAPE', 'PURPLE', 'LISA'],
        ['MELON', 'GREEN', 'RIO'],
        ['MELON', 'GREEN', 'REIRA']]
df = pd.DataFrame(data, index=[0, 1, 1, 2, 2, 2, 2, 2, 3, 3], columns=['col1', 'col2', 'col3'])

output(df):

    col1    col2    col3
0   APPLE   RED     JOHN, JANE
1   BANANA  YELLOW  SMITH
1   BANANA  YELLOW  EMILY
2   GRAPE   VIOLET  JESSICA
2   GRAPE   VIOLET  REIRA
2   GRAPE   VIOLET  EMMA
2   GRAPE   PURPLE  JOE
2   GRAPE   PURPLE  LISA
3   MELON   GREEN   RIO
3   MELON   GREEN   REIRA



First
make col3 to list

df1 = df.groupby([df.index, 'col1', 'col2']).agg(list).reset_index()

output(df1):

    level_0 col1    col2    col3
0   0       APPLE   RED     [JOHN, JANE]
1   1       BANANA  YELLOW  [SMITH, EMILY]
2   2       GRAPE   PURPLE  [JOE, LISA]
3   2       GRAPE   VIOLET  [JESSICA, REIRA, EMMA]
4   3       MELON   GREEN   [RIO, REIRA]



Second
join col3 except yellow and explode yellow

df1.assign(col3=df1.apply(lambda x: ','.join(x['col3']) if x['col2'] != 'YELLOW' else x['col3'] , axis=1)).explode('col3')

result:

    level_0 col1    col2    col3
0   0       APPLE   RED     JOHN, JANE
1   1       BANANA  YELLOW  SMITH
1   1       BANANA  YELLOW  EMILY
2   2       GRAPE   PURPLE  JOE,LISA
3   2       GRAPE   VIOLET  JESSICA,REIRA,EMMA
4   3       MELON   GREEN   RIO,REIRA

if you want level_0 to index, use set_index or set_axis

and at next time make reproducible exmaple for answer.

Panda Kim
  • 6,246
  • 2
  • 12
  • omg It's amazing answer omg really thank you so much. Thank you so much for your sincere answer I'll tried this answer, too. Have a nice day man – now Nov 18 '22 at 01:24
  • I'll keep your advice in mind – now Nov 18 '22 at 01:24
  • I had no idea that indexes could be specified directly! wow I'm glad you taught me one more thing. – now Nov 18 '22 at 01:25
  • It is very troublesome to make example by myself and answer it. and you can make example. – Panda Kim Nov 18 '22 at 01:26