
Is there a neat way to 'collapse' a pandas DataFrame when several rows share the same value in a key column, so that the other column's values are gathered into a list? For example:

df =

col_a  col_b
    a     1
    b     2
    b     3
    c     4
    d     5
    d     6
    d     7

What I need is:

df_new = 

col_a     col_b
    a         1
    b    [2, 3]
    c         4
    d [5, 6, 7]

It should definitely involve groupby:

df_new = df.groupby('col_a').apply(....)

But I'm puzzled about how to implement the part in the parentheses efficiently.

Arnold Klein

2 Answers

You can group by col_a and apply list to col_b; the result is a Series indexed by col_a:

df.groupby('col_a')['col_b'].apply(list)

col_a
a          [1]
b       [2, 3]
c          [4]
d    [5, 6, 7]
Name: col_b, dtype: object
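
For reference, here is a self-contained sketch of the approach above that also turns the result back into a regular DataFrame with reset_index(). The helper name collapse is hypothetical, not part of pandas; it assumes you want single-element groups left as bare values, as in the question's desired output, rather than as one-item lists.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_a": ["a", "b", "b", "c", "d", "d", "d"],
    "col_b": [1, 2, 3, 4, 5, 6, 7],
})

# Plain version: every group becomes a list, even groups with one row.
as_lists = df.groupby("col_a")["col_b"].apply(list).reset_index()


def collapse(values):
    """Hypothetical helper: return the bare value for one-row groups, else a list."""
    items = list(values)
    return items[0] if len(items) == 1 else items


# Variant matching the question's df_new: scalars for a and c, lists for b and d.
df_new = df.groupby("col_a")["col_b"].apply(collapse).reset_index()

print(as_lists)
print(df_new)
```

Mixing scalars and lists in one column forces object dtype and makes later processing awkward, so the plain list version is probably the easier one to work with downstream.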
sacuL
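
If you want to keep every original row and attach its group's list as a new column, build the grouped lists and map them back onto col_a:

# s is a Series of lists indexed by col_a
s = df.groupby('col_a')['col_b'].apply(list)
# look up each row's col_a value in s to fill the new column
df['col_c'] = df['col_a'].map(s)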

print(df)

  col_a  col_b      col_c
0     a      1        [1]
1     b      2     [2, 3]
2     b      3     [2, 3]
3     c      4        [4]
4     d      5  [5, 6, 7]
5     d      6  [5, 6, 7]
6     d      7  [5, 6, 7]
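
One thing to watch with this approach: as far as I can tell, map hands every row of a group a reference to the same list object, so mutating the list in one row changes it in the others too. The sketch below checks that and shows one possible way to give each row its own copy. The variable names shared and independent are only for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_a": ["a", "b", "b", "c", "d", "d", "d"],
    "col_b": [1, 2, 3, 4, 5, 6, 7],
})

s = df.groupby("col_a")["col_b"].apply(list)
df["col_c"] = df["col_a"].map(s)

# Rows 1 and 2 both belong to group "b"; check whether they hold the same list object.
shared = df["col_c"].iloc[1] is df["col_c"].iloc[2]

# Assumed fix: rebuild the column so every row owns an independent copy of its list.
df["col_c"] = pd.Series([list(v) for v in df["col_c"]], index=df.index)
independent = df["col_c"].iloc[1] is not df["col_c"].iloc[2]

print(shared, independent)
```

If the lists are only ever read, the shared references are harmless and the copy step can be skipped.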
Khalil Al Hooti