
I have a dataframe with two columns of string entries, e.g.:

Col1  Col2
E1    K1
E2    K2
E3    K3
...

I want to create an index on Col2, but the entries there are not unique, so I would like to merge all rows that share an entry in Col2. Say K2 and K3 are identical; then after the merge I would like the corresponding Col1 values collected into a list, forming just one row, i.e.:

Col1      Col2
E1        K1
[E2, E3]  K2
...

What would be the cleanest way of achieving this?
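For reference, the example frame can be reconstructed like this (a minimal sketch assuming pandas; the column names and values are taken from the tables above):

import pandas as pd

# Minimal reconstruction of the example above; K2 and K3 stand for
# entries that should be treated as identical.
df = pd.DataFrame({'Col1': ['E1', 'E2', 'E3'],
                   'Col2': ['K1', 'K2', 'K3']})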

gen
  • how do you determine that K2 and K3 are identical? – Onyambu Jul 19 '18 at 23:25
  • @Onyambu k2 == k3; the entries are all strings; sorry for the lack of clarity – gen Jul 19 '18 at 23:41
  • Then you should just use K2 in both cases; there's no need for K3. Replace K3 with K2. – Onyambu Jul 19 '18 at 23:42
  • _why_ do you want to do this? is this for reporting/human viewability? if so, that's ok, but if it's for data analysis, it's almost certainly the wrong approach. – acushner Jul 19 '18 at 23:42
  • @acushner what approach would you suggest instead? – gen Jul 20 '18 at 02:35
  • Possible duplicate of [Pandas groupby: How to get a union of strings](https://stackoverflow.com/questions/17841149/pandas-groupby-how-to-get-a-union-of-strings) – gen Jul 20 '18 at 17:59

3 Answers


First change K3 into K2, then group by Col2 and aggregate on it:

df[df == 'K3'] = 'K2'
df.groupby('Col2').agg(list)
          Col1
Col2          
K1        [E1]
K2    [E2, E3]
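A self-contained sketch of the same steps, assuming the example frame from the question:

import pandas as pd

df = pd.DataFrame({'Col1': ['E1', 'E2', 'E3'],
                   'Col2': ['K1', 'K2', 'K3']})

# Treat K3 as K2, then collect the Col1 values for each Col2 key.
df[df == 'K3'] = 'K2'
result = df.groupby('Col2').agg(list)

# Chain .reset_index() if Col2 should stay a column rather than the index.
print(result.reset_index())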
Onyambu

You can use a mapper with groupby:

df.set_index('Col2').groupby(by={'K1':'K1','K2':'K2','K3':'K2'},axis=0).Col1.apply(list)
Out[116]: 
K1        [E1]
K2    [E2, E3]
Name: Col1, dtype: object
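If you want a two-column frame like in the question rather than a Series, you can convert the result back (a small sketch, assuming the df from the question; rename_axis just guards against the grouped index losing its name):

out = (df.set_index('Col2')
         .groupby(by={'K1': 'K1', 'K2': 'K2', 'K3': 'K2'}, axis=0)
         .Col1.apply(list)
         .rename_axis('Col2')
         .reset_index())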
BENY
  • Perhaps my original post was misleading, but the table above was just an example. A priori I have no way of knowing which indices occur repeatedly, and in particular the dataframe has a variable number of rows. – gen Jul 20 '18 at 04:04

Since you don't know a priori what the dupes are, you can do:

df.groupby('Col2').agg(lambda s: list(s))

For some reason .agg(list) doesn't work for me...
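A quick usage sketch, assuming a frame where the duplicate key already appears in Col2:

import pandas as pd

# Duplicate key already present in Col2 (no K3 placeholder needed).
df = pd.DataFrame({'Col1': ['E1', 'E2', 'E3'],
                   'Col2': ['K1', 'K2', 'K2']})

print(df.groupby('Col2').agg(lambda s: list(s)))
#           Col1
# Col2
# K1        [E1]
# K2    [E2, E3]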

acushner