concatenate row values for the same index in pandas

Question

My initial DataFrame looks as follows:

   A    B  quantity
0  1  foo         1
1  1  baz         2
2  1  bar         2
3  1  faz         1
4  2  foo         2
5  2  bar         1
6  3  foo         3

I need to group it by 'A' and make a list of 'B' multiplied by 'quantity':

   A                               B
0  1  [foo, baz, baz, bar, bar, faz]
1  2                 [foo, foo, bar]
2  3                 [foo, foo, foo]

Currently I'm using groupby() and then apply():

def itemsToList(tdf, column):

    collist = []
    for row in tdf[column].iteritems():
        collist = collist + tdf['quantity'][row[0]]*[row[1]]

    return pd.Series({column: collist})

gb = df.groupby('A').apply(itemsToList, 'B')

I doubt it is an efficient way, so I'm looking for a good, "pandaic" method to achieve this.

+1 for an interesting question. Have a look at [this](http://stackoverflow.com/questions/17841149/pandas-groupby-how-to-get-a-union-of-strings). I think it's similar. — LondonRob, Jul 06 '15 at 10:29

EdChum · Accepted Answer · 2015-07-06T11:15:40.513

5

This could be done in 2 steps, generate a new column that creates the expanded str values, then groupby on 'A' and apply list to this new column:

In [62]:
df['expand'] = df.apply(lambda x: ','.join([x['B']] * x['quantity']), axis=1)
df.groupby('A')['expand'].apply(list)

Out[62]:
A
1    [foo, baz,baz, bar,bar, faz]
2                  [foo,foo, bar]
3                   [foo,foo,foo]
Name: expand, dtype: object

EDIT

OK after taking inspiration from @Jianxun Li's answer:

In [130]:
df.groupby('A').apply(lambda x: np.repeat(x['B'].values, x['quantity']).tolist())

Out[130]:
A
1    [foo, baz, baz, bar, bar, faz]
2                   [foo, foo, bar]
3                   [foo, foo, foo]
dtype: object

Also this works:

In [131]:
df.groupby('A').apply(lambda x: list(np.repeat(x['B'].values, x['quantity'])))

Out[131]:
A
1    [foo, baz, baz, bar, bar, faz]
2                   [foo, foo, bar]
3                   [foo, foo, foo]
dtype: object

edited Jul 06 '15 at 11:15

answered Jul 06 '15 at 10:35

EdChum

376,765
198
813
562

This works, but leaves `df` different to how we found it, which I'd say is not ideal. – LondonRob Jul 06 '15 at 10:45
@LondonRob sorry what do you mean, do you mean the fact I added an intermediate column? The intermediate step is necessary as I can't figure out how to get the `apply` and `lambda` to not try to expand the list and remain 1-dimensional – EdChum Jul 06 '15 at 10:47
Yes, that's what I meant (and exactly what I'm just struggling with now! There *must* be a clever way!) – LondonRob Jul 06 '15 at 10:49
@LondonRob the problem here is that I get messed up results with the seemingly sensible ways: `df.groupby('A').apply(lambda x: list(x['B'] * x['quantity'])) ` or `df.groupby('A').apply(lambda x: ','.join(x['B'] * x['quantity']) )` – EdChum Jul 06 '15 at 10:52
@LondonRob if I try `df.groupby('A').apply(lambda x: ','.join([x['B']] * x['quantity']))` I get `Exception: Data must be 1-dimensional` so I'm racking my brain on this one – EdChum Jul 06 '15 at 10:53
Yes, that's *exactly* what I'm doing too. Can't we use [Jeff's approach](http://stackoverflow.com/a/17841294/2071807) of returning a Series? I can get pretty close using that.... – LondonRob Jul 06 '15 at 11:04
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/82483/discussion-between-londonrob-and-edchum). – LondonRob Jul 06 '15 at 11:08

Jianxun Li · Answer 2 · 2015-07-06T11:15:06.093

2

Another way to do it. First reshape the df using pivot_table and then apply np.repeat().tolist().

import pandas as pd
import numpy as np

df

Out[52]: 
   A    B  quantity
0  1  foo         1
1  1  baz         2
2  1  bar         2
3  1  faz         1
4  2  foo         2
5  2  bar         1
6  3  foo         3

df.pivot('A','B','quantity').fillna(0).apply(lambda row: np.repeat(row.index.values, row.values.astype(int)).tolist(), axis=1)

Out[53]: 
A
1    [bar, bar, baz, baz, faz, foo]
2                   [bar, foo, foo]
3                   [foo, foo, foo]
dtype: object

edited Jul 06 '15 at 11:15

answered Jul 06 '15 at 11:10

Jianxun Li

24,004
10
58
76

I used `np.repeat` to solve my `groupby` approach ;-) – EdChum Jul 06 '15 at 11:17
@EdChum Agree with your answer. `pivot_table` reshaping is unnecessary. `groupby` is a much cleaner way to do it. :-) – Jianxun Li Jul 06 '15 at 11:19

concatenate row values for the same index in pandas

2 Answers2