I have a DataFrame df with two columns:

goods_id         int64
properties_id    int64
dtype: object

df
   goods_id  properties_id
0      3588              1
1      3588              2
2      3588              3
3      3588              4
4      3588              5
5      3588              6
6      3589              1
7      3589              2
8      3589              3

I need to combine the properties_id rows into a list of integers for each group. In other words, the desired output for each goods_id is 3588 [1,2,3,4,5,6], 3589 [1,2,3], etc. To get it I use a self-written combine function based on concatenation via ','.join. The result isn't what I expected, and I can't understand its behavior.

def combine(x):
    return ','.join(x)

df.groupby('goods_id').apply(combine)

goods_id
3588    goods_id,properties_id # desired output [1,2,3,4,5,6]
3589    goods_id,properties_id # desired output [1,2,3]

Using df.groupby('goods_id')['properties_id'].apply(combine) instead raises TypeError: sequence item 0: expected str instance, int found
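
Both symptoms can be reproduced without groupby at all, which may help narrow down where they come from. The sketch below assumes the sample data shown above; the variable names joined_labels, join_error and joined_values are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "goods_id": [3588] * 6 + [3589] * 3,
    "properties_id": [1, 2, 3, 4, 5, 6, 1, 2, 3],
})

# Iterating over a DataFrame yields its column labels, not its rows,
# so join() concatenates the column names.
joined_labels = ",".join(df)

# Iterating over a Series yields its values; here they are ints,
# which str.join() rejects.
try:
    ",".join(df["properties_id"])
    join_error = None
except TypeError as exc:
    join_error = exc

# Converting the values to str first makes the join work,
# although the result is one string rather than a list of integers.
joined_values = ",".join(df["properties_id"].astype(str))
```

So a join-based combine appears to be the wrong tool when the goal is a list of integers per group.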

Ivan Shelonik
1 Answer

In one line:

df.groupby('goods_id').agg(lambda col: col.tolist()).reset_index()

Gives the following dataframe:

   goods_id       properties_id
0      3588  [1, 2, 3, 4, 5, 6]
1      3589           [1, 2, 3]
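
As a possible alternative, here is a self-contained sketch that selects the column before grouping and passes the built-in list to apply. It assumes the sample data from the question; result and as_dict are illustrative names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "goods_id": [3588] * 6 + [3589] * 3,
    "properties_id": [1, 2, 3, 4, 5, 6, 1, 2, 3],
})

# Select the column first, then turn each group's values into a list.
result = df.groupby("goods_id")["properties_id"].apply(list).reset_index()

# Optional: a plain mapping from goods_id to its list of properties.
as_dict = dict(zip(result["goods_id"], result["properties_id"]))
```

Selecting the column up front means any other columns in the dataframe are left out of the result.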

If you have more columns in your dataframe, they will also be aggregated to lists. If this is the case and you only want properties_id to become a list, you just need to specify this column in .agg():

df.groupby('goods_id').agg({'properties_id': lambda col: col.tolist()}).reset_index()
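
To illustrate the difference between the two calls, here is a small sketch with one extra column. The price column and its values are made up for this example and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "goods_id": [3588, 3588, 3589],
    "properties_id": [1, 2, 1],
    "price": [10.0, 12.5, 7.0],  # hypothetical extra column
})

# Without a column spec, every non-grouping column becomes a list.
all_cols = df.groupby("goods_id").agg(lambda col: col.tolist()).reset_index()

# With a dict, only the named column is aggregated and kept.
only_props = (
    df.groupby("goods_id")
      .agg({"properties_id": lambda col: col.tolist()})
      .reset_index()
)
```

Here all_cols should keep a price column of lists, while only_props contains just goods_id and properties_id.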
asongtoruin
  • Thanks, all works great. But why does my method give unexpected results, aggregating column names instead of their values? I'll be able to accept the answer in a few minutes – Ivan Shelonik Jun 20 '18 at 11:27