
I'm sure this has been asked before, sorry if duplicate. Suppose I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)}, columns=['key', 'data'])

>>
    key data
0   A   0
1   B   1
2   C   2
3   A   3
4   B   4
5   C   5

I know that after grouping on 'key' we can do things like df.groupby('key').sum():

>> 
    data
key 
A   3
B   5
C   7

What is the easiest way to get all the 'split' data for each group in an array?

>> 
    data
key 
A   [0, 3]
B   [1, 4]
C   [2, 5]

I'm not necessarily grouping by just one key but by several other columns as well ('year' and 'month', for example), which is why I'd like to use the groupby function while preserving all the grouped values in an array.

ru111

1 Answer


You can use apply(list):

print(df.groupby('key').data.apply(list).reset_index())

  key    data
0   A  [0, 3]
1   B  [1, 4]
2   C  [2, 5]
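
Since the question mentions grouping by several columns, here is a minimal sketch of the same approach with a list of keys. The 'year' and 'month' columns and their values are assumptions made up for illustration; they are not in the original frame.

```python
import pandas as pd

# Hypothetical frame: the 'year' and 'month' columns are assumed for illustration.
df = pd.DataFrame({
    'key':   ['A', 'B', 'A', 'B', 'A', 'B'],
    'year':  [2019, 2019, 2019, 2019, 2020, 2020],
    'month': [1, 1, 1, 1, 2, 2],
    'data':  range(6),
})

# Pass a list of column names to group by several keys at once,
# then collect each group's 'data' values into a Python list.
grouped = (
    df.groupby(['key', 'year', 'month'])['data']
      .apply(list)
      .reset_index()
)
print(grouped)
```

Each distinct (key, year, month) combination becomes one row, with the matching 'data' values gathered into a list. Bracket selection (`['data']`) is used here instead of `.data` so the pattern also works for column names that are not valid Python attributes.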
anky
  • For arrays instead of lists you can do `df.groupby('key').data.apply(np.array)`, which was more convenient for my operations. – ru111 May 20 '19 at 13:45
  • What if one has multiple columns and wants to aggregate all the values from each of them into a list? – Moondra Jun 30 '20 at 22:24
  • @Moondra `df.groupby("Column Name").agg(list)` should help. Another way is a pivot table (not required, though): `df.pivot_table(index="Column Name", aggfunc=list)` – anky Jul 01 '20 at 03:45
  • This is what worked for me as I needed distinct list/array items: df.groupby('key').data.unique().reset_index() – ChrisDanger Aug 18 '21 at 18:37
  • Does this preserve the order of the items in the resulting list? – Maviles Feb 21 '22 at 22:27
  • Hey, I am getting the error `'DataFrameGroupBy' object has no attribute 'data'`. My line of code is `main_group = main.groupby(["new-date", 'seller_identifier', 'affiliate_name']).data.apply(np.array).reset_index()`. Any solutions to this? – Nischaya Sharma Aug 16 '22 at 05:48
  • @NischayaSharma remove the `.data` from your code and try again. – anky Aug 17 '22 at 04:24
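
A few of the points raised in the comments can be sketched together. This is only an illustration: the 'other' column and the reordered 'data' values are assumptions added to make the behaviour visible. As far as the pandas documentation states, groupby preserves the order of rows within each group, so the lists follow the original row order. Note also that `.data` only works because the example frame has a column literally named 'data'; for other column names, select the column with brackets.

```python
import pandas as pd

df = pd.DataFrame({
    'key':   ['A', 'B', 'C', 'A', 'B', 'C'],
    'data':  [5, 4, 3, 2, 1, 0],          # deliberately not sorted
    'other': list('uvwxyz'),              # hypothetical second column
})

# Aggregate every non-key column into a list per group.
all_cols = df.groupby('key').agg(list)

# Select one column by name with brackets; this works for any column name,
# unlike attribute access such as .data.
one_col = df.groupby('key')['data'].apply(list)

print(all_cols)
print(one_col)
```

Here group 'A' yields [5, 2] for 'data' and ['u', 'x'] for 'other', matching the order in which those rows appear in the frame.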