
I want to combine rows in a pandas DataFrame that share the same value in a particular column, with each of the other columns' values collected into a single list.

Here's a method I came up with. First, create the sample df:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Parrot',
                              'Parrot', 'Falcon'],
                   'Max Speed': ['380.', '370.', '24.', '26.']})
df
```

Output:

```
   Animal Max Speed
0  Falcon      380.
1  Parrot      370.
2  Parrot       24.
3  Falcon       26.
```

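Here is the method:

```python
# Collect each group's values as a tuple, then convert every cell to a list.
# (DataFrame.applymap was renamed to DataFrame.map in pandas 2.1.)
test = df.groupby(['Animal']).agg(lambda x: tuple(x)).applymap(list).reset_index()
test.head()
```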

Output:

```
   Animal    Max Speed
0  Falcon  [380., 26.]
1  Parrot  [370., 24.]
```

Is there a more computationally efficient method for getting the same output?
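Before comparing speed, it helps to confirm that a plain `agg(list)` gives the same frame as the tuple-then-list chain. A minimal sketch, reusing the sample data above; the helper `cells_to_list` is a hypothetical name introduced only to cope with `applymap` being renamed to `map` in pandas 2.1:

```python
import pandas as pd

# Same sample data as in the question (speeds stored as strings).
df = pd.DataFrame({'Animal': ['Falcon', 'Parrot', 'Parrot', 'Falcon'],
                   'Max Speed': ['380.', '370.', '24.', '26.']})


def cells_to_list(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply `list` to every cell of `frame`."""
    # DataFrame.applymap was renamed to DataFrame.map in pandas 2.1,
    # so use whichever one this pandas version provides.
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(list)
    return frame.applymap(list)


# The question's approach: a tuple per group, then each tuple turned into a list.
via_tuple = cells_to_list(df.groupby(['Animal']).agg(lambda x: tuple(x))).reset_index()

# Direct approach: let the aggregation build the lists itself.
direct = df.groupby(['Animal']).agg(list).reset_index()

print(via_tuple)
print(direct)
```

Skipping the intermediate tuples also removes the second elementwise pass over every cell.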

SantoshGupta7
  • Why do you create a tuple and then a list? Just call `list` in your groupby aggregation: `df.groupby('Animal', as_index=False)['Max Speed'].agg(list)`. – Umar.H Feb 17 '21 at 11:05
  • `df.groupby(['Animal']).agg(list)` seems to work. – SantoshGupta7 Feb 17 '21 at 11:08
  • Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – Umar.H Feb 17 '21 at 11:09
  • see answer by `cs95` – Umar.H Feb 17 '21 at 11:09
  • Is `apply(list)` more efficient than `agg(list)`? – SantoshGupta7 Feb 17 '21 at 11:14
  • In general, we should avoid `apply`, as it operates on a row-by-row level; even though it uses NumPy and Cython behind the scenes, it's still very slow compared to the built-in API methods. I assume `agg(list)` is more computationally efficient, as it operates on a whole set rather than row by row. – Umar.H Feb 17 '21 at 11:15
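Whether `apply(list)` or `agg(list)` is faster is easiest to settle by timing both on data shaped like your own. A minimal sketch, assuming a hypothetical synthetic frame; the names `big`, `with_agg` and `with_apply` and the row/group counts are illustrative choices, not from the thread:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: 100,000 rows spread over 1,000 integer group keys.
# The sizes are arbitrary; adjust them to resemble your real data.
rng = np.random.default_rng(0)
n_rows = 100_000
big = pd.DataFrame({
    'Animal': rng.integers(0, 1_000, size=n_rows),
    'Max Speed': rng.random(n_rows),
})


def with_agg(frame: pd.DataFrame) -> pd.Series:
    # One list per group, built through the aggregation path.
    return frame.groupby('Animal')['Max Speed'].agg(list)


def with_apply(frame: pd.DataFrame) -> pd.Series:
    # Same lists, built through the more general apply path.
    return frame.groupby('Animal')['Max Speed'].apply(list)


# Both spellings should yield identical lists for every group.
a, b = with_agg(big), with_apply(big)
assert a.index.tolist() == b.index.tolist()
assert a.tolist() == b.tolist()

for name, fn in [('agg(list)', with_agg), ('apply(list)', with_apply)]:
    # Best of 3 repeats, 5 calls each, reported as time per call.
    best = min(timeit.repeat(lambda: fn(big), number=5, repeat=3))
    print(f'{name}: {best / 5:.4f} s per call')
```

No timings are quoted here because they depend on the pandas version and the data. Note that `list` is not one of the cythonized groupby reductions such as `sum` or `mean`, so both spellings call it once per group in Python; any difference comes from the overhead around that call.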
