How to get maximum and minimum of a list in column?

Question

Given that, I have a dataframe as below:

import pandas as pd
import numpy as np

dict = {
        "A": [[1,2,3,4],[3],[2,8,4],[5,8]]
}

dt = pd.DataFrame(dict)

I wish to have the Maximum and minimum of each row in column B. My favorite output is:

              A    B
0  [1, 2, 3, 4]    [1,4]
1           [3]    [3,3] 
2     [2, 8, 4]    [2,8] 
3        [5, 8]    [5,8]

What I already tried is the below code which does not work:

dt["B"] =[np.min(dt.A), np.max(dt.A)]

i meant, something like a loop in this question: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas however, `apply` is a good solution. — Jeff, Jun 03 '20 at 10:25

Mayank Porwal · Accepted Answer · 2020-06-03T11:05:51.063

12

Like this:

In [1592]: dt['B'] = dt.A.apply(lambda x: [min(x), max(x)])     
In [1593]: dt                                   
Out[1593]: 
              A       B
0  [1, 2, 3, 4]  [1, 4]
1           [3]  [3, 3]
2     [2, 8, 4]  [2, 8]
3        [5, 8]  [5, 8]

As suggested by @Ch3steR, using map since it's faster:

dt['B'] = dt.A.map(lambda x: [min(x), max(x)])

edited Jun 03 '20 at 11:05

answered Jun 03 '20 at 10:18

Mayank Porwal

33,470
8
37
58

1

`dt.A.map(...)` is a little faster than `pd.Series.apply` as `pd.Series.map` is made for element-wise calculations. And `map` was around ~20-30µs faster not much though. – Ch3steR Jun 03 '20 at 10:50

jezrael · Answer 2 · 2020-06-03T11:59:29.437

10

You can create DataFrame, then minimal and maximal values by DataFrame.agg, convert to lists and assign back if requirement is no loops (Apply are loops under the hood):

df = pd.DataFrame(dt.A.tolist())
dt['B'] = df.agg(['min','max'], axis=1).astype(int).values.tolist()
print (dt)
              A       B
0  [1, 2, 3, 4]  [1, 4]
1           [3]  [3, 3]
2     [2, 8, 4]  [2, 8]
3        [5, 8]  [5, 8]

If no problem with loops another solution with list comprehension, it should be faster like apply, depends of real data:

dt['B'] =  [[min(x), max(x)] for x in dt.A]

edited Jun 03 '20 at 11:59

answered Jun 03 '20 at 10:22

jezrael

822,522
95
1,334
1,252

2

Out of all the solutions `[[min(x), max(x)] for x in dt.A]` was the fastest. +1 – Ch3steR Jun 03 '20 at 10:47
@Ch3steR was it faster than the first solution of jez? I wouldnt expect it. – Erfan Jun 03 '20 at 11:03
1

@Erfan Yes, I `timeit` all the solutions. And *list comp* was the fastest by 10x than the second fastest solution. May be my analysis may not be fair cause I didnot test it with large lists and large DataFrame. – Ch3steR Jun 03 '20 at 11:08
1

Ah oke, I think comparison should be done on reasonably large data to see the efficiency of the methods. – Erfan Jun 03 '20 at 11:31
@Erfan With large data I would use some custom function which would calculate `min` and `max` in a single pass may be using `cython`. Here we are iterating twice over the same iterable when it can be done in one pass and if lists are small it wouldn't matter anyway. Testing with large data is better I agree. – Ch3steR Jun 03 '20 at 11:57
In the first method its not. Its all vectorized methods. Thats why I dont expect it to be slower than the loop. Actually im pretty sure it would be quite faster. – Erfan Jun 03 '20 at 11:58
@Erfan - I modify answer, because I think it depends of data. – jezrael Jun 03 '20 at 12:00

score 6 · Answer 3 · answered Jun 03 '20 at 10:31

Just an alternative with explode:

dt['B'] = (dt['A'].explode().astype(int).groupby(level=0).agg(['min','max'])
           .to_numpy().tolist())
print(dt)

              A       B
0  [1, 2, 3, 4]  [1, 4]
1           [3]  [3, 3]
2     [2, 8, 4]  [2, 8]
3        [5, 8]  [5, 8]

wwnde · Answer 4 · 2020-06-03T11:15:09.087

4

Use list comprehension on sorted values in dt.A

 dt['B']= [[row[0], row[-1]] for row in dt.A.map(lambda x: sorted(x))]

edited Jun 03 '20 at 11:15

answered Jun 03 '20 at 10:31

wwnde

26,119
6
18
32

How to get maximum and minimum of a list in column?

4 Answers4