10

Given that, I have a dataframe as below:

import pandas as pd
import numpy as np

dict = {
        "A": [[1,2,3,4],[3],[2,8,4],[5,8]]
}

dt = pd.DataFrame(dict)

I wish to have the Maximum and minimum of each row in column B. My favorite output is:

              A    B
0  [1, 2, 3, 4]    [1,4]
1           [3]    [3,3] 
2     [2, 8, 4]    [2,8] 
3        [5, 8]    [5,8]

What I already tried is the below code which does not work:

dt["B"] =[np.min(dt.A), np.max(dt.A)]
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
Jeff
  • 7,767
  • 28
  • 85
  • 138
  • 1
    i meant, something like a loop in this question: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas however, `apply` is a good solution. – Jeff Jun 03 '20 at 10:25

4 Answers4

12

Like this:

In [1592]: dt['B'] = dt.A.apply(lambda x: [min(x), max(x)])     
In [1593]: dt                                   
Out[1593]: 
              A       B
0  [1, 2, 3, 4]  [1, 4]
1           [3]  [3, 3]
2     [2, 8, 4]  [2, 8]
3        [5, 8]  [5, 8]

As suggested by @Ch3steR, using map since it's faster:

dt['B'] = dt.A.map(lambda x: [min(x), max(x)]) 
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
  • 1
    `dt.A.map(...)` is a little faster than `pd.Series.apply` as `pd.Series.map` is made for element-wise calculations. And `map` was around ~20-30µs faster not much though. – Ch3steR Jun 03 '20 at 10:50
10

You can create DataFrame, then minimal and maximal values by DataFrame.agg, convert to lists and assign back if requirement is no loops (Apply are loops under the hood):

df = pd.DataFrame(dt.A.tolist())
dt['B'] = df.agg(['min','max'], axis=1).astype(int).values.tolist()
print (dt)
              A       B
0  [1, 2, 3, 4]  [1, 4]
1           [3]  [3, 3]
2     [2, 8, 4]  [2, 8]
3        [5, 8]  [5, 8]

If no problem with loops another solution with list comprehension, it should be faster like apply, depends of real data:

dt['B'] =  [[min(x), max(x)] for x in dt.A]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    Out of all the solutions `[[min(x), max(x)] for x in dt.A]` was the fastest. +1 – Ch3steR Jun 03 '20 at 10:47
  • @Ch3steR was it faster than the first solution of jez? I wouldnt expect it. – Erfan Jun 03 '20 at 11:03
  • 1
    @Erfan Yes, I `timeit` all the solutions. And *list comp* was the fastest by 10x than the second fastest solution. May be my analysis may not be fair cause I didnot test it with large lists and large DataFrame. – Ch3steR Jun 03 '20 at 11:08
  • 1
    Ah oke, I think comparison should be done on reasonably large data to see the efficiency of the methods. – Erfan Jun 03 '20 at 11:31
  • @Erfan With large data I would use some custom function which would calculate `min` and `max` in a single pass may be using `cython`. Here we are iterating twice over the same iterable when it can be done in one pass and if lists are small it wouldn't matter anyway. Testing with large data is better I agree. – Ch3steR Jun 03 '20 at 11:57
  • In the first method its not. Its all vectorized methods. Thats why I dont expect it to be slower than the loop. Actually im pretty sure it would be quite faster. – Erfan Jun 03 '20 at 11:58
  • @Erfan - I modify answer, because I think it depends of data. – jezrael Jun 03 '20 at 12:00
6

Just an alternative with explode:

dt['B'] = (dt['A'].explode().astype(int).groupby(level=0).agg(['min','max'])
           .to_numpy().tolist())
print(dt)

              A       B
0  [1, 2, 3, 4]  [1, 4]
1           [3]  [3, 3]
2     [2, 8, 4]  [2, 8]
3        [5, 8]  [5, 8]
anky
  • 74,114
  • 11
  • 41
  • 70
4

Use list comprehension on sorted values in dt.A

 dt['B']= [[row[0], row[-1]] for row in dt.A.map(lambda x: sorted(x))]

enter image description here

wwnde
  • 26,119
  • 6
  • 18
  • 32