
I'm trying to get the maximum value of a Series object as well as its corresponding index.

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

s.max() returns the maximum value, whereas s.idxmax() returns the index of the maximum value. Is there a method that returns both the value and its corresponding index?

Thank you.

Simon

2 Answers


What about a custom function? Something like:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

def Max_Argmax(series):  # takes your Series as input
    values = series.values  # store the numeric values
    indexes = series.index  # store the index labels
    Argmax = np.argmax(values)  # position of the (first) maximum
    return values[Argmax], indexes[Argmax]  # return the max and its index label

max_val, max_idx = Max_Argmax(s)  # avoid shadowing the built-in max()

Running it on my PC, I get:

>>> s
a   -1.854440
b    0.302282
c   -0.630175
d   -1.012799
e    0.239437
dtype: float64

>>> max_val
0.3022819091746019

>>> max_idx
'b'

Hope it helps!
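A possible variant, not part of the original answer: np.argmax treats NaN as the maximum, so on a Series containing NaN the function above returns the NaN's position, whereas s.idxmax() skips NaN by default. A minimal sketch using np.nanargmax instead, assuming a numeric Series; the name max_argmax_skipna is hypothetical.

```python
import numpy as np
import pandas as pd


def max_argmax_skipna(series):
    """Hypothetical helper: (max value, index label), ignoring NaN like Series.idxmax()."""
    values = series.to_numpy(dtype=float)  # assumes a numeric Series
    # np.nanargmax skips NaN; it raises ValueError if every value is NaN
    pos = np.nanargmax(values)
    return float(values[pos]), series.index[pos]


s = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
print(np.argmax(s.to_numpy()))  # 1, the position of the NaN
print(max_argmax_skipna(s))     # (3.0, 'c')
print(s.idxmax())               # c
```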

Tommaso Di Noto
  • Is there a benefit in dropping down to NumPy? And, if the purpose is performance, why are we (effectively) calculating the maximum twice (a 2-pass solution) via `max` + `argmax`? – jpp Oct 30 '18 at 10:06
  • Hi @jpp! Thanks for the comment. What about now? I removed the np.max line. – Tommaso Di Noto Oct 30 '18 at 12:20
  • Sure, that's better. But to see whether there's any significant advantage over `idx = s.idxmax(); val = s[idx]` [as suggested by @JonClements], I'd suggest timing it, e.g. with `%timeit`, over a large dataframe; otherwise this seems a little verbose. – jpp Oct 30 '18 at 12:24
  • I've added some timings in my answer below; this way is definitely faster. – Alex Oct 30 '18 at 14:02
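The two-step alternative mentioned in the comments can be sketched as follows. s.loc[idx] is the explicitly label-based form of s[idx]; the sketch assumes unique index labels, because with duplicate labels the lookup would return a Series rather than a scalar.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

# Step 1: label of the (first) maximum.
idx = s.idxmax()
# Step 2: look the value up by that label (assumes unique labels).
val = s.loc[idx]
print(idx, val)
```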

As Jon Clements mentioned:

In [3]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
In [4]: x, y = s.agg(['max', 'idxmax'])
In [5]: x
Out[5]: 1.6339096862287581
In [6]: y
Out[6]: 'b'
In [7]: s
Out[7]: a    1.245039
        b    1.633910
        c    0.619384
        d    0.369604
        e    1.009942
        dtype: float64
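For reference, s.agg with a list of function names returns a Series indexed by those names, so the two results can also be read by label instead of relying on unpacking order. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

# The result is a Series whose index is ['max', 'idxmax'].
res = s.agg(['max', 'idxmax'])
print(res)
max_val, max_label = res['max'], res['idxmax']
```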

Wrapped in a function that returns a tuple:

def max_and_index(series):
    """Return a tuple of (max, idxmax) from a pandas.Series"""
    x, y = series.agg(['max', 'idxmax'])
    return x, y

t = max_and_index(s)
print(t)
# (1.6339096862287581, 'b')
print(type(t))
# <class 'tuple'>

An even shorter version:

def max_and_idxmax(series):
    """Return a tuple of (max, idxmax) from a pandas.Series"""
    return series.max(), series.idxmax()
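When the maximum occurs more than once, both series.idxmax() and np.argmax report the first occurrence, so the pandas and NumPy versions agree on ties. A small check with made-up data:

```python
import numpy as np
import pandas as pd

# Two equal maxima at labels 'b' and 'c'.
s = pd.Series([3, 5, 5], index=['a', 'b', 'c'])

print(s.idxmax())               # b
print(np.argmax(s.to_numpy()))  # 1, the position of 'b'
```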

If you need speed, use the NumPy method above. Timings for all four versions:

import pandas as pd
import numpy as np


def max_and_index(series):
    x, y = series.agg(['max', 'idxmax'])
    return x, y

def max_and_idxmax(series):
    return series.max(), series.idxmax()

def np_max_and_argmax(series):
    # note: returns the integer position of the maximum, not its index label
    return np.max(series.values), np.argmax(series.values)

def Max_Argmax(series):
    v = series.values
    i = series.index
    arg = np.argmax(v)
    return v[arg], i[arg]


a = []
for i in range(2, 9):
    a.append(pd.Series(np.random.randint(0, 100, size=10**i)))
    print('{}\t{:>11,}'.format(i-2, 10**i))

# 0            100
# 1          1,000
# 2         10,000
# 3        100,000
# 4      1,000,000
# 5     10,000,000
# 6    100,000,000

idx = 5  # run in its own cell: %%timeit must be the first line of a cell
%%timeit -n 2 -r 10
max_and_index(a[idx])
# 144 ms ± 5.45 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)

%%timeit -n 2 -r 10
max_and_idxmax(a[idx])
# 143 ms ± 5.14 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)

%%timeit -n 2 -r 10
Max_Argmax(a[idx])
# 9.89 ms ± 1.13 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)

%%timeit -n 2 -r 10
np_max_and_argmax(a[idx])
# 24.5 ms ± 1.74 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)
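%%timeit is an IPython/Jupyter cell magic; in a plain Python script the standard-library timeit module can be used instead. A minimal sketch re-using two of the functions above (Max_Argmax slightly adapted to call to_numpy()); the Series size and repeat counts are arbitrary assumptions, and absolute timings will differ by machine.

```python
import timeit

import numpy as np
import pandas as pd


def max_and_idxmax(series):
    # two passes over the data: one for max, one for idxmax
    return series.max(), series.idxmax()


def Max_Argmax(series):
    # one pass via np.argmax, then positional lookups
    values = series.to_numpy()
    pos = np.argmax(values)
    return values[pos], series.index[pos]


# 10**6 elements keeps the run short; the size is an arbitrary choice
s = pd.Series(np.random.randint(0, 100, size=10**6))

for func in (max_and_idxmax, Max_Argmax):
    # best of 5 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(lambda: func(s), number=10, repeat=5)) / 10
    print(f'{func.__name__}: {best * 1e3:.2f} ms per call')
```

With the default RangeIndex used here, the label returned by idxmax and the position returned by np.argmax coincide, so both functions return the same pair.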
Alex