It's possible, but this isn't the sort of thing numpy is good at. One possible solution is to pad the array with nan and use np.nanmin/np.nanmax, like so:
import numpy as np

def pad_array(arr):
    # Pad every row with nan so all rows match the length of the longest one.
    M = max(len(a) for a in arr)
    return np.array([a + [np.nan] * (M - len(a)) for a in arr])
data = [
[-20],
[-23],
[-41],
[1, 2, 3],
[2, 3],
[5, 6, 7, 8, 9],
]
arr = pad_array(data)
# array([[-20., nan, nan, nan, nan],
# [-23., nan, nan, nan, nan],
# [-41., nan, nan, nan, nan],
# [ 1., 2., 3., nan, nan],
# [ 2., 3., nan, nan, nan],
# [ 5., 6., 7., 8., 9.]])
np.nanmin(arr, axis=1) #array([-20., -23., -41., 1., 2., 5.])
np.nanmax(arr, axis=1) #array([-20., -23., -41., 3., 3., 9.])
This isn't faster than a regular list comprehension, though. np.min and np.max do run on the ragged data, but numpy has no support for ragged arrays, so np.array(data) produces a one-dimensional array of objects, and np.min just returns the smallest of those objects (the same thing Python's builtin min function would give you). The same goes for np.max.
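You can see this for yourself. Here is a small sketch of what that object array looks like (note that recent numpy versions refuse to build a ragged array implicitly, so dtype=object has to be passed explicitly here):

ragged = np.array(data, dtype=object)
ragged.shape   # (6,) -- one dimension, each element is a whole list
np.min(ragged) # [-41] -- the "smallest" list, compared the way Python compares lists
min(data)      # [-41] -- the builtin gives the same answer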
Here are the timings comparing the padded-array approach with a plain list comprehension:
%%timeit
arr = np.array(pad_array(data))
np.nanmin(arr, axis=1)
10000 loops, best of 3: 27 µs per loop
%timeit [min(row) for row in data]
1000000 loops, best of 3: 1.26 µs per loop
This is a bit contrived, because pad_array itself uses a list comprehension and a generator expression, so it stands to reason that a single list comprehension will be faster. But even if you only needed to create the padded array once and could ignore that cost, the plain list comprehension would still win:
%timeit np.nanmin(arr, axis=1)
100000 loops, best of 3: 13.3 µs per loop
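If you want to reproduce these numbers outside IPython, something along these lines should work (a sketch using the standard timeit module, assuming data, pad_array and arr are already defined as above; the absolute numbers will depend on your machine):

import timeit

# numpy reduction on the already-padded array vs. the plain list comprehension
print(timeit.timeit("np.nanmin(arr, axis=1)", globals=globals(), number=10000))
print(timeit.timeit("[min(row) for row in data]", globals=globals(), number=10000))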
EDIT:
You could use np.vectorize to make vectorized versions of Python's builtin max and min functions:
vmax = np.vectorize(max)
vmax(data) #array([-20, -23, -41, 3, 3, 9])
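The same trick works for min (a small sketch; like vmax above, it relies on numpy turning data into an object array, which newer versions only do if you ask for dtype=object explicitly):

vmin = np.vectorize(min)
vmin(data) #array([-20, -23, -41, 1, 2, 5])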
It's still not faster than a list comprehension ...
%timeit vmax(data)
10000 loops, best of 3: 25.6 µs per loop
EDIT 2
For the sake of completeness/correctness, it is worth pointing out that the numpy solution will scale better than the pure Python list comprehension. If we had 6 million rows instead of 6 and needed to perform several such operations, numpy would come out ahead. For example, if we have
data = [
[-20],
[-23],
[-41],
[1, 2, 3],
[2, 3],
[5, 6, 7, 8, 9],
] * 1000000
arr = pad_array(data)  # this takes ~6 seconds
The timings are much more in favor of numpy:
%timeit [min(row) for row in data]
1 loops, best of 3: 1.05 s per loop
%timeit np.nanmin(arr, axis=1)
10 loops, best of 3: 111 ms per loop
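Where the padding cost really pays off is when you need several per-row reductions on the same data; once arr exists, each one is a single vectorized call. A short sketch, reusing the arr built above:

row_min = np.nanmin(arr, axis=1)
row_max = np.nanmax(arr, axis=1)
row_mean = np.nanmean(arr, axis=1)  # per-row mean, ignoring the nan padding
row_span = row_max - row_min        # per-row range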