
I have a NumPy record array of floats:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)], 
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

How can I determine min/max from this record array? My usual attempt of ar.min() fails with:

TypeError: cannot perform reduce with flexible type

I'm not sure how to flatten the values out into a simpler NumPy array.

Mike T

4 Answers


The easiest and most efficient way is probably to view your array as a simple 2D array of floats:

ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

which is a 2D array view on the structured array:

print(ar_view.min(axis=0))  # Or whatever…

This method is fast, as no new array is created (changes to ar_view result in changes to ar). It is restricted to cases like yours, though, where all record fields have the same type (float32, here).

One advantage is that this method keeps the 2D structure of the original array intact: you can find the minimum in each "column" (axis=0), for instance.
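Putting the pieces together on the question's array, a runnable sketch (ar_view shares memory with ar, so no data is copied):

```python
import numpy as np

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# reinterpret the structured array as a plain (3, 3) float32 array (no copy)
ar_view = ar.view((ar.dtype[0], len(ar.dtype.names)))

print(ar_view.min())        # global minimum: 237.0
print(ar_view.min(axis=0))  # per-field minima, one per "column"
```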

Eric O. Lebigot
  • I get an Error with `float`: "ValueError: new type not compatible with array." However, if I use a NumPy float data type like `ar.dtype[0]` (or `dtype('float32')`), success! – Mike T Jul 05 '12 at 23:55
  • 1
    `ar.view((ar.dtype[0], len(ar.dtype)))` – Mike T Jul 06 '12 at 00:00
  • 1
    I guess now we would use [structured_to_unstructured](https://numpy.org/doc/stable/user/basics.rec.html#numpy.lib.recfunctions.structured_to_unstructured)? – djvg Nov 25 '21 at 19:44
  • This is an interesting comment. Note that `structured_to_unstructured` creates a _new_ array and is therefore not fully equivalent to this answer (and is slower). – Eric O. Lebigot Nov 26 '21 at 20:08
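For reference, a sketch of the structured_to_unstructured route mentioned in the comments (with the caveat above that it is not fully equivalent to the view):

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])

# convert the structured array to a plain (3, 3) float32 array
arr2d = structured_to_unstructured(ar)
print(arr2d.min(axis=0))
```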

You can do

# construct flattened ndarray
arnew = np.hstack([ar[r] for r in ar.dtype.names])

to flatten the recarray, then you can perform your normal ndarray operations, like

armin, armax = np.min(arnew), np.max(arnew)
print(armin, armax)

The result is

237.0 238.05

Basically, ar.dtype.names gives you the tuple of field names; you then retrieve the fields one by one and stack them into arnew.
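As a sketch of where hstack() earns its keep: with mixed numeric field types (a hypothetical example, not the question's array), the single-dtype view() trick fails, but hstack() upcasts and still works:

```python
import numpy as np

# hypothetical structured array with mixed numeric field types
mixed = np.array([(1, 2.5), (3, 0.5)], dtype=[('A', 'i4'), ('B', 'f8')])

# a 2D view would fail here; hstack upcasts everything to float64
flat = np.hstack([mixed[r] for r in mixed.dtype.names])
print(flat.min(), flat.max())  # 0.5 3.0
```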

nye17
  • `np.hstack()` is useful if the different fields of the structured array do not have the same type, which is not the case here. For this question, the `view()` approach (see my answer) is way faster, and also has the advantage of keeping the 2D structure of the original array intact. – Eric O. Lebigot Jul 04 '12 at 06:03
  • 1
    @EOL yep, I thought the op wanted a flattened ndarray so I suggested him use `hstack()`, but otherwise if the dtypes are uniform and only min/max are needed, sure, `view` is a lot lot better. – nye17 Jul 04 '12 at 14:49

This may help someone else down the line; here is another way to do it that may be more sensible:

import numpy as np
ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])
arView = ar.view(np.recarray)
arView.A.min()

which allowed me to just pick and choose. A problem on my end was that the dtypes of my fields were not all the same (a rather complicated struct, by and large).
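In the heterogeneous case this answer mentions, a small sketch (on a hypothetical mixed-dtype array) is to reduce each field separately, looping over the field names:

```python
import numpy as np

# hypothetical array with non-uniform field dtypes, where a 2D view won't work
ar = np.array([(1, 2.5), (4, 0.5)], dtype=[('A', 'i4'), ('B', 'f8')])
rec = ar.view(np.recarray)

# per-field reductions; rec.A is equivalent to ar['A']
mins = {name: rec[name].min() for name in rec.dtype.names}
print(mins)
```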

kratsg

A modern approach could leverage pandas to read and process the record array, then convert back to NumPy:

import pandas as pd

# read record array as a data frame, process data
df = pd.DataFrame(ar)
df_min = df.min(axis=0)

# convert to a uniform array
df_min.to_numpy()
# array([238.02, 238.  , 237.  ], dtype=float32)

# convert to a record array
df_min.to_frame().T.to_records(index=False)
# rec.array([(238.02, 238., 237.)],
#           dtype=[('A', '<f4'), ('B', '<f4'), ('C', '<f4')])
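If a single global extremum is wanted rather than per-column values, the same DataFrame can be reduced over its flat values (a small follow-on sketch):

```python
import numpy as np
import pandas as pd

ar = np.array([(238.03, 238.0, 237.0),
               (238.02, 238.0, 237.01),
               (238.05, 238.01, 237.0)],
              dtype=[('A', 'f'), ('B', 'f'), ('C', 'f')])
df = pd.DataFrame(ar)

# reduce over all columns at once via the underlying 2D array
print(df.to_numpy().min(), df.to_numpy().max())
```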
Mike T