57

The answer will be very obvious I think, but I don't see it at the moment.

How can I convert a record array back to a regular ndarray?

Suppose I have the following simple structured array:

x = np.array([(1.0, 4.0,), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])

then I want to convert it to:

array([[ 1.,  4.],
       [ 2., -1.]])

I tried asarray and astype, but that didn't work.

UPDATE (solved: float32 (f4) instead of float64 (f8))

OK, I tried the solution of Robert (x.view(np.float64).reshape(x.shape + (-1,)) ), and with a simple array it works perfectly. But with the array I wanted to convert it gives a strange outcome:

data = np.array([ (0.014793682843446732, 0.006681123282760382, 0.0, 0.0, 0.0, 0.0008984912419691682, 0.0, 0.013475529849529266, 0.0, 0.0),
       (0.014793682843446732, 0.006681123282760382, 0.0, 0.0, 0.0, 0.0008984912419691682, 0.0, 0.013475529849529266, 0.0, 0.0),
       (0.014776384457945824, 0.006656022742390633, 0.0, 0.0, 0.0, 0.0008901208057068288, 0.0, 0.013350814580917358, 0.0, 0.0),
       (0.011928378604352474, 0.002819152781739831, 0.0, 0.0, 0.0, 0.0012627150863409042, 0.0, 0.018906937912106514, 0.0, 0.0),
       (0.011928378604352474, 0.002819152781739831, 0.0, 0.0, 0.0, 0.001259754877537489, 0.0, 0.01886274479329586, 0.0, 0.0),
       (0.011969991959631443, 0.0028706740122288465, 0.0, 0.0, 0.0, 0.0007433745195157826, 0.0, 0.011164642870426178, 0.0, 0.0)], 
      dtype=[('a_soil', '<f4'), ('b_soil', '<f4'), ('Ea_V', '<f4'), ('Kcc', '<f4'), ('Koc', '<f4'), ('Lmax', '<f4'), ('malfarquhar', '<f4'), ('MRN', '<f4'), ('TCc', '<f4'), ('Vcmax_3', '<f4')])

and then:

data_array = data.view(np.float).reshape(data.shape + (-1,))

gives:

In [8]: data_array
Out[8]: 
array([[  2.28080997e-20,   0.00000000e+00,   2.78023241e-27,
          6.24133580e-18,   0.00000000e+00],
       [  2.28080997e-20,   0.00000000e+00,   2.78023241e-27,
          6.24133580e-18,   0.00000000e+00],
       [  2.21114197e-20,   0.00000000e+00,   2.55866881e-27,
          5.79825816e-18,   0.00000000e+00],
       [  2.04776835e-23,   0.00000000e+00,   3.47457730e-26,
          9.32782857e-17,   0.00000000e+00],
       [  2.04776835e-23,   0.00000000e+00,   3.41189244e-26,
          9.20222417e-17,   0.00000000e+00],
       [  2.32706550e-23,   0.00000000e+00,   4.76375305e-28,
          1.24257748e-18,   0.00000000e+00]])

which is an array with other numbers and another shape. What did I do wrong?

Jason Aller
joris
  • np.asanyarray(x) will maintain the complex dtype for each column else np.array(x.tolist()) – diliop May 10 '11 at 23:05
  • You need to replace `np.float` with `data.dtype[0]`. Please, update your question posting the solution at the end, so it is more clear for the reader. – Atcold Apr 01 '16 at 19:38

5 Answers

48

The simplest method is probably

x.view((float, len(x.dtype.names)))

(float must generally be replaced by the type of the elements in x: x.dtype[0]). This assumes that all the elements have the same type.

This method gives you the regular numpy.ndarray version in a single step (as opposed to the two steps required by the view(…).reshape(…) method).
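As a minimal sketch of this one-step view, here it is applied to the question's example array, with the element type taken from the array itself instead of hard-coding float. The name `regular` is only illustrative, and the sketch assumes all fields share one dtype with no padding between them:

```python
import numpy as np

x = np.array([(1.0, 4.0), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])

# Reinterpret each record as a sub-array of len(names) elements of the field type.
# Assumption: every field has the same dtype and the record has no padding.
regular = x.view((x.dtype[0], len(x.dtype.names)))

print(regular.shape)
print(regular)
```

Because this is a view, `regular` shares memory with `x`; call `.copy()` on it if an independent array is needed.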

Eric O. Lebigot
  • I would add an improvement: `x.view((x.dtype[0], len(x.dtype.names)))`. So, one can even define a function that does this, since everything is parametrised. – Atcold Apr 01 '16 at 19:32
  • You mentioned that the *Cookbook* has a few efficient methods. Would you mind pointing out **where**. I am not sure how to find them. – Atcold Apr 01 '16 at 19:34
  • The link that was in the answer became invalid. I cannot find the page it was originally referring to. – Eric O. Lebigot Apr 02 '16 at 13:37
  • Oh, OK then. I would still recommend replacing `float` with `x.dtype[0]`. – Atcold Apr 02 '16 at 19:07
  • I've been using this method for a while now; unfortunately now I'm getting "FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array. This code may break in numpy 1.13 because this will return a view instead of a copy -- see release notes for details." Any thoughts how to deal with that? – bbengfort May 20 '17 at 17:06
  • I don't foresee any real problem. You just need to be aware of the change (which is actually [slated for NumPy 1.14](https://github.com/numpy/numpy/blob/master/doc/release/1.13.0-notes.rst#future-changes)) so as to make sure that your code does not break. If the warning bothers you, you can [silence it](https://github.com/numpy/numpy/blob/master/doc/release/1.13.0-notes.rst#future-changes). – Eric O. Lebigot May 22 '17 at 13:18
  • This does not work for me in Numpy 1.14. Assume I have the following structured array: `arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])`. Then trying to convert the 4-tuple inside into a regular array via `out = arr[0].view((np.float32, len(arr.dtype.names)))` results in `ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged`. – Alex Apr 25 '18 at 17:18
  • Have you tried what this response suggests, namely `arr.view((np.float32, len(arr.dtype.names)))`? (If you want the first line of 4 floats, you can _append_ `[0]`.) This works with Numpy 1.13.3, on my machine. – Eric O. Lebigot Apr 26 '18 at 13:50
  • Does not work for mixed datatypes ... – Chris Dec 01 '21 at 08:19
  • Indeed, as indicated in the answer. – Eric O. Lebigot Dec 02 '21 at 15:22
34
[~]
|5> x = np.array([(1.0, 4.0,), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])

[~]
|6> x.view(np.float64).reshape(x.shape + (-1,))
array([[ 1.,  4.],
       [ 2., -1.]])
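The dtype passed to `view` has to match the dtype of the fields, which is what went wrong in the question's float32 case. The following sketch (hypothetical field names and values, not the question's data) contrasts a matching and a mismatching view:

```python
import numpy as np

# Four float32 fields per record, i.e. 16 bytes per record.
data = np.array([(0.5, 1.5, 2.5, 3.5), (4.5, 5.5, 6.5, 7.5)],
                dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])

# Matching dtype: each 4-byte field becomes one element.
matched = data.view(data.dtype[0]).reshape(data.shape + (-1,))

# Mismatching dtype: every pair of float32 values is reinterpreted as one
# float64, so the column count halves and the numbers are meaningless.
mismatched = data.view(np.float64).reshape(data.shape + (-1,))

print(matched.shape, mismatched.shape)
print(matched)
```

Both results are views on the original memory, so no data is copied in either case.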
Robert Kern
  • Thanks! I suppose this doesn't make a copy of the array? – joris May 11 '11 at 06:57
  • @joris: Your array contains single-precision (32 bit) floating point numbers. To reinterpret the same memory as an unstructured array, use `.view(np.float32)` in the above code. – Sven Marnach May 11 '11 at 10:21
  • @joris, correct, it does not make a copy. It is just a view on top of the memory in the original array. – Robert Kern May 11 '11 at 15:15
  • There is no need for a tuple construction: `reshape(x.shape + (-1,))` can be simplified as `reshape(x.shape, -1)`. I updated the answer. – Eric O. Lebigot Apr 16 '12 at 09:08
  • @EOL I consider that an undocumented misfeature that makes the code less consistent and harder to understand. Please consider reverting. – Robert Kern Apr 17 '12 at 17:29
  • @RobertKern: It is indeed not documented in the `reshape()` documentation string. However, it is used in many places in the official documentation, so I take it it is quite official (one can for instance find many instances of `reshape(i, j, k)` at http://scipy.org/Numpy_Example_List). I asked the NumPy community to clarify this (http://projects.scipy.org/numpy/ticket/2110). – Eric O. Lebigot Apr 18 '12 at 04:49
  • @EOL I still consider `reshape(i,j,k)` to be an inconsistent misfeature to avoid, but it's been around long enough that I don't militate for its removal. I've never seen `reshape(some_tuple, j)` before in code or in documentation. In any case, I do not wish to appear as if I am recommending it. – Robert Kern Apr 18 '12 at 09:33
  • @EOL Actually `reshape(some_tuple, -1)` simply doesn't work. I don't know what version of numpy you are using in which it does work. I am reverting. – Robert Kern Apr 18 '12 at 09:36
  • @RobertKern: Right, the "(tuple, int)" form does not work. In fact, I meant what I wrote in the comment: "reshape(int, int,…)", which would be here `x.view(np.float64).reshape(len(x), -1)`. I think that this is more legible than the tuple concatenation. However, I will leave your original post as is, because of the absence of a clear-cut recommendation on the use of `reshape()` with integer arguments. – Eric O. Lebigot Apr 18 '12 at 10:51
18

In conjunction with changes to how it handles multi-field indexing, numpy has provided two new functions that can help in converting to and from structured arrays:

In numpy.lib.recfunctions, these are structured_to_unstructured and unstructured_to_structured. repack_fields is another new function.

From the 1.16 release notes:

multi-field views return a view instead of a copy

Indexing a structured array with multiple fields, e.g., arr[['f1', 'f3']], returns a view into the original array instead of a copy. The returned view will often have extra padding bytes corresponding to intervening fields in the original array, unlike before, which will affect code such as arr[['f1', 'f3']].view('float64'). This change has been planned since numpy 1.7. Operations hitting this path have emitted FutureWarnings since then. Additional FutureWarnings about this change were added in 1.12.

To help users update their code to account for these changes, a number of functions have been added to the numpy.lib.recfunctions module which safely allow such operations. For instance, the code above can be replaced with structured_to_unstructured(arr[['f1', 'f3']], dtype='float64'). See the “accessing multiple fields” section of the user guide.
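A short sketch of `structured_to_unstructured`, both on a multi-field selection and on a whole array. The array `arr` and its values are made up for illustration; the field names follow the release-note example:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical array with an integer field sitting between two float fields.
arr = np.array([(1.0, 2, 3.0), (4.0, 5, 6.0)],
               dtype=[('f1', '<f8'), ('f2', '<i4'), ('f3', '<f8')])

# Subset of fields: safe even though arr[['f1', 'f3']] is a padded view.
subset = structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')

# All fields: mixed int/float fields are cast to a common dtype.
everything = structured_to_unstructured(arr)

print(subset)
print(everything.dtype, everything.shape)
```

This requires NumPy 1.16 or later, where these functions were introduced.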

Community
hpaulj
  • Here's a link to the [accessing multiple fields](https://docs.scipy.org/doc/numpy/user/basics.rec.html?highlight=accessing%20multiple%20fields#accessing-multiple-fields) section of the user guide. – djvg Sep 12 '19 at 10:46
  • This is a better answer than the selected one. Using `.view(float)` does not allow extracting and merging a *subset of fields*, whereas `structured_to_unstructured` does – jeromerg Dec 17 '19 at 16:25
  • Best answer! It works fine `from numpy.lib.recfunctions import structured_to_unstructured` `structured_to_unstructured(X)` – kabhel Aug 17 '20 at 10:15
  • Great answer. This feels like the most "official" way of doing this. Great for reading point cloud data from a PLY file using the `plyfile` library, for example. – Ray Jun 28 '23 at 14:53
15
np.array(x.tolist())
array([[ 1.,  4.],
      [ 2., -1.]])

but maybe there is a better method...
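For illustration, here is a sketch of the same approach on an array with mixed field types, the case where a plain `view` is not applicable. The field names are hypothetical:

```python
import numpy as np

# One integer field and one float field.
x = np.array([(1, 2.5), (3, 4.5)], dtype=[('count', '<i4'), ('value', '<f8')])

# tolist() yields a list of Python tuples; np.array then picks a common dtype.
out = np.array(x.tolist())

print(out.dtype)
print(out)
```

The round trip through Python objects makes a copy and is slow for large arrays, but it does not depend on the memory layout of the fields.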

Andrea Zonca
  • This is slow, as you first convert an efficiently packed NumPy array to a regular Python list. The official method is much faster (see my answer). – Eric O. Lebigot Apr 16 '12 at 09:01
  • This is the easiest to remember... It's surprising there is no `x.toArray()` method... – Atcold Apr 01 '16 at 19:13
  • Don't make python lists when you don't have to, they're much more expensive. – RBF06 May 25 '18 at 12:58
  • This is indeed slow, but the only answer that works reliably. The other ones do not work for me (numpy 1.14.x). – Jan Christoph Terasa Oct 29 '18 at 09:14
  • Efficiency/speed is not always that important. IMHO this solution is much more readable than the efficient-yet-obscure `view` approach. – djvg Nov 21 '18 at 13:11
  • This method works when the structured array has multiple data types, the other methods fail under this situation. Very useful for unit tests where speed isn't important, just getting a comparison to work is. – David Parks Aug 12 '19 at 04:58
0

A very simple solution using the function rec2array of root_numpy:

np_array = rec2array(x)

root_numpy is actually deprecated, but the rec2array code is still useful on its own:

import numpy as np


def rec2array(rec, fields=None):

    simplify = False

    if fields is None:
        fields = rec.dtype.names
    # The original snippet checked string_types, which is not defined here;
    # str is the Python 3 equivalent
    elif isinstance(fields, str):
        fields = [fields]
        simplify = True

    # Creates a copy and casts all data to the same type
    arr = np.dstack([rec[field] for field in fields])

    # Check for array-type fields. If none, then remove outer dimension.
    # Only need to check first field since np.dstack will anyway raise an
    # exception if the shapes don't match
    # np.dstack will also fail if fields is an empty list
    if not rec.dtype[fields[0]].shape:
        arr = arr[0]

    if simplify:
        # remove last dimension (will be of size 1)
        arr = arr.reshape(arr.shape[:-1])

    return arr
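The central step of the function, stacking the fields with `np.dstack` and dropping the extra leading axis, can be tried on its own. This is only a sketch on the question's example array, not a replacement for the full function:

```python
import numpy as np

x = np.array([(1.0, 4.0), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])

# Each 1-D field of length N is stacked along a new last axis: shape (1, N, n_fields).
stacked = np.dstack([x[name] for name in x.dtype.names])

# Scalar fields only, so the leading axis of size 1 can be dropped.
result = stacked[0]

print(stacked.shape)
print(result)
```

Unlike the `view` approach, this makes a copy, and fields of different types are cast to a common dtype.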
Nicola