
I am trying to convert a 10x2 array to a record array by giving a name to each column.

This is what I tried:

>>> from numpy import *
>>> t = arange(10)
>>> n = dstack([t,
...             roll(t, 1),
...             roll(t, -1)])[0]
>>> n = n[:,1:3]
>>> n
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])
>>> nt = [('left', int), ('right', int)]
>>> array (n, nt)
array([[(9, 9), (1, 1)],
       [(0, 0), (2, 2)],
       [(1, 1), (3, 3)],
       [(2, 2), (4, 4)],
       [(3, 3), (5, 5)],
       [(4, 4), (6, 6)],
       [(5, 5), (7, 7)],
       [(6, 6), (8, 8)],
       [(7, 7), (9, 9)],
       [(8, 8), (0, 0)]], 
      dtype=[('left', '<i8'), ('right', '<i8')])
>>> 

To my surprise, the elements of each row are tuples instead of numbers of type int.

How can I correct this and make each row of n look like [9, 1] instead of [(9, 9), (1, 1)]?

askewchan
alinsoar
  • 1
    possible duplicate of [Converting a 2D numpy array to a structured array](http://stackoverflow.com/questions/3622850/converting-a-2d-numpy-array-to-a-structured-array) – askewchan Sep 23 '13 at 16:06
  • I read that post, but despite my efforts to understand it and adapt it to my case, I could not find the answer there. The answers received here so far, on the other hand, do work. – alinsoar Sep 23 '13 at 16:17
  • Yeah the construction of recarrays from existing arrays is a bit weird since you're taking what used to be separate elements into one tuple of elements. – askewchan Sep 23 '13 at 16:18

3 Answers


You can create a view with a new dtype that looks at the same data:

In [150]: nt = [('left', int), ('right', int)]

In [151]: n
Out[151]: 
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])

In [152]: n.view(nt)
Out[152]: 
array([[(9, 1)],
       [(0, 2)],
       [(1, 3)],
       [(2, 4)],
       [(3, 5)],
       [(4, 6)],
       [(5, 7)],
       [(6, 8)],
       [(7, 9)],
       [(8, 0)]], 
      dtype=[('left', '<i8'), ('right', '<i8')])

This keeps a 2D shape, though, so you can reshape it to one dimension:

In [160]: n_struct = n.view(nt)

In [161]: n_struct.shape
Out[161]: (10, 1)

In [162]: n_struct = n.view(nt).reshape(n.shape[0])

In [163]: n_struct
Out[163]: 
array([(9, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
       (7, 9), (8, 0)], 
      dtype=[('left', '<i8'), ('right', '<i8')])

As you asked, each column can then be accessed by name:

In [170]: n_struct['left']
Out[170]: array([9, 0, 1, 2, 3, 4, 5, 6, 7, 8])

In [171]: n_struct['right']
Out[171]: array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

A warning, from @Ophion: this only works if the dtypes are compatible, because ndarray.view(dtype) interprets the original data as if it were the given dtype; it does not convert the data to the new dtype. In other words (from the documentation):

a.view(some_dtype) constructs a view of the array's memory with a different data-type. This can cause a reinterpretation of the bytes of memory.
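To make the warning concrete, here is a small sketch of what "reinterpretation" means in practice. It assumes 8-byte integer fields (written as np.int64 so the example does not depend on the platform default); the names f, wrong and fixed are just for illustration.

```python
import numpy as np

nt = [('left', np.int64), ('right', np.int64)]

# The array already holds int64, so the view is a faithful relabelling.
n = np.array([[9, 1], [0, 2], [1, 3]], dtype=np.int64)
ok = n.view(nt).reshape(n.shape[0])

# float64 is also 8 bytes wide, so view() succeeds without complaint,
# but the raw bytes are reinterpreted as integers, not converted.
f = n.astype(np.float64)
wrong = f.view(nt).reshape(f.shape[0])

# Converting to the field type first, then viewing, gives the expected values.
fixed = f.astype(np.int64).view(nt).reshape(f.shape[0])

print(ok['left'])      # the original first column
print(wrong['left'])   # bit patterns of the floats, not the values
print(fixed['left'])   # the original first column again
```

If the item sizes do not line up at all (for example a 4-byte integer array viewed with two 8-byte fields), view() typically raises an error instead, which is the friendlier of the two failure modes.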

askewchan
  • Seems interesting, but now the array is of tuples. I need to execute this operation very often, and I need something really fast. Also, how can 'left' extract the first element from an array of tuples? – alinsoar Sep 23 '13 at 16:14
  • 1
    That is what a `record` array is: an array of tuples (since your `dtype` is a tuple of two `int`s). – askewchan Sep 23 '13 at 16:14
  • Can you define another type that gets the column directly by name, in only one operation? – alinsoar Sep 23 '13 at 16:19
  • I am using Python 2.7. For me it does not work: n.view(nt) => ValueError: new type not compatible with array. – alinsoar Sep 23 '13 at 16:21
  • This is a really nice way to do this; the one thing I would mention is to be extremely careful about the `dtype`. There is no checking, and a mismatch will either cause an error or wonky results. @alinsoar I would check the `dtype` of your array, it's likely the cause. – Daniel Sep 23 '13 at 16:31
  • @alinsoar What do you mean by "get the column directly by name, in only one operation"? There must be two operations: one to create the record/structured array, and a second to access a column. – askewchan Sep 23 '13 at 17:24
  • @Ophion Yes of course, you wouldn't want to have it interpret the data as strings, e.g. It's important to note that this views the data and interprets it as the given type, it does not _convert_ it to the given type. – askewchan Sep 23 '13 at 17:26
  • I made an optimization and, to clean up the code, I gave names to the columns to distinguish their meaning in a dynamic cyclic buffer with many nodes. I did not follow either answer literally, but I used ideas from both. – alinsoar Sep 23 '13 at 21:09

There is hopefully a better way in pure numpy, but to get you started:

>>> nt = [('left', int), ('right', int)]
>>> n
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])

>>> out = np.zeros(n.shape[0], dtype=nt)
>>> out
array([(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),
       (0, 0), (0, 0)],
      dtype=[('left', '<i8'), ('right', '<i8')])

>>> out['left']=n[:,0]
>>> out['right']=n[:,1]

>>> out
array([(9, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
       (7, 9), (8, 0)],
      dtype=[('left', '<i8'), ('right', '<i8')])

>>> out['left']
array([9, 0, 1, 2, 3, 4, 5, 6, 7, 8])
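The field-by-field assignment above generalizes to any number of columns. The following is a minimal sketch under the assumption that the array has exactly one column per field; columns_to_struct is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def columns_to_struct(arr, dtype):
    """Copy each column of a 2D array into the matching named field."""
    dtype = np.dtype(dtype)
    if arr.ndim != 2 or arr.shape[1] != len(dtype.names):
        raise ValueError("need a 2D array with one column per field")
    out = np.zeros(arr.shape[0], dtype=dtype)
    for i, name in enumerate(dtype.names):
        # Assignment converts the column to the field's type.
        out[name] = arr[:, i]
    return out

n = np.array([[9, 1], [0, 2], [1, 3]])
out = columns_to_struct(n, [('left', int), ('right', float)])
print(out['left'])
print(out['right'])
```

Unlike the view approach, this copies the data, but it converts values properly, so the field types do not have to match the dtype of the source array.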

Of course there is the pandas answer:

>>> import pandas as pd
>>> df = pd.DataFrame(n,columns=['left','right'])
>>> df
   left  right
0     9      1
1     0      2
2     1      3
3     2      4
4     3      5
5     4      6
6     5      7
7     6      8
8     7      9
9     8      0

Something nice about pandas DataFrames is that the underlying array is still available:

>>> df.values
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])
Daniel

If the underlying dtypes are not compatible, the view approach does not work. The fallback option is to fill the record array with a list of tuples:

In [128]: x=np.arange(12).reshape(4,3)

In [129]: y=np.zeros((4,),dtype=[('x','f'),('y','f'),('z','f')])

In [130]: y
Out[130]: 
array([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], 
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4')])

In [131]: y[:]=[tuple(row) for row in x]

In [132]: y
Out[132]: 
array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0), (9.0, 10.0, 11.0)], 
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4')])

This list of tuples can also be used in the initial construction:

In [135]: np.array([tuple(row) for row in x],y.dtype)
Out[135]: 
array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0), (9.0, 10.0, 11.0)], 
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4')])
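For what it is worth, more recent NumPy releases (1.16 and later, as far as I know) ship a helper for this conversion in numpy.lib.recfunctions, which avoids the Python-level list of tuples. A short sketch, assuming such a version is installed:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

x = np.arange(12).reshape(4, 3)
dt = np.dtype([('x', 'f4'), ('y', 'f4'), ('z', 'f4')])

# The last axis of x is mapped onto the fields of dt, with each
# value cast to the field's type.
y = rfn.unstructured_to_structured(x, dt)

print(y.shape)
print(y['y'])
```

The result here is a 1D structured array with one record per row of x, the same as the list-of-tuples construction.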
hpaulj