If I want to combine vectors of different dtypes into a single two-dimensional numpy array, I can use either a

  1. numpy structured array or a
  2. numpy record array.

When should I use 1., and when should I use 2.? Do they behave identically with regard to performance and convenience?

The record array is created with less code, but does the structured array have other advantages that make it preferable?

Code example:

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
       ['2018-04-01T15:31:00'],
       ['2018-04-01T15:32:00'],
       ['2018-04-01T15:33:00'],
       ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
  1. structured array:

(see: How to insert column of different type to numpy array?)

# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))

# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)

# populate the structured array with the data from your column arrays
struct['date'], struct['val'] = a.T, c.T

print(struct)
# output:
#     array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
#            ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
#            ('2018-04-01T15:34:00', 4)],
#           dtype=[('date', '<M8[s]'), ('val', '<i8')])
  2. record array:

(see: How can I avoid that np.datetime64 gets auto converted to datetime when adding it to a numpy array?)

rarr = np.rec.fromarrays([a, c], names=('date', 'val'))

print(rarr)
# output
#     rec.array([[('2018-04-01T15:30:00', 0)],
#                [('2018-04-01T15:31:00', 1)],
#                [('2018-04-01T15:32:00', 2)],
#                [('2018-04-01T15:33:00', 3)],
#                [('2018-04-01T15:34:00', 4)]],
#               dtype=[('date', '<M8[s]'), ('val', '<i8')])
user7468395
  • Sorry, didn't mean to confuse you by throwing both structured and record arrays at you. My own preference is to always use structured arrays, but, like you say, the syntax can be more verbose. The above linked question has some really good info. [This answer here](https://stackoverflow.com/a/51280608/425458) is my own personal favorite take on the struct vs rec topic. Nice and succinct. – tel Jan 21 '19 at 02:32
  • Recarray is a structured array that lets you access fields as attributes. – hpaulj Jan 21 '19 at 02:51
  • @tel: no, seeing both variants helped to understand much :-) ... the big advantage for structured arrays seems to be that I can filter for certain entries with: `myIndex = np.where(struct["date"] == np.datetime64('2018-04-01T15:31:00'))[0]; print(struct["val"][myIndex][0])`. This does not seem to be possible with recarrays. – user7468395 Jan 21 '19 at 03:25
  • Your `rarr` is (5,1), while `struct` is (5,). Otherwise they should behave the same. I usually use structured arrays, but in this case the convenience of `fromarrays` makes recarray attractive. – hpaulj Jan 21 '19 at 04:47
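
The points raised in the comments can be checked with a minimal sketch. It assumes the columns are passed to `np.rec.fromarrays` as 1-D arrays (that flattening is an assumption made here, not part of the question's code), so the record array has shape (5,) like `struct`. Under that assumption, filtering by date appears to work on the record array as well, and views convert between the two kinds of array without copying the data.

```python
import numpy as np

# Same data as in the question, but as 1-D columns of shape (5,)
# (assumption: the question itself uses (5, 1) columns).
a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
              '2018-04-01T15:32:00', '2018-04-01T15:33:00',
              '2018-04-01T15:34:00'], dtype='datetime64[s]')
c = np.arange(5)

rarr = np.rec.fromarrays([a, c], names=('date', 'val'))
print(rarr.shape)  # (5,)

# A record array allows attribute access in addition to key access.
assert (rarr.date == rarr['date']).all()

# Filtering with a boolean mask works on the record array too.
target = np.datetime64('2018-04-01T15:31:00')
match = rarr[rarr.date == target]
print(match.val[0])  # 1

# Views switch between the two kinds of array without copying.
# Resetting both dtype and type gives a plain structured ndarray.
struct = rarr.view(rarr.dtype.fields or rarr.dtype, np.ndarray)
# Viewing as np.recarray restores attribute access.
back = struct.view(np.recarray)
print(np.shares_memory(struct, rarr))  # True
```

Since the conversion is only a view, the choice between the two mostly comes down to whether attribute access (`rarr.date`) is wanted over key access (`struct['date']`).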