If I want to combine vectors of different dtypes into a two-dimensional numpy array, I can use either a
1. numpy structured array
or a
2. numpy record array.
When should I use a structured array, and when should I use a record array? Do they behave identically with regard to performance and convenience?
The record array is created with less code, but does the structured array have other advantages that make it preferable?
Code example:
import numpy as np
a = np.array([['2018-04-01T15:30:00'],
['2018-04-01T15:31:00'],
['2018-04-01T15:32:00'],
['2018-04-01T15:33:00'],
['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
structured array:
(see: How to insert column of different type to numpy array?)
# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))
# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)
# populate the structured array with the data from your column arrays
struct['date'], struct['val'] = a.T, c.T
print(repr(struct))
# output:
# array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
# ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
# ('2018-04-01T15:34:00', 4)],
# dtype=[('date', '<M8[s]'), ('val', '<i8')])
record array:
(see: How can I avoid that np.datetime64 gets auto converted to datetime when adding it to a numpy array?)
rarr = np.rec.fromarrays([a, c], names=('date', 'val'))
print(repr(rarr))
# output
# rec.array([[('2018-04-01T15:30:00', 0)],
# [('2018-04-01T15:31:00', 1)],
# [('2018-04-01T15:32:00', 2)],
# [('2018-04-01T15:33:00', 3)],
# [('2018-04-01T15:34:00', 4)]],
# dtype=[('date', '<M8[s]'), ('val', '<i8')])
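
For context, the one difference I am aware of so far is field access: as far as I can tell, a record array allows attribute-style access on top of the usual index-by-name access, and a structured array can be viewed as a record array without copying. The following is only a minimal sketch of that, using simplified 1-D versions of the arrays above (the names `by_key` and `by_attr` are just for illustration); it does not say anything about performance.

```python
import numpy as np

# Simplified 1-D versions of the column arrays from the example above
a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00'],
             dtype='datetime64[s]')
c = np.array([0, 1])

# Build the structured array field by field
dtype = np.dtype([('date', a.dtype), ('val', c.dtype)])
struct = np.empty(a.shape[0], dtype=dtype)
struct['date'], struct['val'] = a, c

# View the same buffer as a record array (no data is copied)
rarr = struct.view(np.recarray)

# Both support access by field name; the record array view
# additionally exposes the fields as attributes
by_key = struct['val']
by_attr = rarr.val

# Writing through the record array view changes the structured array too
rarr.val[0] = 42
```

So at least for this small case the two look interchangeable, which is why I am asking whether there is a reason to prefer one over the other.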