My data looks like this:
timedelta64 1, temp1A, temp1B, temp1C, ...
timedelta64 2, temp2A, temp2B, temp2C, ...
The data is ingested into two numpy arrays:
A series of timestamps, raw_timestamp, with dtype=[('datetime', '<M8[s]')]:
'2009-01-01T18:41:00', '2009-01-01T18:44:00', '2009-01-01T18:46:00', '2009-01-01T18:47:00',
A table of sensor data, raw_sensor, with dtype=[('sensorA', '<u4'), ('sensorB', '<u4'), ('sensorC', '<u4'), ('sensorD', '<u4'), ('sensorE', '<u4'), ('sensorF', '<u4'), ('sensorG', '<u4'), ('sensorH', '<u4'), ('signal', '<u4')]:
(755, 855, 755, 855, 743, 843, 743, 843, 2), (693, 793, 693, 793, 693, 793, 693, 793, 1), (755, 855, 755, 855, 743, 843, 743, 843, 2), (693, 793, 693, 793, 693, 793, 693, 793, 1),
I generate a new filled_timestamp array that contains a timestamp for every time step:
filled_timestamp = np.arange(np.datetime64(starttime), np.datetime64(endtime), np.timedelta64(interval))
Using idxs = np.in1d(filled_timestamp, raw_timestamp), I get a boolean mask marking the entries of filled_timestamp that match the timestamps in raw_timestamp, so I can fill filled_sensor with the matching data from raw_sensor:
filled_sensor[idxs] = raw_sensor
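For concreteness, here is a minimal self-contained sketch of the steps above. The starttime, endtime, and interval values are assumptions chosen to match the sample data, raw_timestamp is simplified to a plain datetime64 array instead of the structured one, and np.isin stands in for np.in1d, which newer NumPy releases deprecate. It also assumes raw_timestamp is sorted and that every raw timestamp falls exactly on the generated grid.

```python
import numpy as np

sensor_dtype = [(name, '<u4') for name in
                ('sensorA', 'sensorB', 'sensorC', 'sensorD', 'sensorE',
                 'sensorF', 'sensorG', 'sensorH', 'signal')]

# Sample data from the question (timestamps as a plain datetime64 array).
raw_timestamp = np.array(
    ['2009-01-01T18:41:00', '2009-01-01T18:44:00',
     '2009-01-01T18:46:00', '2009-01-01T18:47:00'],
    dtype='datetime64[s]')
raw_sensor = np.array(
    [(755, 855, 755, 855, 743, 843, 743, 843, 2),
     (693, 793, 693, 793, 693, 793, 693, 793, 1),
     (755, 855, 755, 855, 743, 843, 743, 843, 2),
     (693, 793, 693, 793, 693, 793, 693, 793, 1)],
    dtype=sensor_dtype)

# Assumed grid parameters; np.arange excludes the end point.
starttime = np.datetime64('2009-01-01T18:41:00')
endtime = np.datetime64('2009-01-01T18:48:00')
interval = np.timedelta64(60, 's')

# One timestamp per time step, and an all-zero sensor table of the same length.
filled_timestamp = np.arange(starttime, endtime, interval)
filled_sensor = np.zeros(filled_timestamp.shape, dtype=raw_sensor.dtype)

# Boolean mask: True where a grid timestamp also occurs in raw_timestamp.
idxs = np.isin(filled_timestamp, raw_timestamp)

# Rows are assigned in order, so this relies on raw_timestamp being sorted.
filled_sensor[idxs] = raw_sensor
```

Note that idxs is a boolean mask rather than an array of integer positions, so the assignment only works when the number of True entries equals len(raw_sensor).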
Q1. Is there a better / faster way to intersect these?
Now the filled arrays look like:
>>> filled_timestamp, filled_sensor # shown side-by-side for convenience
array([
1 # ('2009-01-01T18:41:00') (755, 855, 755, 855, 743, 843, 743, 843, 2),
2 # ('2009-01-01T18:42:00') (0, 0, 0, 0, 0, 0, 0, 0, 0),
3 # ('2009-01-01T18:43:00') (0, 0, 0, 0, 0, 0, 0, 0, 0),
4 # ('2009-01-01T18:44:00') (693, 793, 693, 793, 693, 793, 693, 793, 1),
5 # ('2009-01-01T18:45:00') (0, 0, 0, 0, 0, 0, 0, 0, 0),
6 # ('2009-01-01T18:46:00') (693, 793, 693, 793, 693, 793, 693, 793, 1),
7 # ('2009-01-01T18:47:00') (693, 793, 693, 793, 693, 793, 693, 793, 1)
],
dtype=[('datetime', '<M8[s]')], [('sensorA', '<u4'), ('sensorB', '<u4'), ('sensorC', '<u4'), ('sensorD', '<u4'), ('sensorE', '<u4'), ('sensorF', '<u4'), ('sensorG', '<u4'), ('sensorH', '<u4'), ('signal', '<u4')]
Q2. How can I fill the missing rows with values from the nearest previous non-empty row, except for columns 0, 3, and the last column (sensorA, sensorD, and signal), which should stay 0 in the filled rows?
In my example above:
Rows 2 and 3 would take values from Row 1,
Row 5 would take values from Row 4.
End result:
>>> filled_timestamp, filled_sensor # shown side-by-side for convenience
array([
1 # ('2009-01-01T18:41:00') (755, 855, 755, 855, 743, 843, 743, 843, 2),
2 # ('2009-01-01T18:42:00') (0, 855, 755, 0, 743, 843, 743, 843, 0),
3 # ('2009-01-01T18:43:00') (0, 855, 755, 0, 743, 843, 743, 843, 0),
4 # ('2009-01-01T18:44:00') (693, 793, 693, 793, 693, 793, 693, 793, 1),
5 # ('2009-01-01T18:45:00') (0, 793, 693, 0, 693, 793, 693, 793, 0),
6 # ('2009-01-01T18:46:00') (693, 793, 693, 793, 693, 793, 693, 793, 1),
7 # ('2009-01-01T18:47:00') (693, 793, 693, 793, 693, 793, 693, 793, 1)
],
dtype=[('datetime', '<M8[s]')], [('sensorA', '<u4'), ('sensorB', '<u4'), ('sensorC', '<u4'), ('sensorD', '<u4'), ('sensorE', '<u4'), ('sensorF', '<u4'), ('sensorG', '<u4'), ('sensorH', '<u4'), ('signal', '<u4')]
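One possible way to get this result, offered as a sketch rather than a definitive answer, is to compute for each row the index of the most recent non-empty row with np.maximum.accumulate, copy from that row, and then zero the excluded columns in the filled rows. The valid mask below is a hypothetical name playing the role of idxs, the data is the filled example above, and the sketch assumes the first row is never empty.

```python
import numpy as np

sensor_dtype = [(name, '<u4') for name in
                ('sensorA', 'sensorB', 'sensorC', 'sensorD', 'sensorE',
                 'sensorF', 'sensorG', 'sensorH', 'signal')]

# Filled table from the example: rows 2, 3 and 5 are still all zeros.
empty = (0, 0, 0, 0, 0, 0, 0, 0, 0)
filled_sensor = np.array(
    [(755, 855, 755, 855, 743, 843, 743, 843, 2),
     empty,
     empty,
     (693, 793, 693, 793, 693, 793, 693, 793, 1),
     empty,
     (693, 793, 693, 793, 693, 793, 693, 793, 1),
     (693, 793, 693, 793, 693, 793, 693, 793, 1)],
    dtype=sensor_dtype)

# Hypothetical mask (same role as idxs): True where a row holds real data.
valid = np.array([True, False, False, True, False, True, True])

# For every row, the index of the latest valid row at or before it.
# Invalid rows contribute 0, and the running maximum carries the last valid index forward.
row_ids = np.arange(len(filled_sensor))
last_valid = np.maximum.accumulate(np.where(valid, row_ids, 0))

# Fancy indexing returns a copy in which each row is taken from its source row.
result = filled_sensor[last_valid]

# Columns that must stay 0 in the filled (originally empty) rows.
for name in ('sensorA', 'sensorD', 'signal'):
    result[name][~valid] = 0
```

Here last_valid comes out as [0, 0, 0, 3, 3, 5, 6], so rows 2 and 3 copy from row 1 and row 5 copies from row 4, using the 1-based row numbers of the listing.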