How can I select the latest minute values in Numpy?

Question

I have a numpy array that looks like this:

>>> array_data
array([[datetime.datetime(2017, 10, 24, 1, 3, 45, 104000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 3, 47, 901000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 3, 56, 214000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 8, 11000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 13, 120000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 15, 714000), 50, 4],
   [datetime.datetime(2017, 10, 24, 1, 4, 16, 214000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 27, 323000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 5, 13, 261000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 5, 56, 276000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 6, 0, 886000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 6, 38, 104000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 6, 38, 995000), 50, -1],
   [datetime.datetime(2017, 10, 24, 1, 6, 42, 511000), 51, 5],
   [datetime.datetime(2017, 10, 24, 1, 7, 4, 714000), 50, 5],
   [datetime.datetime(2017, 10, 24, 1, 7, 12, 823000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 7, 17, 229000), 50, -1],
   [datetime.datetime(2017, 10, 24, 1, 7, 45, 948000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 7, 56, 245000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 8, 10, 761000), 50, -1],
   [datetime.datetime(2017, 10, 24, 1, 8, 21, 464000), 50, -3],
   [datetime.datetime(2017, 10, 24, 1, 8, 21, 761000), 50, -1]], dtype=object)

If the array is updated in real time, how can I select the latest minute of data on each update? For example, if the current time were 2017, 10, 24, 1, 7, 17, 229000, I want it to print (50*5)+(50*1)+(50*-1), and if it were 2017, 10, 24, 1, 7, 45, 948000, it should print (50*5)+(50*1)+(50*-1)+(50*1).

I thought I could extract the minute value of the most recently added row on each update, then loop backwards, comparing minute values until they differ. However, I think that will be resource-consuming when there are many rows within a minute and when updates arrive faster than the loop runs. Are there more efficient ways to do this?

While the question is not entirely clear to me, if you are looking for keeping track of real-time updates that are chronological, a `queue` data structure may be more appropriate, or even an array that sorts on insert. — crazyGamer, Oct 25 '17 at 14:13
@crazyGamer Thank you for your comment. Simply put, I want to know the range of the latest minute data. As for the array above, it will be `[-3:]`. I want to plot the data as well, so I think I need to stick to a numpy array. — maynull, Oct 25 '17 at 14:21
Alright, so my question is then: Do you want to keep *all* minute data records, or only the latest `n` (say latest 4) in the numpy array? — crazyGamer, Oct 26 '17 at 04:33
@crazyGamer I want to keep all data, and at the same time, want to track the latest range of one minute data real-time and interpret it. For example, I want it to make a signal if the sum of the values within the latest minute(it could be shorter than one minute) is bigger than 600. — maynull, Oct 26 '17 at 05:15

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

I suggest using pandas.

You create a dataframe from your numpy array with

import pandas as pd
# Column 0 holds the timestamps; the remaining columns become 'a' and 'b'.
df = pd.DataFrame(array_data[:, 1:],
                  index=array_data[:, 0], columns=['a', 'b'])

or create a new one and add rows with

df = pd.DataFrame(columns=['a', 'b'])
df.loc[datetime.datetime.now()] = [0, 1]

Then you can create a datetime without seconds and use it for slicing

>>> d
datetime.datetime(2017, 10, 24, 1, 8, 21, 761000)
>>> dm = datetime.datetime(d.year, d.month, d.day, d.hour, d.minute)
>>> dm
datetime.datetime(2017, 10, 24, 1, 8)
>>> df[dm:]
                          a   b
2017-10-24 01:08:10.761  50  -1
2017-10-24 01:08:21.464  50  -3
2017-10-24 01:08:21.761  50  -1

The slice uses the index, so it is efficient.
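
To get the number the question asks for, the slice can be followed by a sum of `a * b`. The following is only a sketch: it assumes the index is a chronologically sorted `DatetimeIndex`, and the helper name `latest_minute_total` is made up for illustration. The sample rows are the last four rows of the array in the question.

```python
import datetime
import pandas as pd

# Sample: the last four rows of the array in the question.
rows = [
    (datetime.datetime(2017, 10, 24, 1, 7, 56, 245000), 50, 1),
    (datetime.datetime(2017, 10, 24, 1, 8, 10, 761000), 50, -1),
    (datetime.datetime(2017, 10, 24, 1, 8, 21, 464000), 50, -3),
    (datetime.datetime(2017, 10, 24, 1, 8, 21, 761000), 50, -1),
]
df = pd.DataFrame(
    [r[1:] for r in rows],
    index=pd.DatetimeIndex([r[0] for r in rows]),
    columns=['a', 'b'],
)

def latest_minute_total(frame):
    """Sum a*b over the rows in the same clock minute as the newest row."""
    minute_start = frame.index[-1].floor('min')  # drop seconds and microseconds
    recent = frame.loc[minute_start:]            # label slice on the sorted index
    return int((recent['a'] * recent['b']).sum())

total = latest_minute_total(df)
print(total)
```

Here the newest row falls in minute 01:08, so only the last three rows contribute to the total.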

answered Oct 25 '17 at 15:32

pacholik

Thank you for the answer, but I'm trying to use a numpy array to store real-time data. I think it's not appropriate to use pandas when dealing with real-time big data? – maynull Oct 26 '17 at 00:07
@maynull Well, pandas [is built](https://stackoverflow.com/a/11077215/1028589) on top of numpy… – pacholik Oct 26 '17 at 08:10
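
For anyone who wants to stay in plain NumPy, one possible approach (not from the answer above, just a sketch) is to convert the timestamp column to `datetime64`, truncate the last timestamp to the minute, and binary-search for that boundary with `np.searchsorted`. It assumes the rows are in chronological order; the helper name `latest_minute_start` is hypothetical.

```python
import datetime
import numpy as np

# Sample: the last five rows of the array in the question.
array_data = np.array([
    [datetime.datetime(2017, 10, 24, 1, 7, 45, 948000), 50, 1],
    [datetime.datetime(2017, 10, 24, 1, 7, 56, 245000), 50, 1],
    [datetime.datetime(2017, 10, 24, 1, 8, 10, 761000), 50, -1],
    [datetime.datetime(2017, 10, 24, 1, 8, 21, 464000), 50, -3],
    [datetime.datetime(2017, 10, 24, 1, 8, 21, 761000), 50, -1],
], dtype=object)

def latest_minute_start(data):
    """Index of the first row in the same clock minute as the last row."""
    times = data[:, 0].astype('datetime64[us]')  # object column -> datetime64
    # Truncate the newest timestamp to the minute, then restore the unit.
    minute_start = times[-1].astype('datetime64[m]').astype(times.dtype)
    # Binary search; only valid because the rows are sorted by time.
    return int(np.searchsorted(times, minute_start, side='left'))

start = latest_minute_start(array_data)
latest = array_data[start:]
total = int((latest[:, 1] * latest[:, 2]).sum())
print(start, total)
```

On this sample the slice is the last three rows, matching the `[-3:]` range mentioned in the question's comments. Converting the whole column on every update is linear in the number of rows, so for real-time use it would probably be better to keep the timestamps in a separate `datetime64` array and only run the binary search on each update.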
