
I have a numpy array that looks like this:

>>> array_data
array([[datetime.datetime(2017, 10, 24, 1, 3, 45, 104000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 3, 47, 901000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 3, 56, 214000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 8, 11000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 13, 120000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 15, 714000), 50, 4],
   [datetime.datetime(2017, 10, 24, 1, 4, 16, 214000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 4, 27, 323000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 5, 13, 261000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 5, 56, 276000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 6, 0, 886000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 6, 38, 104000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 6, 38, 995000), 50, -1],
   [datetime.datetime(2017, 10, 24, 1, 6, 42, 511000), 51, 5],
   [datetime.datetime(2017, 10, 24, 1, 7, 4, 714000), 50, 5],
   [datetime.datetime(2017, 10, 24, 1, 7, 12, 823000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 7, 17, 229000), 50, -1],
   [datetime.datetime(2017, 10, 24, 1, 7, 45, 948000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 7, 56, 245000), 50, 1],
   [datetime.datetime(2017, 10, 24, 1, 8, 10, 761000), 50, -1],
   [datetime.datetime(2017, 10, 24, 1, 8, 21, 464000), 50, -3],
   [datetime.datetime(2017, 10, 24, 1, 8, 21, 761000), 50, -1]], dtype=object)

If the array is updated in real time, how can I select the latest minute of data each time it's updated? For example, if the current time were datetime(2017, 10, 24, 1, 7, 17, 229000), I want it to print (50*5)+(50*1)+(50*-1), and if it were datetime(2017, 10, 24, 1, 7, 45, 948000), it should print (50*5)+(50*1)+(50*-1)+(50*1).

My idea was to take the minute value of the most recently updated row on every update, then loop backwards comparing minute values until they differ. However, I think this will be resource-intensive when there are many rows within a minute, and when updates arrive faster than the loop can finish. Is there a more efficient way to do this?
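A minimal sketch of the backwards scan described above, assuming rows arrive in chronological order with the timestamp in column 0. `latest_minute_start` is a hypothetical helper name. It compares timestamps truncated to the minute rather than the bare minute field, so that, say, 01:07 and 02:07 are not treated as the same minute. Its cost grows with the number of rows in the current minute, which is exactly the concern raised.

```python
import datetime

def latest_minute_start(rows):
    """Hypothetical helper: index of the first row in the same calendar
    minute as the last row, found by scanning backwards.
    Assumes rows are chronological with the timestamp in column 0."""
    def minute_of(ts):
        # Truncate to the minute so 01:07 and 02:07 are not confused.
        return ts.replace(second=0, microsecond=0)

    target = minute_of(rows[-1][0])
    i = len(rows) - 1
    # Step back while the previous row is still in the same minute.
    while i > 0 and minute_of(rows[i - 1][0]) == target:
        i -= 1
    return i

# A few rows taken from the question's array.
rows = [
    [datetime.datetime(2017, 10, 24, 1, 6, 42, 511000), 51, 5],
    [datetime.datetime(2017, 10, 24, 1, 7, 4, 714000), 50, 5],
    [datetime.datetime(2017, 10, 24, 1, 7, 12, 823000), 50, 1],
    [datetime.datetime(2017, 10, 24, 1, 7, 17, 229000), 50, -1],
]
start = latest_minute_start(rows)
total = sum(a * b for _, a, b in rows[start:])
print(total)  # (50*5)+(50*1)+(50*-1) = 250
```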

maynull
  • While the question is not entirely clear to me, if you are looking to keep track of chronological real-time updates, a `queue` data structure may be more appropriate, or even an array that sorts on insert. – crazyGamer Oct 25 '17 at 14:13
  • @crazyGamer Thank you for your comment. Simply put, I want to know the index range of the latest minute of data. For the array above, it would be `[-3:]`. I also want to plot the data, so I think I need to stick with a NumPy array. – maynull Oct 25 '17 at 14:21
  • Alright, so my question is then: Do you want to keep *all* minute data records, or only the latest `n` (say latest 4) in the numpy array? – crazyGamer Oct 26 '17 at 04:33
  • @crazyGamer I want to keep all the data and, at the same time, track the range covering the latest minute in real time and interpret it. For example, I want to generate a signal if the sum of the values within the latest minute (which may be shorter than a full minute) is greater than 600. – maynull Oct 26 '17 at 05:15
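One way to avoid rescanning on every update is to keep a running total for the current minute and reset it when a new minute starts, so each update costs constant time. This is a minimal sketch, not the asker's or answerer's method, assuming rows arrive in chronological order and that "the sum of the values" means the sum of column 1 times column 2, as in the question's example. `MinuteTracker`, `add`, `minute_start` and `total` are hypothetical names; the 600 threshold comes from the comment above.

```python
import datetime

class MinuteTracker:
    """Hypothetical helper: keeps every row plus a running total for the
    current calendar minute. Assumes timestamps never go backwards."""

    def __init__(self, threshold=600):
        self.rows = []          # all data is kept
        self.threshold = threshold
        self.minute = None      # current minute (timestamp truncated)
        self.minute_start = 0   # index in self.rows where that minute begins
        self.total = 0          # running sum of a * b within that minute

    def add(self, ts, a, b):
        key = ts.replace(second=0, microsecond=0)
        if key != self.minute:
            # A new minute has started: reset the window.
            self.minute = key
            self.minute_start = len(self.rows)
            self.total = 0
        self.rows.append((ts, a, b))
        self.total += a * b
        return self.total > self.threshold  # True means "signal"

data = [
    (datetime.datetime(2017, 10, 24, 1, 6, 42, 511000), 51, 5),
    (datetime.datetime(2017, 10, 24, 1, 7, 4, 714000), 50, 5),
    (datetime.datetime(2017, 10, 24, 1, 7, 12, 823000), 50, 1),
    (datetime.datetime(2017, 10, 24, 1, 7, 17, 229000), 50, -1),
    (datetime.datetime(2017, 10, 24, 1, 7, 45, 948000), 50, 1),
]
tracker = MinuteTracker(threshold=600)
for ts, a, b in data:
    signal = tracker.add(ts, a, b)
print(tracker.total)  # (50*5)+(50*1)+(50*-1)+(50*1) = 300
print(tracker.rows[tracker.minute_start:])  # rows of the latest minute
```

The rows are stored in a Python list here because growing a NumPy array with `np.append` copies the whole array each time; converting to an array only when plotting is a design choice of this sketch.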

1 Answer


I suggest using pandas.

You can create a DataFrame from your NumPy array with

df = pd.DataFrame(array_data[:, 1:],
                  index=array_data[:, 0], columns=['a', 'b'])

or create an empty one and add rows with

df = pd.DataFrame(columns=['a', 'b'])
df.loc[datetime.datetime.now()] = [0, 1]

Then you can build a datetime truncated to the minute (without seconds or microseconds) and use it for slicing:

>>> d
datetime.datetime(2017, 10, 24, 1, 8, 21, 761000)
>>> dm = datetime.datetime(d.year, d.month, d.day, d.hour, d.minute)
>>> dm
datetime.datetime(2017, 10, 24, 1, 8)
>>> df[dm:]
                          a   b
2017-10-24 01:08:10.761  50  -1
2017-10-24 01:08:21.464  50  -3
2017-10-24 01:08:21.761  50  -1
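For reference, a self-contained sketch of the steps above that also computes the sum the question asks for. It assumes the last four rows of the question's array as sample data; the explicit `pd.DatetimeIndex`, the `.astype(int)` cast and the use of `.loc` (instead of `df[dm:]`) are my additions. `d.replace(second=0, microsecond=0)` builds the same truncated datetime as constructing it field by field.

```python
import datetime
import numpy as np
import pandas as pd

# Sample rows in the same layout as array_data: (timestamp, a, b).
array_data = np.array([
    [datetime.datetime(2017, 10, 24, 1, 7, 56, 245000), 50, 1],
    [datetime.datetime(2017, 10, 24, 1, 8, 10, 761000), 50, -1],
    [datetime.datetime(2017, 10, 24, 1, 8, 21, 464000), 50, -3],
    [datetime.datetime(2017, 10, 24, 1, 8, 21, 761000), 50, -1],
], dtype=object)

df = pd.DataFrame(array_data[:, 1:].astype(int),
                  index=pd.DatetimeIndex(array_data[:, 0]),
                  columns=['a', 'b'])

d = df.index[-1]                          # latest timestamp
dm = d.replace(second=0, microsecond=0)   # truncated to the minute
latest = df.loc[dm:]                      # rows in the latest minute
total = (latest['a'] * latest['b']).sum()
print(total)  # (50*-1)+(50*-3)+(50*-1) = -250
```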

The lookup goes through the (sorted) index rather than a Python loop over the rows, so it is efficient.

pacholik
  • Thank you for the answer, but I'm trying to use a NumPy array to store real-time data. Isn't pandas inappropriate for real-time big data? – maynull Oct 26 '17 at 00:07
  • @maynull Well, pandas [is built](https://stackoverflow.com/a/11077215/1028589) on top of numpy… – pacholik Oct 26 '17 at 08:10
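If you would rather stay with plain NumPy, the same idea can be sketched without pandas: because rows arrive in chronological order, the start of the latest minute can be found with a binary search (`np.searchsorted`) instead of a backwards loop. This is a minimal sketch assuming the timestamps are kept in a separate, ascending `datetime64[us]` array and the two numeric columns in an integer array; `latest_minute_slice` is a hypothetical helper, and the sample rows are the last four from the question.

```python
import datetime
import numpy as np

# Timestamps as datetime64 so NumPy can compare and search them natively.
times = np.array([
    datetime.datetime(2017, 10, 24, 1, 7, 56, 245000),
    datetime.datetime(2017, 10, 24, 1, 8, 10, 761000),
    datetime.datetime(2017, 10, 24, 1, 8, 21, 464000),
    datetime.datetime(2017, 10, 24, 1, 8, 21, 761000),
], dtype='datetime64[us]')
values = np.array([[50, 1], [50, -1], [50, -3], [50, -1]])

def latest_minute_slice(times):
    """Hypothetical helper: slice covering the calendar minute of the last
    timestamp. Requires `times` to be sorted ascending."""
    # Truncate the last timestamp to the minute, then cast back to the
    # array's unit so the search compares like with like.
    minute_start = times[-1].astype('datetime64[m]').astype(times.dtype)
    start = np.searchsorted(times, minute_start, side='left')  # binary search
    return slice(int(start), len(times))

s = latest_minute_slice(times)
total = int((values[s, 0] * values[s, 1]).sum())
print(s, total)  # slice(1, 4, None) -250
```

With the original object array, `array_data[:, 0].astype('datetime64[us]')` should produce an equivalent `times` array, although converting the whole column on every update would reintroduce a linear cost.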