How do I remove rows from a NumPy array based on date?

Question

I have a number of arrays with the following format:

ic2 2021-05-06 09:10:30.582455  3.556000789 0
ic2 2021-05-06 09:15:30.999870  3.556000789 0
ic2 2021-05-06 09:20:31.784262  3.556000789 0
ic2 2021-05-06 09:25:32.538884  3.556000789 0
ic2 2021-05-06 09:30:32.767391  3.556000789 0
ic2 2021-05-06 09:35:33.293743  3.556000789 0
ic2 2021-05-06 09:40:33.454079  3.556000789 0
ic2 2021-05-06 09:45:33.853965  3.556000789 0
ic2 2021-05-06 09:50:34.421639  3.556000789 0
ic2 2021-05-06 09:55:34.653818  3.556000789 0
ic2 2021-05-06 10:00:35.134874  3.556000789 0

How do I remove the rows where the datetime is > 2021-05-06 09:20 and < 2021-05-06 09:40?

I have tried with np.delete:

 raindata[10]=np.delete(raindata[10],[(raindata[10]>2021-11-07 16:42) & (raindata[10]< 2022-11-15 04:07)],axis=0)

and np.where:

raindata[10][:,1]=np.where(raindata[10][:,1]>= 2022-11-07 16:42 and <= 2022-11-15 04:07, Nan, raindata[10][:,1])

but I always get the error:

SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

Edit in response to comments:

For ease, I have created the datetime objects with:

xx=datetime.datetime(2021,5,6, hour=9, minute=20)
yy=datetime.datetime(2021,5,6, hour=9, minute=40)

and incorporated them into Ben Grossmann's solution:

newraindata=np.delete(raindata[10],xx < raindata[10] < yy, axis=0)

which gives TypeError: '>' not supported between instances of 'str' and 'datetime.datetime'. I understand this is because column 0 is a string, so I tried

newraindata=np.delete(raindata[10], (raindata[10][:,1] > xx) & (raindata[10][:,1] < yy), axis=0)

which just returns raindata[10] unchanged.

Ben also asked for code that reproduces the array. I'm sorry, but the array is not generated by code; it is data imported from a PGSQL table, which is why I copied the first 11 rows as an example.

Python doesn't know how to interpret "2022-11-07 16:42" as a date; you need to convert it into a datetime object. You can do this using the [strptime method](https://stackoverflow.com/questions/466345/converting-string-into-datetime), for instance. – Ben Grossmann Feb 24 '22 at 16:59
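A minimal sketch of the strptime route, using the 09:20/09:40 bounds from the question (the names `start`, `end` and `ts` are illustrative only). Written unquoted, `2021-05-06 09:20` is not a date to Python at all: it is read as arithmetic on integer literals, and a literal such as `05` is what triggers the "leading zeros in decimal integer literals" SyntaxError. Quoting the value and parsing it gives a real `datetime`:

```python
from datetime import datetime

# Range bounds as strings in "YYYY-MM-DD HH:MM" form, parsed into datetime objects
start = datetime.strptime("2021-05-06 09:20", "%Y-%m-%d %H:%M")
end = datetime.strptime("2021-05-06 09:40", "%Y-%m-%d %H:%M")

# The timestamps in the data also carry seconds and microseconds, so they need %S.%f
ts = datetime.strptime("2021-05-06 09:25:32.538884", "%Y-%m-%d %H:%M:%S.%f")

# Chained comparison is fine here because all three are scalars
print(start < ts < end)
```

The format string has to match the text exactly; a mismatch raises `ValueError` rather than silently guessing.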

score 1 · Accepted Answer · answered Feb 28 '22 at 09:17

This has been solved using NumPy:

Set the date range:

from datetime import datetime

xx = datetime(2021, 11, 7, hour=16, minute=42)
yy = datetime(2021, 11, 15, hour=4, minute=8)

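Create a boolean mask that is True for rows whose timestamp (column 1) falls inside the range:

mask2 = (rainmm[:, 1] > xx) & (rainmm[:, 1] < yy)

Create the new array, keeping only the rows outside the range:

rainmm = rainmm[~mask2].copy()

Job done :)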
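A self-contained version of the approach above that runs without the database. It assumes the PGSQL rows end up as a 2-D object array whose column 1 holds `datetime` objects; here the sample rows from the question are parsed into that shape, and the bounds are the question's 09:20/09:40 range rather than the November dates:

```python
from datetime import datetime

import numpy as np

# Sample rows from the question (stand-in for the data loaded from PGSQL)
SAMPLE = """\
ic2 2021-05-06 09:10:30.582455  3.556000789 0
ic2 2021-05-06 09:15:30.999870  3.556000789 0
ic2 2021-05-06 09:20:31.784262  3.556000789 0
ic2 2021-05-06 09:25:32.538884  3.556000789 0
ic2 2021-05-06 09:30:32.767391  3.556000789 0
ic2 2021-05-06 09:35:33.293743  3.556000789 0
ic2 2021-05-06 09:40:33.454079  3.556000789 0
ic2 2021-05-06 09:45:33.853965  3.556000789 0
ic2 2021-05-06 09:50:34.421639  3.556000789 0
ic2 2021-05-06 09:55:34.653818  3.556000789 0
ic2 2021-05-06 10:00:35.134874  3.556000789 0
"""

FMT = "%Y-%m-%d %H:%M:%S.%f"

rows = []
for line in SAMPLE.splitlines():
    name, day, clock, value, flag = line.split()
    rows.append([name, datetime.strptime(f"{day} {clock}", FMT), float(value), int(flag)])

# 2-D object array: column 0 str, column 1 datetime, columns 2-3 numbers
rainmm = np.array(rows, dtype=object)

xx = datetime(2021, 5, 6, hour=9, minute=20)
yy = datetime(2021, 5, 6, hour=9, minute=40)

# Compare only column 1; the parentheses are required because &
# binds more tightly than > and <
mask = (rainmm[:, 1] > xx) & (rainmm[:, 1] < yy)

# Boolean indexing returns a new array holding the rows outside the range
kept = rainmm[~mask]
print(len(kept))  # 7
```

Comparing `rainmm[:, 1]` instead of the whole array is what avoids the `'>' not supported between instances of 'str' and 'datetime.datetime'` error, since column 0 holds strings.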

Ben Grossmann · Answer 2 · score 0 · 2022-02-26T01:30:43.410

It turns out that this entire problem can be nicely solved using the pandas module.

import pandas as pd

s = '''ic2 2021-05-06 09:10:30.582455  3.556000789 0
ic2 2021-05-06 09:15:30.999870  3.556000789 0
ic2 2021-05-06 09:20:31.784262  3.556000789 0
ic2 2021-05-06 09:25:32.538884  3.556000789 0
ic2 2021-05-06 09:30:32.767391  3.556000789 0
ic2 2021-05-06 09:35:33.293743  3.556000789 0
ic2 2021-05-06 09:40:33.454079  3.556000789 0
ic2 2021-05-06 09:45:33.853965  3.556000789 0
ic2 2021-05-06 09:50:34.421639  3.556000789 0
ic2 2021-05-06 09:55:34.653818  3.556000789 0
ic2 2021-05-06 10:00:35.134874  3.556000789 0'''

data = [line.split() for line in s.splitlines()]
data = [[line[0],line[1]+' '+line[2],float(line[3]),float(line[4])] for line in data]

df = pd.DataFrame(data)
df[1] = pd.to_datetime(df[1])  # replace column 1 with datetime64 values
df_bool = (df[1] > pd.to_datetime('2021-05-06 09:20')) & (df[1] < pd.to_datetime('2021-05-06 09:40'))
print(df[~df_bool])

The result:

      0                          1         2    3
0   ic2 2021-05-06 09:10:30.582455  3.556001  0.0
1   ic2 2021-05-06 09:15:30.999870  3.556001  0.0
6   ic2 2021-05-06 09:40:33.454079  3.556001  0.0
7   ic2 2021-05-06 09:45:33.853965  3.556001  0.0
8   ic2 2021-05-06 09:50:34.421639  3.556001  0.0
9   ic2 2021-05-06 09:55:34.653818  3.556001  0.0
10  ic2 2021-05-06 10:00:35.134874  3.556001  0.0
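Since the data in the question already lives in a NumPy array, here is a minimal sketch of bridging into this pandas approach without changing how the data is loaded: wrap the array in a DataFrame, filter, and convert back with `to_numpy()`. The three-row `arr` is a hypothetical stand-in for `raindata[10]`, assuming the timestamp sits in column 1 (`pd.to_datetime` accepts strings or datetime objects there):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for raindata[10]: an object array with timestamps in column 1
arr = np.array(
    [
        ["ic2", "2021-05-06 09:15:30.999870", 3.556000789, 0],
        ["ic2", "2021-05-06 09:25:32.538884", 3.556000789, 0],
        ["ic2", "2021-05-06 09:45:33.853965", 3.556000789, 0],
    ],
    dtype=object,
)

df = pd.DataFrame(arr)
# Assigning with df[1] = ... replaces the column, giving it a datetime64 dtype
df[1] = pd.to_datetime(df[1])

# True for rows strictly inside the range that should be removed
inside = (df[1] > pd.Timestamp("2021-05-06 09:20")) & (df[1] < pd.Timestamp("2021-05-06 09:40"))

# Keep the other rows and go back to a NumPy (object) array
result = df[~inside].to_numpy()
print(result)
```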

answered Feb 24 '22 at 17:04 by Ben Grossmann · edited Feb 26 '22 at 01:30

Thanks @Ben Grossmann. Unfortunately, I can't mark your answer as correct because the code does not work. I understand the concept of what you are saying, but the implementation doesn't work. – matt cooper Feb 25 '22 at 11:35
I had a misplaced bracket; perhaps it works now. It would also be helpful if you could add code to your question that produces (perhaps a smaller version of) the array that you're working with. – Ben Grossmann Feb 25 '22 at 12:08
Edited in response. – matt cooper Feb 25 '22 at 12:52
@matt If the code doesn't work, could you say what exactly goes wrong (or what error message you get)? – Ben Grossmann Feb 25 '22 at 13:01
```raindata[10]=np.delete(raindata[10], datetime.datetime(2022,11,07,hour=16,minute=42) < raindata[10] < datetime.datetime(2022,11,15,hour=4,minute=7),axis=0)``` gives ```SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers```. Taking out the '0' in line 2 gives ```TypeError: '>' not supported between instances of 'str' and 'datetime.datetime'``` – matt cooper Feb 25 '22 at 14:03
@mattcooper Thanks, I'll give it another chance when I have time. By the way, is there a reason that you're using numpy instead of pandas? – Ben Grossmann Feb 25 '22 at 15:57
@mattcooper See my latest edit! – Ben Grossmann Feb 26 '22 at 01:30
In answer to your question, @Ben Grossmann: the code pulls the data from a PGSQL table into a NumPy array, and changing that would be too much work! Anyhoo, I have sorted it with NumPy (see my own answer), but thank you for your help :) – matt cooper Feb 28 '22 at 09:13
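On the chained comparison `datetime(...) < raindata[10] < datetime(...)` from the comments: the TypeError comes from column 0 holding strings, but even when the comparison is restricted to the timestamp column, the chained form cannot work on arrays. A small illustration with a toy integer array (not the question's data):

```python
import numpy as np

arr = np.array([1, 5, 9])

# Python rewrites `2 < arr < 8` as `(2 < arr) and (arr < 8)`; `and` needs a
# single truth value, which a multi-element array cannot provide
try:
    2 < arr < 8
    chained_failed = False
except ValueError:
    chained_failed = True

# Element-wise version: two comparisons combined with &, each in parentheses
inside = (arr > 2) & (arr < 8)
print(chained_failed, inside)
```

The same `(low < col) & (col < high)` pattern, applied to the datetime column, is what the working answers use.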
