
I am trying to bin a dataset consisting of a long time series of measurements into a number of discrete bins. The times at which the measurements were made are held in a numpy array of datetime objects, t_data.

I generate the bin edges as another array of datetime objects, t_edges.

When I print out both arrays their contents display as a series of datetime.datetime(...) items.

I then try to assign each measurement in t_data to the relevant bin using:

t_bin = np.digitize(t_data, t_edges)

However, this results in the following error:

  File "<__array_function__ internals>", line 5, in digitize
  File "python3.9/site-packages/numpy/lib/function_base.py", line 4922, in digitize
    mono = _monotonicity(bins)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

It seems to be an issue with the datatypes, but I have done some searching and am not sure how to correct it. It looks as though my arrays are being classed as 'object' ('O'), whilst np.digitize wants to cast them to float64? From the traceback I note that the bins are checked for monotonicity, but perhaps their being datetimes confuses this check? I am aware of this question that seems to have a similar issue with datetime64, but it did not receive an answer.

If anyone can give me something to try and resolve this to make it work (or tell me if it is impossible to use np.digitize() with datetime series) I would be grateful.

Minimal working example:

from datetime import datetime, timedelta
import numpy as np

sdate = datetime.strptime('2017-01-01 18:00:00', "%Y-%m-%d %H:%M:%S")
edate = datetime.strptime('2017-01-01 18:00:30', "%Y-%m-%d %H:%M:%S")

t_data = np.array([sdate + timedelta(minutes=x) for x in range((edate - sdate).seconds)])

t_edges = np.array([datetime.strptime('2017-01-01 18:00:00', "%Y-%m-%d %H:%M:%S"),
                   datetime.strptime('2017-01-01 18:00:10',"%Y-%m-%d %H:%M:%S"),
                   datetime.strptime('2017-01-01 18:00:20', "%Y-%m-%d %H:%M:%S")])

t_bin = np.digitize(t_data, t_edges)

I'd be expecting the result to be of the form [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, ....., 3, 3, 3, 3, 3]

FluidFox

1 Answer


It seems NumPy is storing your datetime objects in a generic object array (dtype 'O'), which np.digitize cannot safely cast to float64.
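A quick way to confirm this is to inspect the dtype of a small array built the same way. This is only a sketch; `sample` is a made-up name standing in for your t_data or t_edges.

```python
from datetime import datetime
import numpy as np

# Hypothetical stand-in for t_data / t_edges: an array built from datetime objects
sample = np.array([datetime(2017, 1, 1, 18, 0, 0), datetime(2017, 1, 1, 18, 0, 10)])

# NumPy stores plain datetime.datetime instances as generic Python objects
is_object = sample.dtype == object

# This is the 'safe' cast that the error message complains about
castable = np.can_cast(sample.dtype, np.float64, casting="safe")

print(sample.dtype, castable)
```

If the dtype prints as object and the cast is reported as not allowed, the array needs converting to a numeric type before np.digitize will accept it.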

I would suggest casting your datetime objects to timestamps before applying np.digitize.

Example:

from datetime import datetime, timedelta
import numpy as np

sdate = datetime.strptime('2017-01-01 18:00:00', "%Y-%m-%d %H:%M:%S")
edate = datetime.strptime('2017-01-01 18:00:30', "%Y-%m-%d %H:%M:%S")

t_data = np.array([(sdate + timedelta(seconds=x)) for x in range((edate - sdate).seconds)])

t_edges = np.array([datetime.strptime('2017-01-01 18:00:00', "%Y-%m-%d %H:%M:%S"),
                   datetime.strptime('2017-01-01 18:00:10',"%Y-%m-%d %H:%M:%S"),
                   datetime.strptime('2017-01-01 18:00:20', "%Y-%m-%d %H:%M:%S")])

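# Convert each datetime to a POSIX timestamp (float seconds).
# Naive datetimes are interpreted as local time by timestamp().
t_data_ts = [t.timestamp() for t in t_data]
t_edges_ts = [t.timestamp() for t in t_edges]

# Both inputs are now plain floats, so digitize can compare them
t_bin = np.digitize(t_data_ts, t_edges_ts)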

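I also fixed a bug in your example: t_data was built with timedelta(minutes=x), which I changed to timedelta(seconds=x) so that the samples fall inside the 30-second window covered by the bin edges.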

  • Brilliant, thanks! This is good for the minimal example. Is there a quick way to convert an entire array of datetimes to timestamps (t_data is very large and read in from file)? – FluidFox Nov 02 '21 at 20:30
  • How is the data stored in the file? Can you post a minimal example of the file? – Frederik Rogalski Nov 02 '21 at 21:31
  • It's a netcdf generated by writing a datetime array from a different script. Is there a quick, pythonic way to convert an array of datetimes to timestamps, as I'd rather not do it as a loop over each element. – FluidFox Nov 02 '21 at 21:38
  • I am guessing you are using netCDF4.num2date() somewhere in your script. You could just append ".timestamp()" to it. But this is just a guess since I was unaware of netcdf until 5 minutes ago. – Frederik Rogalski Nov 02 '21 at 21:55
  • No, they are read in direct as they were previously saved to netcdf as a datetime. I have decided to just convert each item in the array using a loop which is not ideal but seems the only way as far as I can see. If you can accept the edit showing this I'll accept the answer. :) – FluidFox Nov 03 '21 at 10:22
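
For what it's worth, a loop-free conversion looks possible by letting NumPy cast the object array to datetime64 and then to integers. The sketch below is not from the original answer. It assumes the array holds naive datetime.datetime objects, and it rebuilds the example data with a made-up `start` variable. The integers are microseconds since the epoch rather than the float seconds that timestamp() returns, but np.digitize only needs the data and the edges to be on the same scale.

```python
from datetime import datetime, timedelta
import numpy as np

# Rebuild the example data (assumption: naive datetimes, as in the question)
start = datetime(2017, 1, 1, 18, 0, 0)
t_data = np.array([start + timedelta(seconds=x) for x in range(30)])
t_edges = np.array([start + timedelta(seconds=x) for x in (0, 10, 20)])

# object array -> datetime64[us] -> int64 microseconds since the epoch,
# done in two vectorised casts instead of a Python-level loop
t_data_us = t_data.astype("datetime64[us]").astype("int64")
t_edges_us = t_edges.astype("datetime64[us]").astype("int64")

# Integers cast safely to float64, so digitize accepts them
t_bin = np.digitize(t_data_us, t_edges_us)
print(t_bin)
```

One difference to be aware of: timestamp() treats naive datetimes as local time, whereas the datetime64 cast takes the wall-clock values as they are. The bin indices should come out the same as long as the data and the edges are converted by the same route.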