
I want to convert a structured NumPy array with datetime64[m] and timedelta64[m] fields to an equivalent structured array holding seconds since the epoch.

The field size in an np.array matters when converting an unstructured array into a structured array (see "Convert a numpy array to a structured array").

Since the current np.datetime64 field is longer than an int field for seconds since the epoch, converting the array in place is not possible, right? (I would prefer this option.)

The simple and wrong approach would be this:

import numpy as np
import numpy.lib.recfunctions as rf

datetime_t = np.dtype([("start", "datetime64[m]"),
                       ("duration", "timedelta64[m]"),
                       ("score", float)])

seconds_t = np.dtype([("start", "int"),
                      ("duration", "int"),
                      ("score", float)])


unstructured = np.arange(9).reshape((3, 3))
print(unstructured)

datetime_structure = rf.unstructured_to_structured(unstructured, dtype=datetime_t)
print(datetime_structure)

seconds_structure = datetime_structure.astype(seconds_t)
print(seconds_structure)

giving me this output:

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[('1970-01-01T00:00', 1, 2.) ('1970-01-01T00:03', 4, 5.)
 ('1970-01-01T00:06', 7, 8.)]
[(0, 1, 2.) (3, 4, 5.) (6, 7, 8.)]

Process finished with exit code 0

Since I specified minutes, I should get multiples of 60 seconds, not single digits.

Sidenote: I am confused by the first conversion TO the DateTime format, as the DateTime is not in minutes but in seconds. I specified datetime64[m] and converted 3 (and 0 and 6) into that format, and I would have expected 3 minutes ('1970-01-01T03:00'), not 3 seconds ('1970-01-01T00:03'). Oh well. Perhaps someone could explain?
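On the sidenote, a quick check may help. The sketch below assumes that NumPy prints datetime64[m] values in ISO 8601 form with the time part as hours:minutes, so '1970-01-01T00:03' would be three minutes past the epoch rather than three seconds. The variable names are only illustrative.

```python
import numpy as np

# Three units of one minute after the epoch.
three_minutes = np.datetime64(3, "m")

# Minute resolution prints as YYYY-MM-DDTHH:MM, with no seconds part.
as_text = str(three_minutes)

# Rescaling to second resolution first, then casting to an integer,
# yields the number of seconds since the epoch.
as_seconds = three_minutes.astype("datetime64[s]").astype(np.int64)

print(as_text, int(as_seconds))
```

If that reading is right, the first conversion already behaves as intended, and only the later cast to int drops the unit without rescaling.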

How do I convert a structured array like this elegantly and efficiently? Do I need to iterate over the array manually (my real array has a few more fields than this example), copy the columns one by one, and convert the time fields? Given that I want to convert multiple different structures containing these time formats, a generalized approach to converting these fields in structured arrays would be welcome without needing to specify the fields individually.

Andreas Schuldei
  • I am guessing the type you called "seconds_t" is not really interpreted as seconds by numpy – Stef Jan 06 '23 at 13:43
  • Perhaps related: [Convert numpy array of seconds to minutes and seconds?](https://stackoverflow.com/questions/55112906/convert-numpy-array-of-seconds-to-minutes-and-seconds) ; [How do I convert seconds to hours, minutes and seconds?](https://stackoverflow.com/questions/775049/how-do-i-convert-seconds-to-hours-minutes-and-seconds) – Stef Jan 06 '23 at 13:48
  • Related: [How to get unix timestamp from numpy.datetime64?](https://stackoverflow.com/questions/11865458/how-to-get-unix-timestamp-from-numpy-datetime64) [pandas datetime to unix timestamp seconds?](https://stackoverflow.com/questions/54313463/pandas-datetime-to-unix-timestamp-seconds) [convert numpy.datetime64 into epoch time?](https://stackoverflow.com/questions/57208280/convert-numpy-datetime64-into-epoch-time) [numpy datetime64 from unix utc seconds?](https://stackoverflow.com/questions/15053791/numpy-datetime64-from-unix-utc-seconds) – Stef Jan 06 '23 at 14:15
  • Use 'datetime64[s]' if you want seconds. – hpaulj Jan 06 '23 at 16:13
  • @hpaulj, how can I change the dtype from minutes to seconds after the fact? – Andreas Schuldei Jan 06 '23 at 18:49
  • @Stef, that would then require looping over the list of fields of the array, like I described, correct? Part of my question is how to do that elegantly and in a general way with different dtype combinations. Writing different conversion functions for the different arrays, each with a list of fields, feels ugly. (I want to convert an existing project to a new time format, and I have several of those conversions that I want to do gradually.) – Andreas Schuldei Jan 06 '23 at 18:54
  • *"that would then require looping over the list of fields of the array, like I described, correct?"* Sorry, I have no idea what you're referring to. – Stef Jan 06 '23 at 21:19

1 Answer

This is the way I am doing it now. I define two alternative data types:

  1. one with datetime64[s] and
  2. one with int,

and convert first to seconds and then to int like this:

import numpy as np

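# Original layout: time fields at minute resolution.
minutes_dt = np.dtype([("time", "datetime64[m]"),
                       ("duration", "timedelta64[m]")])

# Same layout at second resolution.
seconds_dt = np.dtype([("time", "datetime64[s]"),
                       ("duration", "timedelta64[s]")])

# Same layout with plain integers.
basic_dt = np.dtype([("time", int),
                     ("duration", int)])


minutes = np.array([(10, 8)], dtype=minutes_dt)
print(minutes)

# Casting between time units rescales the values (minutes * 60).
seconds = minutes.astype(seconds_dt)
print(seconds)

# Casting to int only exposes the stored count, whatever its unit.
print(minutes.astype(basic_dt), seconds.astype(basic_dt))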

which gives me this output:

[('1970-01-01T00:10', 8)]
[('1970-01-01T00:10:00', 480)]
[(10, 8)] [(600, 480)]
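To avoid writing these dtypes by hand for every structure, the same two-step idea can be applied field by field. This is a minimal sketch, not a tested general solution: `time_fields_to_seconds` is a hypothetical helper name, it returns a copy rather than converting in place, it uses np.int64 explicitly because plain `int` can be a different width on some platforms, and it does not treat NaT values specially.

```python
import numpy as np


def time_fields_to_seconds(arr):
    """Return a copy of `arr` in which every datetime64/timedelta64 field
    is replaced by an int64 count of seconds (hypothetical helper)."""
    # Target dtype: time fields become int64, all other fields are unchanged.
    out_dtype = np.dtype([
        (name, np.int64 if arr.dtype[name].kind in "mM" else arr.dtype[name])
        for name in arr.dtype.names
    ])
    out = np.empty(arr.shape, dtype=out_dtype)
    for name in arr.dtype.names:
        column = arr[name]
        if column.dtype.kind == "M":      # datetime64 of any unit
            column = column.astype("datetime64[s]").astype(np.int64)
        elif column.dtype.kind == "m":    # timedelta64 of any unit
            column = column.astype("timedelta64[s]").astype(np.int64)
        out[name] = column
    return out


# Usage with a structure similar to the one in the question.
example_dt = np.dtype([("time", "datetime64[m]"),
                       ("duration", "timedelta64[m]"),
                       ("score", float)])
example = np.array([(10, 8, 2.5)], dtype=example_dt)
converted = time_fields_to_seconds(example)
print(converted)
```

The loop still visits each field, but it discovers the time fields from the dtype, so the same function should work for differently shaped structures without listing their fields.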
Andreas Schuldei