
The question "Get year, month or day from numpy datetime64" shows how to get the year, month and day from a numpy datetime64.

One of the answers uses:

import numpy as np

dates = np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = dates.astype('datetime64[M]').astype(int) % 12 + 1
days = dates - dates.astype('datetime64[M]') + 1

Also notice that:

To get integers instead of timedelta64[D] in the example for days above, use: (dates - dates.astype('datetime64[M]')).astype(int) + 1

How could the hours, minutes and seconds be extracted?

As in the note above about returning integers, I would like to get integers too.

Edit:

Jérôme's answer is useful, but I am still struggling to understand how to get my input data into datetime64[s] form in the first place.

In my actual situation this is what I have once I read the CSV in Pandas:

print(df['date'])
print(type(df['date']))
print(df['date'].dtype)

0         2018-12-31 23:59:00
1         2018-12-31 23:58:00
2         2018-12-31 23:57:00
3         2018-12-31 23:56:00
4         2018-12-31 23:55:00
                 ...         
525594    2018-01-01 00:05:00
525595    2018-01-01 00:04:00
525596    2018-01-01 00:03:00
525597    2018-01-01 00:02:00
525598    2018-01-01 00:01:00
Name: date, Length: 525599, dtype: object
<class 'pandas.core.series.Series'>
object

So how could I convert df['date'] into a dates variable of dtype datetime64[s] and then apply the solution provided?

M.E.
  • But those dates don't have any time. They're just dates... ? –  Jan 29 '22 at 02:15
  • In the real case (which I tried to simplify -maybe wrongly- to get an easily replicable example), I am trying to extract the individual fields from a YYYY-MM-DD HH:MM:SS date column imported via CSV into a Pandas dataframe. So I have `dates = df['date'].values.astype('datetime64[D]')` instead of `np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))` – M.E. Jan 29 '22 at 02:24

1 Answer


In your example, the array has dtype datetime64[D], so the hours, minutes and seconds are not stored in its items. An array of dtype datetime64[s] does store them.
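As a small illustration of that point, the sketch below casts a single timestamp (an arbitrary value chosen for the example) to day resolution and back; the time of day is not recovered.

```python
import numpy as np

# Second resolution is inferred from the string.
t = np.datetime64('2018-12-31T23:59:00')

# Casting to day resolution drops the time of day.
d = t.astype('datetime64[D]')

# Casting back to seconds yields midnight, not the original time.
back = d.astype('datetime64[s]')

print(t)     # 2018-12-31T23:59:00
print(d)     # 2018-12-31
print(back)  # 2018-12-31T00:00:00
```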

Here is how to extract the information from an array of dtype datetime64[s]:

import numpy as np

# dates = array(['2009-08-29T23:44:31',
#                '2017-12-17T05:47:37'],
#               dtype='datetime64[s]')
dates = np.array([
    np.datetime64(1251589471, 's'), 
    np.datetime64(1513489657, 's')
])

Y, M, D, h, m, s = [dates.astype('datetime64[%s]' % kind) for kind in 'YMDhms']

years = Y.astype(int) + 1970
months = M.astype(int) % 12 + 1
days = (D - M).astype(int) + 1
hours = (h - D).astype(int)
minutes = (m - h).astype(int)
seconds = (s - m).astype(int)

# [array([2009, 2017]),
#  array([ 8, 12], dtype=int32),
#  array([29, 17]),
#  array([23,  5]),
#  array([44, 47]),
#  array([31, 37])])
print([years, months, days, hours, minutes, seconds])
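For a pandas column of strings like the one shown in the question's edit, one possible route to a datetime64[s] array is sketched below. It assumes the column holds text in YYYY-MM-DD HH:MM:SS form; the two-row frame is a made-up stand-in for the real CSV data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: an object-dtype column of strings.
df = pd.DataFrame({'date': ['2018-12-31 23:59:00', '2018-01-01 00:01:00']})

# Parse the strings, then cast the underlying NumPy array to second resolution.
dates = pd.to_datetime(df['date']).to_numpy().astype('datetime64[s]')

# Same extraction as above, applied to the converted array.
Y, M, D, h, m, s = [dates.astype('datetime64[%s]' % kind) for kind in 'YMDhms']
hours = (h - D).astype(int)
minutes = (m - h).astype(int)
seconds = (s - m).astype(int)

print(hours, minutes, seconds)
```

If staying in pandas is acceptable, the parsed Series also exposes these fields directly through the .dt accessor (for example .dt.hour, .dt.minute and .dt.second).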
Jérôme Richard
  • This is really helpful, however, I am still struggling to end with the right input data -see edit made to the original question-. How could I convert `df['dates']` into a `dates` variable which is `datetime64[s]` and then apply the solution provided? – M.E. Jan 29 '22 at 23:49
  • Following this answer (https://stackoverflow.com/questions/70733239/how-do-i-import-a-column-as-datetime-date) I can get a datetime64 object as input data and apply this answer. – M.E. Jan 30 '22 at 00:00
  • Very good. Do you have any way to extend this to Day of Week? (i.e. 1 thru 7) – user2299067 Aug 28 '23 at 12:55
  • Update: This works, with the caveat that Thursday is Day 1 of the week, for some unknown reason: `Y, M, W, D, h, m, s = [dates.astype('datetime64[%s]' % kind) for kind in 'YMWDhms']` followed by `d_of_week = (D-W).astype(int)+1` – user2299067 Aug 28 '23 at 13:05
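
The Thursday offset in the last comment has a simple cause: datetime64[W] counts whole weeks from the Unix epoch, and 1970-01-01 was a Thursday. A minimal sketch of one way to get Monday = 1 through Sunday = 7 instead works from the day count since the epoch; the variable names here are illustrative and not taken from the answer.

```python
import numpy as np

dates = np.array(['2009-08-29T23:44:31', '2017-12-17T05:47:37'],
                 dtype='datetime64[s]')

# Whole days since 1970-01-01 (a Thursday).
days_since_epoch = dates.astype('datetime64[D]').astype(int)

# Shift by 3 so that Monday maps to 0, then move to a 1..7 range.
day_of_week = (days_since_epoch + 3) % 7 + 1

print(day_of_week)  # [6 7]  (Saturday, Sunday)
```

Because the % operator returns a non-negative result for a positive modulus in NumPy, the same expression also holds for dates before 1970.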