I am trying to convert my dates to datetime64[D]. At the source, some of the dates are object type and some are datetime64[ns]. I am not asking how to do the conversion; I know that. But something happens when I assign the result to a column: the following code seems to have no effect, and the dtype stays datetime64[ns].

 df2['date'].values.astype('datetime64[D]')

This is the sample dataframe:

d = {'date' :['2015-10-05 15:08:43', '2015-10-05 19:17:12', '2015-10-06 15:51:22', '2015-10-06 19:39:18', '2015-10-06 19:58:06', '2015-12-18 11:09:01'], 'name': ['john', 'tom', 'phill', 'nero', 'bob', 'rob']}
df2 = pd.DataFrame(data = d)

date in df2 is object type. When we do the following

df2['date'] = pd.to_datetime(df2['date'])

date becomes dtype: datetime64[ns].

Now the following code works and produces datetime64[D] output:

df2['date'].values.astype('datetime64[D]')

But when I assign the result back to the column, the dtype goes back to datetime64[ns]:

df2['date'] = df2['date'].values.astype('datetime64[D]')

See the output here -

Name: date, dtype: datetime64[ns]

So my question is: why does it not work when I assign the result to a column?

Note: I know that the last line produces a warning, so I also tried the method below, but it does not produce datetime64[D] either.

newcol = df2['date'].values.astype('datetime64[D]')
df2.assign(date = newcol)
Gonçalo Peres
singularity2047
  • I don't get it. Where is anything reverting back to `object`? – roganjosh May 22 '18 at 18:48
  • If you're referring to why the dtype is `datetime64[ns]` instead of `datetime64[D]` I found this other useful SO link https://stackoverflow.com/questions/31917964/python-numpy-cannot-convert-datetime64ns-to-datetime64d-to-use-with-numba – Orenshi May 22 '18 at 18:50
  • date goes back to datetime64[ns] even after using df2['date'].values.astype('datetime64[D]') @roganjosh – singularity2047 May 22 '18 at 18:52
  • Just seen the link you've been given. I had no idea about that at all. More wonky datetime fun with pandas :) – roganjosh May 22 '18 at 18:53
  • @Orenshi I have seen that link - but not solving my problem. – singularity2047 May 22 '18 at 18:53
  • From what I can infer in the other link, it seems like because you're assigning it back to a column (aka a new column in a DataFrame is a Pandas Series), you're kind of stuck with `datetime64[ns]` :( – Orenshi May 22 '18 at 18:58
  • Oops, that's awful. Same thing was happening when I tried df2['date'].dt.floor('d') .But isn't that weird - if you cant change or create a column ? – singularity2047 May 22 '18 at 19:01
  • Hmmm, from your last comment; you know that `df2['date'].values.astype('datetime64[D]')` doesn't work in-place at all? You have to assign the results back to something, or the change is thrown away. That's part of the reason I was struggling to read your question flow – roganjosh May 22 '18 at 19:03
  • @singularity2047 although you point out an interesting fact that datetime object are always `datetime64[ns]` in pandas by opposition to numpy where you can have `datetime64[D]`, I'm not sure I understand the problem you have. And actually it makes sense that `df2['date'].values.astype('datetime64[D]')` give you `dtype='datetime64[D]'` because this is a numpy.ndarray and not a column of a DF anymore :) – Ben.T May 22 '18 at 19:23
  • " because this is a numpy.ndarray and not a column of a DF anymore " - I realized that after you pointed out. I wanted to have the dates in datetime64[D] type. But as you pointed out thats not possible as long as I am using a data frame. – singularity2047 May 22 '18 at 19:32

1 Answer

Jeff Reback, a pandas developer, wrote the following in 2014 (and I think it still stands)

We don't allow direct conversions because its simply too complicated to keep anything other than datetime64[ns] internally (nor necessary at all). It could be done, but not very useful IMHO.

This is currently NotImplemented, but is very straightforward to do.

Therefore one may not be able to have dataframe columns as datetime64[D], only as datetime64[ns].
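
A minimal sketch of that behaviour, reusing a trimmed version of the question's sample data. The exact dtype pandas falls back to depends on the version: older releases report datetime64[ns], and I believe pandas 2.x may report a coarser supported unit such as datetime64[s] instead. Either way, the column does not end up as datetime64[D].

```python
import numpy as np
import pandas as pd

# Trimmed version of the sample data from the question.
df2 = pd.DataFrame({
    'date': ['2015-10-05 15:08:43', '2015-10-06 19:39:18', '2015-12-18 11:09:01'],
    'name': ['john', 'nero', 'rob'],
})
df2['date'] = pd.to_datetime(df2['date'])

# Outside the DataFrame, the NumPy array keeps the day unit.
day_values = df2['date'].values.astype('datetime64[D]')
print(day_values.dtype)    # datetime64[D]

# Assigning the array back makes pandas convert it to a unit it supports.
df2['date'] = day_values
print(df2['date'].dtype)   # a pandas-supported unit, not datetime64[D]
```

The values themselves are still truncated to midnight after the assignment; only the stored unit differs from what NumPy reported.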

If one needs day precision, one can take a NumPy representation of the column using .values and work with that array instead, as follows

dates = df['date'].values.astype('datetime64[D]') 
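
As a hedged usage sketch, the day-precision array can be kept alongside the DataFrame and used for date-level work, for example counting rows per calendar day. The sample data and variable names here are only illustrative, and `dt.normalize()` is an alternative that is not part of the original answer: it keeps the column inside pandas and just resets the time of day to midnight.

```python
import numpy as np
import pandas as pd

# Illustrative data: three timestamps spread over two calendar days.
df = pd.DataFrame({'date': pd.to_datetime([
    '2015-10-05 15:08:43', '2015-10-05 19:17:12', '2015-10-06 15:51:22'])})

# Day-precision NumPy array, kept outside the DataFrame.
dates = df['date'].values.astype('datetime64[D]')

# Example of working with it: count rows per calendar day.
unique_days, counts = np.unique(dates, return_counts=True)
print(unique_days)
print(counts)

# Alternative that stays inside pandas: zero out the time-of-day part.
midnight = df['date'].dt.normalize()
print(midnight)
```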
Gonçalo Peres