I have the following datetime values in dask dataframe saved as string dates:

ddf = dd.from_pandas(pd.DataFrame({'date': ['15JAN1955:13:15:27.369', np.nan, '25DEC1990:23:18:17.200', '06MAY1962:02:55:27.360', np.nan, '20SEP1975:12:02:26.357']}), npartitions=1)

I used ddf['date'].apply(lambda x: datetime.strptime(x, "%d%b%Y:%H:%M:%S.%f"), meta=datetime), but I get this error: TypeError: strptime() argument 1 must be a str, not float.

I am following the way dates are parsed in the book Data Science with Python and Dask.

Is the .%f expecting a float? Or maybe it has something to do with NaN values?

doubleD
    `NaN` is, indeed, a `float` value, and would appear to be the source of the error. `%f` has nothing in particular to do with it. – chepner Feb 24 '22 at 21:20
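
The point in the comment above can be checked with plain Python. This is a minimal sketch: `parse_or_none` and `FMT` are hypothetical names introduced here for illustration, and the guard is only one possible way to skip missing values before calling `strptime`.

```python
from datetime import datetime
import math

FMT = "%d%b%Y:%H:%M:%S.%f"

def parse_or_none(value):
    """Parse a date string; return None for missing values (None or NaN)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return datetime.strptime(value, FMT)

nan = float("nan")

# NaN is a float, so strptime rejects it before the format is even consulted.
try:
    datetime.strptime(nan, FMT)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

parsed = parse_or_none("15JAN1955:13:15:27.369")  # a real datetime
missing = parse_or_none(nan)                      # None, no exception
print(type(nan).__name__, error_message, parsed, missing)
```

The same guard could be used inside the `apply` lambda, although a vectorized parser that understands missing values is usually simpler.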

1 Answer

You can use %f, which parses a decimal fraction of a second with up to 6 digits.

Also, 20SEPT1975 should be 20SEP1975 (no T in the month abbreviation).

import pandas as pd
import numpy as np

df = pd.DataFrame({'date': ['15JAN1955:13:15:27.369', np.nan,
                            '25DEC1990:23:18:17.200', np.nan,
                            '06MAY1962:02:55:27.360', '20SEP1975:12:02:26.357']})

df['date'] = pd.to_datetime(df['date'], format="%d%b%Y:%H:%M:%S.%f")
print(df)
                     date
0 1955-01-15 13:15:27.369
1                     NaT
2 1990-12-25 23:18:17.200
3                     NaT
4 1962-05-06 02:55:27.360
5 1975-09-20 12:02:26.357
azro
  • Thanks, I thought your suggestion should work, but I forgot to mention in the original post that the dates were of string datatype. When I used %f, I got a TypeError. I think the .%f is expecting a float. – doubleD Feb 24 '22 at 20:35
  • You haven't shown how you are using `strptime`. `%f` doesn't expect a float any more than any of the other numerical placeholders do. – chepner Feb 24 '22 at 21:19
  • Sorry, original post updated with the whole line of strptime used – doubleD Feb 24 '22 at 21:32
  • @DennisDavid my solution with `pd.to_datetime` handles `NaN` and changes it to `NaT` (not a time), so that seems OK? – azro Feb 24 '22 at 21:50
  • @azro, initially when I tried it, it said the length of the ddf does not match the index, but maybe I was already too tired. I'll keep debugging and optimizing my code, and hopefully I get this to work. – doubleD Feb 25 '22 at 19:00
  • @azro, update: it actually converted the dates using your method, but the issue now is assigning the column back to the source Dask df. That is when I get a ValueError saying the length of values does not match the length of index. Thank you. – doubleD Feb 26 '22 at 08:50