How can we convert a dask_cudf column of strings or nanosecond integers to datetime? to_datetime is available in both pandas and cudf. See the sample data below:

import pandas
import cudf

# with pandas
df = pandas.DataFrame( {'city' : ['Dallas','Bogota','Chicago','Juarez'], 
                      'timestamp' : [1664828099973725440,1664828099972763136,1664828094775313920,1664828081313273856]})

df['datetime'] = pandas.to_datetime(df['timestamp'])

# with cudf
cdf = cudf.DataFrame( {'city' : ['Dallas','Bogota','Chicago','Juarez'], 
                      'timestamp' : [1664828099973725440,1664828099972763136,1664828094775313920,1664828081313273856]})
cdf['datetime'] = cudf.to_datetime(cdf['timestamp'])

print(df)
print(cdf) 

In either case, the result is the same:

      city            timestamp                      datetime
0   Dallas  1664828099973725440 2022-10-03 20:14:59.973725440
1   Bogota  1664828099972763136 2022-10-03 20:14:59.972763136
2  Chicago  1664828094775313920 2022-10-03 20:14:54.775313920
3   Juarez  1664828081313273856 2022-10-03 20:14:41.313273856

A recent SO question suggests using dask's own to_datetime:

import dask_cudf
from dask import dataframe as dd

ddf = dask_cudf.from_cudf(cdf, npartitions=2)

dd.to_datetime(ddf['timestamp']).head()

However, this produces an error. For context, I am creating the dask_cudf DataFrame from a large number of CSV files in one directory.

dleal
    This will be resolved by https://github.com/dask/dask/pull/9881 . For now, you can use the cuDF `to_datetime` within a `dask.dataframe.map_partitions` call. – Nick Becker Apr 24 '23 at 13:42
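
The workaround described in the comment above could look roughly like the sketch below. The helper names add_datetime and convert_with_cudf are hypothetical, and the column names are taken from the sample data in the question. The dask_cudf part is wrapped in a function because it needs a GPU with cudf installed; the per-partition helper is checked here with pandas only, on the assumption that cudf.to_datetime behaves the same way for these nanosecond integers, as the output in the question suggests.

```python
import pandas as pd


def add_datetime(part, to_datetime):
    """Return a copy of one partition with a parsed 'datetime' column.

    `to_datetime` is passed in so the same helper works for pandas
    partitions (pd.to_datetime) and cudf partitions (cudf.to_datetime).
    """
    part = part.copy()
    part["datetime"] = to_datetime(part["timestamp"])
    return part


def convert_with_cudf(ddf):
    """Apply cudf.to_datetime to each partition of a dask_cudf DataFrame.

    Assumption: `ddf` is a dask_cudf DataFrame with a 'timestamp' column.
    cudf is imported lazily so the rest of this sketch runs without a GPU.
    """
    import cudf

    # Extra positional arguments to map_partitions are forwarded to the function.
    return ddf.map_partitions(add_datetime, cudf.to_datetime)


# CPU-only check of the per-partition helper, using the question's sample data.
sample = pd.DataFrame(
    {
        "city": ["Dallas", "Bogota", "Chicago", "Juarez"],
        "timestamp": [
            1664828099973725440,
            1664828099972763136,
            1664828094775313920,
            1664828081313273856,
        ],
    }
)
result = add_datetime(sample, pd.to_datetime)
print(result)
```

On a GPU machine, the intended usage would be something like convert_with_cudf(ddf).head(), with ddf built as in the question via dask_cudf.from_cudf(cdf, npartitions=2). Each partition of a dask_cudf DataFrame is a plain cudf DataFrame, so cudf.to_datetime can be applied to it directly without going through dd.to_datetime.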
