The following code converts a timestamp column of a pandas DataFrame, whatever its format, into a given format:
pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')
How can I do this with Dask? I tried the code below, but it did not work
(df is a Dask dataframe):
a=dd.to_datetime(df["time:timestamp"],format='%Y-%m-%d %X')
a.compute()
Error: ValueError: unconverted data remains: .304000+00:00
This is what the timestamps look like: "2016-01-01 09:51:15.304000+00:00"
(They could be in any format.)
Expected output: "2016-01-01 09:51:15"
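The error message suggests that the explicit format string only covers the date and whole seconds, so the fractional seconds and the UTC offset are left over. Here is a minimal sketch of that, using only the standard library `datetime` module rather than pandas or Dask. The variable names are illustrative, and `%X` is written out as `%H:%M:%S`, which is what it means in the default C locale.

```python
from datetime import datetime

raw = "2016-01-01 09:51:15.304000+00:00"

# '%Y-%m-%d %X' with %X spelled out for the default C locale.
short_format = "%Y-%m-%d %H:%M:%S"

# Parsing with the short format fails: it has no directive for
# the fractional seconds or the "+00:00" offset.
try:
    datetime.strptime(raw, short_format)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Parsing without a fixed format, then formatting, works for all
# three shapes of input shown in the question.
samples = [raw, "2016-01-01 09:51:15", "2016-01-01"]
formatted = [datetime.fromisoformat(s).strftime(short_format) for s in samples]
print(formatted)
```

So the format belongs in the output step (strftime), not in the parsing step, which is also what the pandas one-liner does.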
I found "Converting a Dask column into new Dask column of type datetime", but that approach is not working.
Example with pandas, which works with any format:
import pandas as pd
data = ['2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00']
data1 = ['2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15']
data2 = ['2016-01-01','2016-01-01','2016-01-01','2016-01-01','2016-01-01']
df1 = pd.DataFrame(data2, columns=['t'])
df1['t']=pd.to_datetime(df1["t"]).dt.strftime('%Y-%m-%d %X')
Can someone tell me how to do the same with Dask?
Here is my solution.
It can be done with the following code:
dd.to_datetime(df["t"].compute()).dt.strftime('%Y-%m-%d %X')
But now the problem is that I can't store this conversion in the existing dataframe like I did with pandas.
If I do
df["t"]=dd.to_datetime(df["t"].compute()).dt.strftime('%Y-%m-%d %X')
it throws an error:
ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.
The solution from "ValueError: Not all divisions are known, can't align partitions error on dask dataframe" does not work either.