I want to convert time columns (dtype: datetime64[ns]) in a pandas.DataFrame into strings representing the year and month only.
It works as expected if all values in the column are valid.
0 2019-4
1 2017-12
dtype: object
But with missing values (pandas.NaT) in the column, the result confuses me.
0 -1 days +23:59:59.999979806
1 -1 days +23:59:59.999798288
2 NaT
dtype: timedelta64[ns]
Or with .unique() it is array([ -20194, -201712, 'NaT'], dtype='timedelta64[ns]').
It seems that the result is somehow converted to timedelta64, but I don't understand why. Why does this happen?
The complete example code:
#!/usr/bin/env python3
import pandas as pd
import numpy as np

# Series without missing values
series = pd.Series([
    np.datetime64('2019-04-08'),
    np.datetime64('2017-12-05')])
def year_month_string(cell):
    """Convert a datetime64 into string representation with
    year and month only.
    """
    if pd.isna(cell):
        return pd.NaT
    return '{}-{}'.format(cell.year, cell.month)
print(series.apply(year_month_string))
# 0 2019-4
# 1 2017-12
# dtype: object
# Series with a missing value
series_nat = pd.Series([
    np.datetime64('2019-04-08'),
    np.datetime64('2017-12-05'),
    pd.NaT])
result = series_nat.apply(year_month_string)
print(result)
# 0 -1 days +23:59:59.999979806
# 1 -1 days +23:59:59.999798288
# 2 NaT
# dtype: timedelta64[ns]
print(result.unique())
# array([ -20194, -201712, 'NaT'], dtype='timedelta64[ns]')
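
For comparison, here is a minimal sketch of two variants that might avoid the problem. It does not explain the timedelta64 result. It assumes the missing entry may be represented by something other than pd.NaT in the output: the first variant uses the vectorised Series.dt.strftime (which zero-pads the month, so the format differs slightly from the one above), and the second keeps the original function but returns None for missing cells. The names year_month and year_month_string_none are made up for this illustration.

```python
import pandas as pd

# Same data as above, built via pd.to_datetime so the dtype is datetime-like.
series_nat = pd.Series(pd.to_datetime(['2019-04-08', '2017-12-05', None]))

# Variant 1: vectorised formatting. Missing entries stay missing,
# and the month is zero-padded (e.g. '2019-04' rather than '2019-4').
year_month = series_nat.dt.strftime('%Y-%m')
print(year_month)


# Variant 2 (hypothetical helper): same logic as year_month_string,
# but returning None instead of pd.NaT for missing cells.
def year_month_string_none(cell):
    if pd.isna(cell):
        return None
    return '{}-{}'.format(cell.year, cell.month)


result_none = series_nat.apply(year_month_string_none)
print(result_none)
```

In both variants the valid rows should come back as strings and the missing row should remain a missing value, which can be checked with pd.isna.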