
I want to convert time columns (dtype: datetime64[ns]) in a pandas.DataFrame into strings representing only the year and month.

It works as expected if all values in the column are valid.

0     2019-4
1    2017-12
dtype: object

But with missing values (pandas.NaT) in the column, the result confuses me:

0   -1 days +23:59:59.999979806
1   -1 days +23:59:59.999798288
2                           NaT
dtype: timedelta64[ns]

And .unique() returns array([ -20194, -201712, 'NaT'], dtype='timedelta64[ns]').

It seems that the result somehow becomes a timedelta64, but I don't understand why. Why does this happen?

The complete example code:

#!/usr/bin/env python3
import pandas as pd
import numpy as np

# Series without missing values
series = pd.Series([
    np.datetime64('2019-04-08'),
    np.datetime64('2017-12-05')])

def year_month_string(cell):
    """Convert a datetime64 into string representation with
    year and month only.
    """
    if pd.isna(cell):
        return pd.NaT

    return '{}-{}'.format(cell.year, cell.month)

print(series.apply(year_month_string))
# 0     2019-4
# 1    2017-12
# dtype: object

# Series with a missing value
series_nat = pd.Series([
    np.datetime64('2019-04-08'),
    np.datetime64('2017-12-05'),
    pd.NaT])

result = series_nat.apply(year_month_string)
print(result)
# 0   -1 days +23:59:59.999979806
# 1   -1 days +23:59:59.999798288
# 2                           NaT
# dtype: timedelta64[ns]
print(result.unique())
# array([ -20194, -201712,   'NaT'], dtype='timedelta64[ns]')
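A minimal sketch of one way to keep the custom function, assuming the cause is pandas' dtype inference on the values returned by apply: when pd.NaT is among them, pandas appears to try converting the whole result to a datetime-like dtype, and the strings get reinterpreted along the way. Returning None for missing cells gives pandas no datetime-like value to infer from, so the result should stay a column of strings. The exact inference behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question, including one missing value
series_nat = pd.Series([
    np.datetime64('2019-04-08'),
    np.datetime64('2017-12-05'),
    pd.NaT])

def year_month_string(cell):
    """Return 'YYYY-M' for a timestamp, or None for a missing value."""
    if pd.isna(cell):
        # None instead of pd.NaT: nothing datetime-like for pandas to infer from
        return None
    return '{}-{}'.format(cell.year, cell.month)

result = series_nat.apply(year_month_string)
print(result)
print(result.dtype)
```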
buhtz

1 Answer


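Don't use a custom function; use dt.strftime with %-m (the minus strips the leading zero). Note that %-m is a platform-specific extension (it works with glibc and on macOS, but not on Windows):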

series_nat.dt.strftime('%Y-%-m')

output:

0     2019-4
1    2017-12
2        NaN
dtype: object

%m would keep the leading zeros:

series_nat.dt.strftime('%Y-%m')

output:

0    2019-04
1    2017-12
2        NaN
dtype: object
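Because %-m is not portable, a possible alternative (a sketch, not part of the original answer) is to format with the portable %m and then strip a single leading zero from the month with a regular expression. The pattern r'-0(\d)$' is an assumption chosen for the 'YYYY-MM' layout; missing values from NaT pass through the string replacement unchanged.

```python
import numpy as np
import pandas as pd

series_nat = pd.Series([
    np.datetime64('2019-04-08'),
    np.datetime64('2017-12-05'),
    pd.NaT])

# '%m' is portable but zero-pads the month, e.g. '2019-04'; NaT stays missing
padded = series_nat.dt.strftime('%Y-%m')

# Remove one leading zero from the month part: '2019-04' -> '2019-4'
year_month = padded.str.replace(r'-0(\d)$', r'-\1', regex=True)
print(year_month)
```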
mozway