1

I am trying to convert a pandas dataframe column of matlab datenum format to python date time format using the function below:

def datenum_to_datetime(datenum):
    """
    Convert Matlab datenum into Python datetime.
    :param datenum: Date in datenum format
    :return:        Datetime object corresponding to datenum.
    """
    days = datenum % 1
    date = datetime.fromordinal(int(datenum)) \
           + timedelta(days=days) \
           - timedelta(days=366)
 
    return date.year

A sample of the values

0         693726
1         693726
2         693726
3         693726
4         693726
           ...  
460718    726831
460719    726831
460720    726831
460721    726831
460722    726831
Name: dob, Length: 460723, dtype: int32

But I get the error below when I try to apply the function.

df['dob'] = df['dob'].apply(datenum_to_datetime)

OverflowError                             Traceback (most recent call last)
<ipython-input-67-cc4fb607b497> in <module>()
----> 1 df['dob'] = df['dob'].apply(datenum_to_datetime)
      2 df.head()

1 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-66-bab915eae5ff> in datenum_to_datetime(datenum)
      6     """
      7     days = datenum % 1
----> 8     date = datetime.fromordinal(int(datenum))            + timedelta(days=2)            - timedelta(days=366)
      9 
     10     return date.year

OverflowError: date value out of range
Chiebukar
  • 83
  • 9
  • solves the problem for me: [Converting Matlab's datenum format to Python](https://stackoverflow.com/q/13965740/10197418) – FObersteiner May 04 '21 at 06:42

1 Answers1

0

Didn't find a single solution following this Converting Matlab's datenum format to Python, but this function worked without errors

def datenum_to_datetime(datenum):
    """
    Convert Matlab datenum into Python datetime.
    :param datenum: Date in datenum format
    :return:        Datetime object corresponding to datenum.
    """
    origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
    date = datenum * np.timedelta64(1, 'D') + origin
    year = date.astype('datetime64[Y]').astype(int) + 1970
 
    return year
Chiebukar
  • 83
  • 9
  • I checked your code, it works with pandas 1.2.4 / numpy 1.20.2 but seems to fail with older versions. Besides, the method described [here](https://stackoverflow.com/a/49135037/10197418) seems to be much simpler to me (basically just `pd.to_datetime(df['datenum']-719529, unit='D')`). Once you have a column of type datetime, you can get the year like `df['datetime'].dt.year`. – FObersteiner May 04 '21 at 06:41
  • Your proposed method seems better, but I get this error when using it on pandas version 1.1.5. `OutOfBoundsDatetime: cannot convert input with unit 'D'` – Chiebukar May 04 '21 at 12:13
  • Same issue,it actually works for some datenum inputs and not all. – Chiebukar May 04 '21 at 12:37
  • perhaps the resulting datetime actually *is* [out of range](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timestamp-limits)? – FObersteiner May 04 '21 at 12:38
  • it works for `datenum = 656178 date = pd.to_datetime((datenum-719529)*86400, unit='s') year = date.year` results in 1796 but `datenum = 600000` gives the `OutOfBoundsDatetime: cannot convert input with unit 's'`. and I do have very low values like 146, 25 etc in my dataframe. I wonder this error. you could probably try converting 600000 and see its result. – Chiebukar May 04 '21 at 12:49
  • are those *valid* dates in your context? if not, you can set pd.to_datetime kwarg `errors='coerce'`. Otherwise, pandas datetime won't work; you'll have to use e.g. the Python standard lib datetime which can represent more years. – FObersteiner May 04 '21 at 13:12