All, I am trying to read the time coordinate from the Berkeley Earth temperature file linked below. The time spans 1850 to 2022 and is given as fractional years A.D. (1850.041667, 1850.125, 1850.208333, ..., 2022.708333, 2022.791667, 2022.875).

`pandas.to_datetime` cannot interpret the time array correctly; I think I need to state the origin of the time coordinate and the unit. I tried `pd.to_datetime(dti, unit='D', origin='julian')`, but it did not work (out of bounds). Also, I think I have to use a unit of years instead of days.

The file is located here: http://berkeleyearth.lbl.gov/auto/Global/Gridded/Land_and_Ocean_LatLong1.nc

import xarray as xr
import numpy as np
import pandas as pd
# read data into memory (file name as in the download URL)
flname = "Land_and_Ocean_LatLong1.nc"
ds = xr.open_dataset("./" + flname)
dti = ds['time']
pd.to_datetime(dti,unit='D',origin='julian')
np.diff(dti)
Kernel
  • general question: [Convert fractional years to a real date in Python](https://stackoverflow.com/q/19305991/10197418) – FObersteiner Jan 01 '23 at 11:39
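
As a side note on the out-of-bounds error in the question: with `unit='D'` and `origin='julian'`, the numbers are treated as Julian day counts, so a value like 1850.04 means roughly 1850 days after the Julian epoch, far earlier than anything a nanosecond-resolution timestamp can hold (roughly the years 1677 to 2262). A minimal sketch that reproduces this, using two of the sample values quoted in the question rather than the real file:

```python
import numpy as np
import pandas as pd

# two fractional-year values copied from the question (not read from the file)
frac_years = np.array([1850.041667, 1850.125])

# Julian date of the Unix epoch (1970-01-01 00:00), i.e. 2440587.5
unix_epoch_jd = pd.Timestamp(0).to_julian_date()

# read as Julian day numbers, the values lie millions of days before 1970
days_before_1970 = unix_epoch_jd - frac_years[0]
print(f"{days_before_1970:.1f} days before 1970-01-01")

try:
    pd.to_datetime(frac_years, unit="D", origin="julian")
    failed = False
except ValueError as err:  # OutOfBoundsDatetime is a subclass of ValueError
    failed = True
    print(type(err).__name__)
```

So the error does not come from a missing origin as such; the values are simply not day counts, which is why the year has to be split into its integer and fractional parts.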

1 Answer

Convert to datetime using `%Y` as the parsing directive to get the year only, then add the fractional part of the year as a timedelta of days. Note that you might have to account for leap years when calculating the timedelta. Ex:

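import numpy as np
import pandas as pd

# fractional years as a plain NumPy array, e.g. 1850.041667
frac_years = ds['time'].values
years = np.floor(frac_years).astype(int)

# parse the integer year only; this gives January 1st of each year
dti = pd.to_datetime(years.astype(str), format="%Y")

# it might be sufficient to use e.g. 365 or 365.25 here, depending on the input
daysinyear = pd.Series([366]*dti.size).where(dti.is_leap_year, 365)

# add the fractional part of the year as a timedelta of days
dti = dti + pd.to_timedelta(daysinyear * (frac_years - years), unit="d")

dti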
0      1850-01-16 04:59:59.999971200
1      1850-02-15 15:00:00.000000000
2      1850-03-18 01:00:00.000028800
3      1850-04-17 10:59:59.999971200
4      1850-05-17 21:00:00.000000000
                    ...
2070   2022-07-17 16:59:59.999971200
2071   2022-08-17 03:00:00.000000000
2072   2022-09-16 13:00:00.000028800
2073   2022-10-16 22:59:59.999971200
2074   2022-11-16 09:00:00.000000000
Length: 2075, dtype: datetime64[ns]
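
The same idea can be wrapped in a small function and checked without downloading the file. This is only a sketch under assumptions: `fractional_years_to_datetime` and the synthetic `sample` array are hypothetical names made up for illustration, and the sample values are monthly midpoints generated here, not data from Berkeley Earth.

```python
import numpy as np
import pandas as pd

def fractional_years_to_datetime(frac_years):
    """Convert fractional years (e.g. 1850.125) to a DatetimeIndex."""
    frac_years = np.asarray(frac_years, dtype="float64")
    years = np.floor(frac_years).astype(int)
    # January 1st of each year
    start = pd.to_datetime(years.astype(str), format="%Y")
    # 366 days in leap years, 365 otherwise
    days_in_year = np.where(start.is_leap_year, 366, 365)
    # fractional part of the year, expressed in days
    offset = pd.to_timedelta((frac_years - years) * days_in_year, unit="D")
    return start + offset

# synthetic monthly midpoints for a non-leap year (1850) and a leap year (1852)
sample = np.array([y + (m + 0.5) / 12 for y in (1850, 1852) for m in range(12)])
dates = fractional_years_to_datetime(sample)
print(dates[:3])
```

Each synthetic value should land in the month it was generated for, which is a quick sanity check that the leap-year handling does not shift dates across month boundaries.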
FObersteiner
  • `np.where(dti.is_leap_year == True)` yields strange output, they are not oscillating every 4 years. Here is the head of the output (array([ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, ... – Kernel Jan 01 '23 at 12:38
  • @Kernel btw. you can check the leap years with `dti.dt.year[dti.dt.is_leap_year].unique()`. And it's not strictly every 4 years; only if `year%400 == 0 || (year%100 != 0 && year%4 == 0)` – FObersteiner Jan 01 '23 at 12:41
  • `dti.dt` raises AttributeError: 'DatetimeIndex' object has no attribute 'dt' – Kernel Jan 01 '23 at 12:46
  • @Kernel after you added the timedelta, you should have a normal series, which has the dt accessor. For a DatetimeIndex, remove the `dt`. – FObersteiner Jan 01 '23 at 12:48
  • @Kernel I just realized the leap year determination was still buggy. See the update. ***However***, sometimes in those netCDF time coordinates, leap years are not considered when the coordinate is generated (e.g. for a model output). So in your specific case, you might need to ignore leap years and simply calculate with 365 days / year (or 365.25 or whatever is adequate). – FObersteiner Jan 01 '23 at 13:01