Datetimes for Julia dataframes

Question

pandas has a number of very handy utilities for manipulating datetime indices. Is there any similar functionality in Julia? I have not found any tutorials for working with such things, though it obviously must be possible.

Some examples of pandas utilities:

dti = pd.to_datetime(
    ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
)

dti = pd.date_range("2018-01-01", periods=3, freq="H")

dti = dti.tz_localize("UTC")

dti.tz_convert("US/Pacific")

idx = pd.date_range("2018-01-01", periods=5, freq="H")
ts = pd.Series(range(len(idx)), index=idx)
ts.resample("2H").mean()

I'd like to believe that you can get this information from the docs of whatever Julia library you are working with — sammywemmy, Dec 27 '21 at 01:47
@sammywemmy I would like to believe this also, but reality does not agree. — Igor Rivin, Dec 27 '21 at 02:05
Julia has a standard library Dates, which is quite nice. And DataFrames accepts any element types. You might be looking for documentation using them together, and finding none, because it's assumed that things like this work well together without special-casing. If you have particular hurdles, I'm sure people can help with more focused questions. — mcabbott, Dec 27 '21 at 03:40
Per @mcabbott's comment, date/time handling ships with Julia as the Dates standard library; you can find the documentation here: https://docs.julialang.org/en/v1/stdlib/Dates/. If a specific feature is missing, please update the question to point to the specific problem you have; then it will be possible to reopen and answer it. — Bogumił Kamiński, Dec 27 '21 at 08:38
Guys, this question is now a nice tutorial, closed a few minutes ago by people outside the Julia community. I vote to reopen it :) — Przemyslaw Szufel, Jan 04 '22 at 17:05
@PrzemyslawSzufel Closed TWICE at this point. Probably not a record, but we are getting there. — Igor Rivin, Jan 05 '22 at 01:31
@IgorRivin maybe edit the question and copy-paste a few of the Python examples from the pandas link into the question so it will not get closed — Przemyslaw Szufel, Jan 05 '22 at 10:16

score 13 · Accepted Answer · edited Jan 07 '22 at 04:07

Julia libraries follow a "do only one thing, but do it right" philosophy, so the layout of its libraries resembles Unix (a battery of small tools that together accomplish a common goal) more than Python's. Hence DataFrames and Dates are separate libraries:

julia> using Dates, DataFrames

Going through some of the examples of your tutorial:

Pandas

dti = pd.to_datetime(
    ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
)

Julia

julia> DataFrame(dti=[Date("1/1/2018", "m/d/y"), Date("2018-01-01"), Date(2018,1,1)])
3×1 DataFrame
 Row │ dti
     │ Date
─────┼────────────
   1 │ 2018-01-01
   2 │ 2018-01-01
   3 │ 2018-01-01
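Unlike pd.to_datetime, Date does not guess the input format, so anything that is not ISO 8601 needs an explicit format. A minimal sketch using only the Dates standard library; the sample strings and variable names are made up for illustration:

```julia
using Dates

# Assumed sample inputs; each non-ISO layout needs its own DateFormat.
us_style  = ["1/1/2018", "12/31/2018"]
iso_style = ["2018-01-01", "2018-12-31"]

# Build the DateFormat once and reuse it for the whole vector.
us_fmt = dateformat"m/d/y"

us_dates  = Date.(us_style, us_fmt)
iso_dates = Date.(iso_style)   # ISO 8601 is the default format

# tryparse returns nothing instead of throwing on a malformed string.
bad = tryparse(Date, "not a date", us_fmt)

println(us_dates == iso_dates)
println(bad === nothing)
```

Building the format once with dateformat"..." avoids re-parsing the format string for every element.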

Pandas

dti = pd.date_range("2018-01-01", periods=3, freq="H")

Julia

julia> dti = DateTime("2018-01-01") .+ Hour.(0:2)
3-element Vector{DateTime}:
 2018-01-01T00:00:00
 2018-01-01T01:00:00
 2018-01-01T02:00:00
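The same sequence can also be written as a lazy range with the colon syntax. The sketch below uses only Dates; date_range is a hypothetical helper written here to mimic pd.date_range(start, periods=n, freq=...), not a function from any library:

```julia
using Dates

start = DateTime(2018, 1, 1)

# start:step:stop is inclusive at both ends and is not materialized until collected.
hourly = start:Hour(1):(start + Hour(2))

# Hypothetical helper: `periods` points spaced `freq` apart, beginning at `start`.
date_range(start, freq, periods) = collect(start:freq:(start + freq * (periods - 1)))

daily = date_range(Date(2018, 1, 1), Day(1), 3)

println(collect(hourly))
println(daily)
```

The range form is handy when the end point is known; the helper is closer to the pandas call when only the number of periods is known.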

Pandas

dti = dti.tz_localize("UTC")

dti.tz_convert("US/Pacific")

Julia

Note that there is a separate library in Julia for time zones. Additionally, "US/Pacific" is a legacy time zone name.

julia> using TimeZones

julia> dti = ZonedDateTime.(dti, tz"UTC")
3-element Vector{ZonedDateTime}:
 2018-01-01T00:00:00+00:00
 2018-01-01T01:00:00+00:00
 2018-01-01T02:00:00+00:00

julia> astimezone.(dti, TimeZone("US/Pacific", TimeZones.Class(:LEGACY)))
3-element Vector{ZonedDateTime}:
 2017-12-31T16:00:00-08:00
 2017-12-31T17:00:00-08:00
 2017-12-31T18:00:00-08:00
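If the legacy name is not required, the current tz database name for the same zone, "America/Los_Angeles", can be used with the tz"..." string macro directly. A minimal sketch, assuming the TimeZones package is installed; the variable names are illustrative:

```julia
using Dates, TimeZones

naive = DateTime(2018, 1, 1) .+ Hour.(0:2)

# Attach a zone to naive timestamps (the tz_localize step).
utc = ZonedDateTime.(naive, tz"UTC")

# Re-express the same instants in another zone (the tz_convert step).
pacific = astimezone.(utc, tz"America/Los_Angeles")

println(pacific)
```

Converting with astimezone changes only the displayed local time and offset; each element still refers to the same instant, so pacific == utc holds.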

Pandas

idx = pd.date_range("2018-01-01", periods=5, freq="H")
ts = pd.Series(range(len(idx)), index=idx)
ts.resample("2H").mean()

Julia

For resampling or other complex manipulations, you will want to use the split-apply-combine pattern (see https://docs.juliahub.com/DataFrames/AR9oZ/1.3.1/man/split_apply_combine/).

julia> df = DataFrame(date=DateTime("2018-01-01") .+ Hour.(0:4), vals=1:5)
5×2 DataFrame
 Row │ date                 vals
     │ DateTime             Int64
─────┼────────────────────────────
   1 │ 2018-01-01T00:00:00      1
   2 │ 2018-01-01T01:00:00      2
   3 │ 2018-01-01T02:00:00      3
   4 │ 2018-01-01T03:00:00      4
   5 │ 2018-01-01T04:00:00      5

julia> df.date2 = floor.(df.date, Hour(2));

julia> using StatsBase

julia> combine(groupby(df, :date2), :vals => mean => :vals_mean)
3×2 DataFrame
 Row │ date2                vals_mean
     │ DateTime             Float64
─────┼────────────────────────────────
   1 │ 2018-01-01T00:00:00        1.5
   2 │ 2018-01-01T02:00:00        3.5
   3 │ 2018-01-01T04:00:00        5.0
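Selecting rows by time of day follows a similar recipe: derive the quantity of interest from the timestamp and filter on it. A minimal sketch, assuming a fresh hourly DataFrame over one day; in_morning is a hypothetical helper name:

```julia
using Dates, DataFrames

# Assumed sample data: one row per hour over a full day.
df = DataFrame(date = DateTime(2018, 1, 1) .+ Hour.(0:23), vals = 1:24)

# Time(t) drops the date part, so the comparison is purely on time of day.
in_morning(t) = Time(9) <= Time(t) < Time(12)

# ByRow applies the predicate to each element of the :date column.
morning = subset(df, :date => ByRow(in_morning))

println(morning)
```

The half-open interval keeps 09:00 and excludes 12:00; adjust the comparison operators if noon should be included.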

Thanks, that is very helpful, but the sort of thing one would like to do is resampling or, for example, pick time stamps which are between 9am and noon (or whatever). Also, your range example is intentionally easy - what if my date range is of business days - then something like you describe is a little more cumbersome. — Igor Rivin, Dec 27 '21 at 21:21
I added resampling. For business days this will depend on your calendar - but basically the pattern will be similar. Perhaps this makes a good separate question. — Przemyslaw Szufel, Dec 27 '21 at 22:03
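A minimal sketch of that pattern for dates, under the loud assumption that a business day is simply Monday through Friday with no holiday calendar; is_weekday is a hypothetical helper, and a real calendar would need its own predicate:

```julia
using Dates

# Assumption: business day = Monday..Friday; holidays are ignored.
# Dates exports the constants Monday = 1 through Sunday = 7.
is_weekday(d) = dayofweek(d) <= Friday

# Two calendar weeks starting on Monday, 2018-01-01.
days = Date(2018, 1, 1):Day(1):Date(2018, 1, 14)

# filter on a range returns a plain Vector{Date}.
business_days = filter(is_weekday, days)

println(business_days)
```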
