28

I have a dataframe with a DataTime column (with Timezone in different formats). It appears like timezone is UTC but I want to convert the column to pd.to_datetime and that is failing. That is problem #1. Since that fails I cannot do any datetime operations on the time period such as group the column by date / figure out the days / group by hour of the day and so on. Here's my dataframe df_res

    DateTime
    2017-11-02 19:49:28-07:00
    2017-11-27 07:32:22-08:00
    2017-12-27 17:01:15-08:00

OUTPUT for the command

      df_res["DateTime"] = df_res["DateTime"].dt.tz_convert('America/New_York')

AttributeError: Can only use .dt accessor with datetimelike values

WHen I convert to datetime

   df_res['DateTime'] = pd.to_datetime(df_res['DateTime'])

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

I feel I am going around in circles. I need to convert the column to datetime in order to perform operations & in order to do that I need to have them all the same timezone but I cannot have the same timezone unless it is a datetime object so how can I best approach this. I did refer to previous postings but they seem to convert to datetime as easily as possible:

Convert datetime columns to a different timezone pandas Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

CezarySzulc
  • 1,849
  • 1
  • 14
  • 30
py_noob
  • 433
  • 2
  • 8
  • 17

3 Answers3

39

I think that it is not necessary to apply lambdas:

df_res['DateTime'] = pd.to_datetime(df_res['DateTime'], utc=True)

documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

JLM
  • 708
  • 1
  • 6
  • 8
6

You can check this:

df = pd.DataFrame({
    'time': [
        '2017-11-02 19:49:28-08:00', 
        '2017-11-27 07:32:22-07:00', 
        '2017-12-27 17:01:15-07:00'
    ]
})

df['time'] = pd.to_datetime(df['time'])

df['time'].apply(lambda x: pd.to_datetime(x).tz_localize('US/Eastern'))
0   2017-11-03 03:49:28-04:00
1   2017-11-27 14:32:22-05:00
2   2017-12-28 00:01:15-05:00
Name: time, dtype: datetime64[ns, US/Eastern]
CezarySzulc
  • 1,849
  • 1
  • 14
  • 30
  • 1
    Thanks .. And what if my dataframe has over a 10k+ datetime entries. The DateTime is of type `object` too and I need to convert them all? – py_noob Mar 28 '19 at 01:36
  • 1
    @py_noob: Maybe too late for you, I've met the same problem that after converting to datetime format, the column is of type `object`. However, when I check each line, they are all in datetime format. Its strange but I think it's not a problem, isn't it? – Anh-Thi DINH Feb 13 '20 at 09:02
  • 2
    In my case I needed to pass a `utf=True` arg to `pd.to_datetime()`, as in `df['time'] = pd.to_datetime(df['time'], utf=True)` – dustintheglass Aug 25 '20 at 19:21
  • 1
    Copy paste your code and throw TypeError: Cannot localize tz-aware Timestamp, use tz_convert for conversions – SKSKSKSK Nov 03 '22 at 08:37
0

Posting here because I spent few hours to figure out the answer to the title of this question for the general case where you might have the naive datetimes on another timezone than UTC.

Here is the solution I come up with. I'd happy if someone with a deeper understanding of pandas/numpy can point out if there is any way to improve its performance. Though, I might come handy as a starting point for someone with a similar issue as it is.

from datetime import datetime

import pandas as pd
from pandas import Series
from pandas.api.types import is_datetime64_any_dtype as is_datetime


def ensure_datetime(series: Series, timezone: str):
    """
    Ensures that the `series` is a datetime series of dtype datetime64[ns, timezone]

    - Convert tz aware values to `timezone`.
    - Assume naive values are on `timezone` and make them aware.
    - Handle None values and convert them to NaT (so we can accomplish the dtype requirement).
    """
    if series.dtype == pd.DatetimeTZDtype(tz=timezone):
        return series

    are_datetime = series.apply(lambda x: isinstance(x, datetime)).astype(bool)

    # Convert only values that are not already datetime, otherwise if there are
    # tz-aware values pandas will raise: Tz-aware datetime.datetime cannot
    # be converted to datetime64 unless utc=True.
    # We cannot set utc=True because pandas will assume naive values to be on UTC
    # but we need naive values to be considered on `timezone`.
    series = series.mask(
        ~are_datetime, pd.to_datetime(series[~are_datetime], errors="coerce")
    )

    # Localize naive values to `timezone`
    are_unaware = series.apply(lambda x: not pd.isna(x) and x.tzinfo is None).astype(
        bool
    )
    series = series.mask(
        are_unaware, pd.to_datetime(series[are_unaware]).dt.tz_localize(timezone)
    )

    # Now that we don't have any naive value we can normalize all to UTC and
    # then convert to `timezone`.
    series = pd.to_datetime(series, utc=True).dt.tz_convert(timezone)

    return series

def test_ensure_datetime():
    series = pd.Series(
        ["2022-12-31 16:00:00-08:00", "2023-01-01", "2023-01-01 12:30", None]
    )

    series = ensure_datetime(series, "America/New_York")

    assert is_datetime(series)
    assert list(series) == [
        pd.Timestamp("2022-12-31 19:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 00:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 12:30", tz="America/New_York"),
        pd.NaT,
    ]

    series = ensure_datetime(series.dt.date, "America/New_York")

    assert is_datetime(series)
    assert list(series) == [
        pd.Timestamp("2022-12-31 00:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 00:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 00:00", tz="America/New_York"),
        pd.NaT,
    ]

    # Mix aware timestamps with naive
    series = pd.Series(
        [
            pd.Timestamp("2022-12-31 12:00", tz="America/New_York"),
            pd.Timestamp("2022-12-31 12:00"),
        ]
    )
    series = ensure_datetime(series, "America/New_York")
    assert list(series) == [
        pd.Timestamp("2022-12-31 12:00", tz="America/New_York"),
        pd.Timestamp("2022-12-31 12:00", tz="America/New_York"),
    ]
nicootto
  • 129
  • 8