13

Consider a Polars data frame with a column of str type that indicates the date in the format '27 July 2020'. I would like to convert this column to the polars.datetime type, which is distinct from the Python standard datetime. The following code, using the standard datetime format, works but Polars does not recognise the values in the column as dates.

import polars as pl
from datetime import datetime

df = pd.read_csv('<some CSV file containing a column called 'event_date'>')
df = df.with_columns([   
        pl.col('event_date').apply(lambda x: x.replace(" ","-"))\
                            .apply(lambda x: datetime.strptime(x, '%d-%B-%Y'))
])

Suppose we try to process df further to create a new column indicating the quarter of the year an event took place.

df = df.with_columns([
        pl.col('event_date').apply(lambda x: x.month)\
                            .apply(lambda x: 1 if x in range(1,4) else 2 if x in range(4,7) else 3 if x in range(7,10) else 4)\
                            .alias('quarter')
])

The code returns the following error because it qualifies event_type as dtype Object("object") and not as datetime or polars.datetime

thread '<unnamed>' panicked at 'dtype Object("object") not supported', src/series.rs:992:24
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
PanicException: Unwrapped panic from Python code
fabioklr
  • 430
  • 1
  • 5
  • 13

1 Answers1

28

Update: .str.to_datetime() can be used as of polars v0.17.10

df = pl.from_repr("""
┌─────┬──────────────────┐
│ id  ┆ event_date       │
│ --- ┆ ---              │
│ i64 ┆ str              │
╞═════╪══════════════════╡
│ 1   ┆ 27 July 2020     │
│ 2   ┆ 31 December 2020 │
└─────┴──────────────────┘
""")

df.with_columns(
   pl.col("event_date").str.to_datetime("%d %B %Y")
)
shape: (2, 2)
┌─────┬─────────────────────┐
│ id  ┆ event_date          │
│ --- ┆ ---                 │
│ i64 ┆ datetime[μs]        │
╞═════╪═════════════════════╡
│ 1   ┆ 2020-07-27 00:00:00 │
│ 2   ┆ 2020-12-31 00:00:00 │
└─────┴─────────────────────┘

The easiest way to convert strings to Date/Datetime is to use Polars' own strptime function (rather than the same-named function from Python's datetime module).

For example, let's start with this data.

import polars as pl

df = pl.DataFrame({
    'date_str': ["27 July 2020", "31 December 2020"]
})
print(df)
shape: (2, 1)
┌──────────────────┐
│ date_str         │
│ ---              │
│ str              │
╞══════════════════╡
│ 27 July 2020     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 31 December 2020 │
└──────────────────┘

To convert, use Polars' strptime function.

df.with_columns(pl.col('date_str').str.strptime(pl.Date, fmt='%d %B %Y').cast(pl.Datetime))
shape: (2, 1)
┌─────────────────────┐
│ date_str            │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-07-27 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2020-12-31 00:00:00 │
└─────────────────────┘

Notice that we did not need to replace spaces with dashes. I've cast the result as a Datetime (per your question), but you may be able to use a Date instead.

Currently, the apply method does not work when the return type is a python Date/Datetime object, but there is a request for this. That said, it's better to use Polars' strptime. It will be much faster than calling python datetime code.

Edit: as of Polars 0.13.19, the apply method will automatically convert Python date/datetime to Polars Date/Datetime.

jqurious
  • 9,953
  • 1
  • 4
  • 14
  • for your df to reflect the changes assign the function to it: `df = df.with_colu....` – PJ_ Jun 03 '22 at 19:16
  • 3
    You’ll find the non-assignment to be common in Polars discussions, particularly when we just want to print a result, but not change the original DataFrame. So instead of assigning and then issuing a print statement which is two lines, we simply don’t assign the value, which acts as an implicit print statement. –  Jun 03 '22 at 19:29
  • `from_repr` is pure witchcraft, thank you for bringing that up! – Filippo Vitale Aug 04 '23 at 19:25