0

I have a Polars LazyFrame that, after applying several functions, looks like this:

┌───────────────┬──────────────┬─────────────────────────┬──────────────────────────┐
│ citing_patent ┆ cited_patent ┆ cited_patent_issue_date ┆ citing_patent_issue_date │
│ ---           ┆ ---          ┆ ---                     ┆ ---                      │
│ str           ┆ str          ┆ date                    ┆ date                     │
╞═══════════════╪══════════════╪═════════════════════════╪══════════════════════════╡
│ X             ┆ A            ┆ 2000-10-20              ┆ 2001-02-08               │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ X             ┆ B            ┆ 1999-08-04              ┆ 2001-02-08               │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Y             ┆ B            ┆ 1999-08-04              ┆ 2004-06-04               │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ N             ┆ A            ┆ 2000-10-20              ┆ 2021-12-20               │
└───────────────┴──────────────┴─────────────────────────┴──────────────────────────┘

I would like to group it by cited_patent, and have a column for the number of citing_patents within three years of cited_patent_issue_date.

After reading 15741618, I tried using dateutil.relativedelta.

Here is the code I have so far:

.groupby("cited_patent")
.agg(
    [
        pl.col("cited_patent_issue_date").first(),
        (pl.col("citing_patent_issue_date") <= pl.col("cited_patent_issue_date").first() + relativedelta(years=3)).sum()
    ]
)

However, this doesn't work, as I get an error:

pyo3_runtime.PanicException: could not convert value relativedelta(years=+3) as a Literal

I can't seem to find anything else on this, so I'm a bit stuck.

What's the recommended way to add years to dates in Polars?

1 Answers1

2

You can use the offset_by function. Starting with this data:

import polars as pl

df = pl.DataFrame(
    {
        "citing_patent": ["X", "X", "Y", "N"],
        "cited_patent": ["A", "B", "B", "A"],
        "cited_patent_issue_date": [
            "2000-10-20",
            "1999-08-04",
            "1999-08-04",
            "2000-10-20",
        ],
        "citing_patent_issue_date": [
            "2001-02-08",
            "2001-02-08",
            "2004-06-04",
            "2021-12-20",
        ],
    }
).with_columns(pl.col("^.*_date$").str.strptime(pl.Date))
df
shape: (4, 4)
┌───────────────┬──────────────┬─────────────────────────┬──────────────────────────┐
│ citing_patent ┆ cited_patent ┆ cited_patent_issue_date ┆ citing_patent_issue_date │
│ ---           ┆ ---          ┆ ---                     ┆ ---                      │
│ str           ┆ str          ┆ date                    ┆ date                     │
╞═══════════════╪══════════════╪═════════════════════════╪══════════════════════════╡
│ X             ┆ A            ┆ 2000-10-20              ┆ 2001-02-08               │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ X             ┆ B            ┆ 1999-08-04              ┆ 2001-02-08               │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Y             ┆ B            ┆ 1999-08-04              ┆ 2004-06-04               │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ N             ┆ A            ┆ 2000-10-20              ┆ 2021-12-20               │
└───────────────┴──────────────┴─────────────────────────┴──────────────────────────┘

We can use the offset_by with 3y. Note that it's under the dt namespace of datetime expressions in Polars.

(
    df.groupby("cited_patent").agg(
        [
            pl.col("cited_patent_issue_date").first(),
            (
                pl.col("citing_patent_issue_date")
                <= pl.col("cited_patent_issue_date").first().dt.offset_by('3y')
            ).sum(),
        ]
    )
)
shape: (2, 3)
┌──────────────┬─────────────────────────┬──────────────────────────┐
│ cited_patent ┆ cited_patent_issue_date ┆ citing_patent_issue_date │
│ ---          ┆ ---                     ┆ ---                      │
│ str          ┆ date                    ┆ u32                      │
╞══════════════╪═════════════════════════╪══════════════════════════╡
│ B            ┆ 1999-08-04              ┆ 1                        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A            ┆ 2000-10-20              ┆ 1                        │
└──────────────┴─────────────────────────┴──────────────────────────┘