As of polars>=0.10.4 you can use the pl.spearman_rank_corr function.
If you want to use a custom function you could do it like this:
Custom function on multiple columns/expressions
import polars as pl
from typing import List
from scipy import stats

df = pl.DataFrame({
    "g": [1, 1, 1, 2, 2, 2, 5],
    "a": [2, 4, 5, 190, 1, 4, 1],
    "b": [1, 3, 2, 1, 43, 3, 1]
})

def get_score(args: List[pl.Series]) -> pl.Series:
    # args holds one Series per entry in `exprs`, in the same order
    return pl.Series([stats.spearmanr(args[0], args[1]).correlation], dtype=pl.Float64)

(df.groupby("g", maintain_order=True)
 .agg(
    pl.apply(
        exprs=["a", "b"],
        function=get_score).alias("corr")
))
Polars provided function
(df.groupby("g", maintain_order=True)
 .agg(
    pl.spearman_rank_corr("a", "b").alias("corr")
))
Both output:
shape: (3, 2)
┌─────┬──────┐
│ g   ┆ corr │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ 0.5  │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2   ┆ -1e0 │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 5   ┆ NaN  │
└─────┴──────┘
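To see where these numbers come from, here is a minimal pure-Python sketch of a Spearman rank correlation (Pearson correlation computed on average ranks). It does not use polars or scipy, and the helper names `average_ranks` and `spearman` are hypothetical, introduced only for illustration. It assumes that a group with no variation in its ranks, such as the single-row group 5, yields NaN, which matches the output above.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of the ranks of a and b (NaN if undefined)."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    if n == 0:
        return math.nan
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        # Assumption: no variation in the ranks means the correlation is undefined.
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# The same groups as in the DataFrame above: g -> (a values, b values)
groups = {
    1: ([2, 4, 5], [1, 3, 2]),
    2: ([190, 1, 4], [1, 43, 3]),
    5: ([1], [1]),
}
scores = {g: spearman(a, b) for g, (a, b) in groups.items()}
print(scores)
```

Group 5 has a single row, so its ranks have zero variance and the correlation is undefined.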
Custom function on a single column/expression
We can also apply custom functions on single expressions, via .apply or .map.
Below is an example of how we can square a column with a custom function and with normal polars expressions. The expression syntax should always be preferred, as it's a lot faster.
(df.groupby("g")
 .agg(
    pl.col("a").apply(lambda group: group**2).alias("squared1"),
    (pl.col("a")**2).alias("squared2")
))
What's the difference between apply and map?
map works on whole column Series. apply works on single values or single groups, depending on the context.
select context:
map
  - input/output type: Series
  - semantic meaning of input: a column value
apply
  - input/output type: Union[int, float, str, bool]
  - semantic meaning of input: single values in a column

groupby context:
map
  - input/output type: Series
  - semantic meaning of input: a list column where the values are the groups
apply
  - input/output type: Series
  - semantic meaning of input: the groups
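
The distinction can be sketched with a toy model in plain Python. This is not the polars API: the functions `toy_map`, `toy_apply_select` and `toy_apply_groupby` are hypothetical stand-ins, with lists playing the role of Series, and they only illustrate what the custom function receives in each case.

```python
from typing import Any, Callable, Dict, List

Column = List[Any]


def toy_map(column: Column, fn: Callable[[Column], Column]) -> Column:
    """map: fn is called once and receives the whole column."""
    return fn(column)


def toy_apply_select(column: Column, fn: Callable[[Any], Any]) -> Column:
    """apply in a select context: fn is called once per value."""
    return [fn(value) for value in column]


def toy_apply_groupby(
    groups: Dict[Any, Column], fn: Callable[[Column], Any]
) -> Dict[Any, Any]:
    """apply in a groupby context: fn is called once per group."""
    return {key: fn(values) for key, values in groups.items()}


a = [2, 4, 5, 190, 1, 4, 1]
groups = {1: [2, 4, 5], 2: [190, 1, 4], 5: [1]}

# select context
squared_map = toy_map(a, lambda col: [v ** 2 for v in col])  # fn sees the column
squared_apply = toy_apply_select(a, lambda v: v ** 2)        # fn sees one value

# groupby context
# map sees one "list column" whose values are the groups
sums_map = toy_map(list(groups.values()), lambda col: [sum(g) for g in col])
# apply sees each group separately
sums_apply = toy_apply_groupby(groups, sum)
```

In both contexts the two routes produce the same values here. What differs is how often the custom function is called and what it is handed each time.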