I know how to apply a function to all columns of a Pandas DataFrame, but I have not yet figured out how to do the same with a Polars DataFrame.

I checked the section of the Polars User Guide devoted to this topic, but I did not find the answer. Below is a code snippet with my unsuccessful attempts.

import numpy as np
import polars as pl
import seaborn as sns

# Loading toy dataset as Pandas DataFrame using Seaborn
df_pd = sns.load_dataset('iris')

# Converting Pandas DataFrame to Polars DataFrame
df_pl = pl.DataFrame(df_pd)

# Dropping the non-numeric column...
df_pd = df_pd.drop(columns='species')                     # ... using Pandas
df_pl = df_pl.drop('species')                             # ... using Polars

# Applying function to the whole DataFrame...
df_pd_new = df_pd.apply(np.log2)                          # ... using Pandas
# df_pl_new = df_pl.apply(np.log2)                        # ... using Polars?

# Applying lambda function to the whole DataFrame...
df_pd_new = df_pd.apply(lambda c: np.log2(c))             # ... using Pandas
# df_pl_new = df_pl.apply(lambda c: np.log2(c))           # ... using Polars?

Thanks in advance for your help and your time.

Gian Arauz

1 Answer

You can use the expression syntax to select all columns with pl.col("*") or pl.all(), and then map the NumPy function np.log2 over the columns.

df.select([
    pl.all().map(np.log2)
])

Polars expressions also support NumPy universal functions (ufuncs): https://numpy.org/doc/stable/reference/ufuncs.html

That means you can pass a polars expression to a numpy ufunc:

df.select([
    np.log2(pl.all())
])

Note that the difference between apply and map is that apply is called once for every individual value, whereas map is called once on the whole Series. We choose map here because it is faster. In newer Polars releases these two methods are named map_elements and map_batches, respectively.

ritchie46