I am trying to follow the advice from this question:
import numpy as np
import polars as pl

df = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df.select([pl.all().map(np.log2)])
shape: (3, 2)
┌──────────┬──────────┐
│ a ┆ b │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞══════════╪══════════╡
│ 0.0 ┆ 2.0 │
│ 1.0 ┆ 2.321928 │
│ 1.584963 ┆ 2.584963 │
└──────────┴──────────┘
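As an aside, for the log case the same result seems to be available natively, without going through NumPy (this assumes Expr.log accepts a base argument, which I believe it does):

df.select(pl.all().log(2))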
So far, so good. But:
from sklearn.preprocessing import minmax_scale

df.select(pl.all().map(minmax_scale))
shape: (1, 2)
┌─────────────────┬─────────────────┐
│ a ┆ b │
│ --- ┆ --- │
│ list[f64] ┆ list[f64] │
╞═════════════════╪═════════════════╡
│ [0.0, 0.5, 1.0] ┆ [0.0, 0.5, 1.0] │
└─────────────────┴─────────────────┘
I found a way of converting the pl.List columns back to flat f64 columns, but it seems strange that this step should be needed:
df.select(pl.all().map(minmax_scale)).explode(pl.all())
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 0.0 ┆ 0.0 │
│ 0.5 ┆ 0.5 │
│ 1.0 ┆ 1.0 │
└─────┴─────┘
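One workaround I can think of (my own sketch, not from the linked question) is to rebuild the Series myself inside the callable, so that map never receives a bare NumPy array:

# hypothetical alternative: wrap the ndarray returned by minmax_scale
# back into a pl.Series before handing it to Polars
df.select(pl.all().map(lambda s: pl.Series(minmax_scale(s))))

That avoids the explode step, but it still feels like unnecessary boilerplate.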
Both minmax_scale and np.log2 return NumPy arrays when called on a plain array, so I would expect the two calls to behave the same. A quick check outside of Polars (my own verification):
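import numpy as np
from sklearn.preprocessing import minmax_scale

arr = np.array([1, 2, 3])
print(type(np.log2(arr)))       # <class 'numpy.ndarray'>
print(type(minmax_scale(arr)))  # <class 'numpy.ndarray'>

What is the proper way of doing this?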