Environment
macos: monterey
node: v18.1.0
nodejs-polars: 0.5.3
Goal
Subtract the mean of each column from that column in a polars DataFrame.
Pandas solution
In pandas the solution is very concise thanks to DataFrame.sub(other, axis='columns', level=None, fill_value=None), where other is a scalar, sequence, Series, or DataFrame:
df.sub(df.mean())
df - df.mean()
nodejs-polars solution
In the nodejs-polars sub function, however, other only seems to accept a Series, judging by the implementation sub: (other) => wrap("sub", prepareOtherArg(other).inner()).
1. Prepare data
console.log(df)
┌─────────┬─────────┬─────────┬─────────┐
│ A ┆ B ┆ C ┆ D │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════════╪═════════╪═════════╪═════════╡
│ 13520 ┆ -16 ┆ 384 ┆ 208 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 13472 ┆ -16 ┆ 384 ┆ 176 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 13456 ┆ -16 ┆ 368 ┆ 160 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 13472 ┆ -16 ┆ 368 ┆ 160 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 13472 ┆ -16 ┆ 352 ┆ 176 │
└─────────┴─────────┴─────────┴─────────┘
console.log(df.mean())
┌─────────┬─────────┬─────────┬─────────┐
│ A ┆ B ┆ C ┆ D │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════╪═════════╪═════════╪═════════╡
│ 13478.4 ┆ -16.0 ┆ 371.2 ┆ 176.0 │
└─────────┴─────────┴─────────┴─────────┘
2. First try
df.sub(df.mean())
Error: Failed to determine supertype of Int64 and Struct([Field { name: "A", dtype: Int32 }, Field { name: "B", dtype: Int32 }, Field { name: "C", dtype: Int32 }, Field { name: "D", dtype: Int32 }])
3. Second try
df.sub(pl.Series(df.mean().row(0)))
The program crashes due to memory problems.
4. Third try
After some investigation, I noticed these tests:
test("sub", () => {
  const actual = pl.DataFrame({
    "foo": [1, 2, 3],
    "bar": [4, 5, 6]
  }).sub(1);
  const expected = pl.DataFrame({
    "foo": [0, 1, 2],
    "bar": [3, 4, 5]
  });
  expect(actual).toFrameEqual(expected);
});
test("sub:series", () => {
  const actual = pl.DataFrame({
    "foo": [1, 2, 3],
    "bar": [4, 5, 6]
  }).sub(pl.Series([1, 2, 3]));
  const expected = pl.DataFrame({
    "foo": [0, 0, 0],
    "bar": [3, 3, 3]
  });
  expect(actual).toFrameEqual(expected);
});
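Judging by these two tests alone, sub appears to broadcast a scalar over every cell and to line a Series up with the rows of each column, not with the column names as pandas does. If that reading is right, a Series holding four column means would be matched against five rows, which may be related to the failure in the second try. Here is a minimal plain-JavaScript sketch of the two behaviours; subScalar and subRowAligned are hypothetical helpers written for illustration, not nodejs-polars functions:

```javascript
// Plain-JavaScript model of the two behaviours the tests exercise.
// subScalar and subRowAligned are hypothetical helpers, not nodejs-polars functions.
const frame = { foo: [1, 2, 3], bar: [4, 5, 6] };

// Scalar: the same value is subtracted from every cell.
function subScalar(columns, value) {
  return Object.fromEntries(
    Object.entries(columns).map(([name, values]) => [name, values.map((v) => v - value)])
  );
}

// Series: element i of the series is subtracted from row i of every column.
function subRowAligned(columns, series) {
  return Object.fromEntries(
    Object.entries(columns).map(([name, values]) => [name, values.map((v, i) => v - series[i])])
  );
}

console.log(subScalar(frame, 1));             // { foo: [ 0, 1, 2 ], bar: [ 3, 4, 5 ] }
console.log(subRowAligned(frame, [1, 2, 3])); // { foo: [ 0, 0, 0 ], bar: [ 3, 3, 3 ] }
```

Both results match the expected frames in the tests, so neither form gives the per-column alignment that df.sub(df.mean()) has in pandas.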
nodejs-polars does not seem able to do this gracefully right now, so my current solution is a bit cumbersome: perform the operation column by column, then concat the results.
pl.concat(df.columns.map((col) => df.select(col).sub(df.select(col).mean(0).toSeries())), {how:'horizontal'})
Is there a better or easier way to do it?
5. New try
I just came up with an easier solution, but it's hard to understand, and I'm still trying to figure out what happens under the hood.
df.select(pl.col('*').sub(pl.col('*').mean()))
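A possible way to read this expression, assuming polars evaluates a wildcard expression once per column: pl.col('*').mean() reduces each column to a single value, and sub then broadcasts that value over the rows of the same column. The plain-JavaScript sketch below models that reading on the data from step 1; centerColumns is a hypothetical helper and does not call nodejs-polars:

```javascript
// Plain-JavaScript model of df.select(pl.col('*').sub(pl.col('*').mean())).
// centerColumns is a hypothetical helper; it does not call nodejs-polars.
const data = {
  A: [13520, 13472, 13456, 13472, 13472],
  B: [-16, -16, -16, -16, -16],
  C: [384, 384, 368, 368, 352],
  D: [208, 176, 160, 160, 176],
};

function centerColumns(columns) {
  const result = {};
  // Assumed expansion: the wildcard expression is evaluated once per column.
  for (const [name, values] of Object.entries(columns)) {
    // mean() reduces the column to a single value ...
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    // ... which is then subtracted from every row of that same column.
    result[name] = values.map((v) => v - mean);
  }
  return result;
}

const centered = centerColumns(data);
console.log(centered.B); // [ 0, 0, 0, 0, 0 ]
console.log(centered.D); // [ 32, 0, -16, -16, 0 ]
```

Columns A and C come out as floating-point values (roughly 41.6 and 12.8 in the first row), since their means, 13478.4 and 371.2, are not integers.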