I have the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3],[1,2,1],[1,2,2],[2,2,2],[2,3,2],[2,4,2]],columns=["a","b","c"])
df = df.set_index("a")
df.groupby("a").mean()  # per-group means: a=1 -> (b=2, c=2), a=2 -> (b=3, c=2)
df.groupby("a").std()   # per-group standard deviations: a=1 -> (b=0, c=1), a=2 -> (b=1, c=0)
I want to standardize each column within each group (i.e., per key a), NOT over the whole column vector.
So for this example the computation would be:
a = 1:
Column: b
(2 - 2) / 0.0
(2 - 2) / 0.0
(2 - 2) / 0.0
Column: c
(3 - 2) / 1.0
(1 - 2) / 1.0
(2 - 2) / 1.0
and similarly for a = 2, so that each value ends up standardized within its own group.
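
For reference, in pandas I believe this per-group standardization could be written with groupby().transform, something like the following (using the df defined above; note that column b in group a = 1 has zero standard deviation, so those entries come out as NaN):

# Standardize each column within each group keyed by "a"
standardized = df.groupby("a").transform(lambda g: (g - g.mean()) / g.std())
print(standardized)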
How can I do that in Spark?
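
Here is a minimal sketch of what I imagine this might look like in PySpark, using window functions partitioned by a (just an idea, I'm not sure it's the idiomatic approach; it assumes a SparkSession, and F.stddev is the sample standard deviation, matching the pandas default):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Same data as the pandas frame above, as a Spark DataFrame
sdf = spark.createDataFrame(
    [(1, 2, 3), (1, 2, 1), (1, 2, 2), (2, 2, 2), (2, 3, 2), (2, 4, 2)],
    ["a", "b", "c"],
)

# Window over each group key; avg/stddev are computed per partition
w = Window.partitionBy("a")
for col in ["b", "c"]:
    sdf = sdf.withColumn(
        col, (F.col(col) - F.avg(col).over(w)) / F.stddev(col).over(w)
    )
# Groups with zero stddev come out as null here (pandas gives NaN)
sdf.show()

Is a window function like this the right approach, or is there a better way?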
Thanks