With sample data and code below, I'm trying to groupby year-month and find top K columns with smallest std values inside all the columns endswith _values
:
import pandas as pd
import numpy as np
from statistics import stdev
np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])
k = 3 # set k as 3
value_cols = df.columns[df.columns.str.endswith('_values')]
def find_topK_smallest_std(group):
std = stdev(group[value_cols])
cols = std.nsmallest(k).index
out_cols = [f'std_{i+1}' for i in range(k)]
rv = group.loc[:, cols]
rv.columns = out_cols
return rv
df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)
But it raises a type error, how could I fix this issue? Sincere thanks at advance.
Out:
TypeError: can't convert type 'str' to numerator/denominator
Reference link:
Groupby year-month and find top N smallest values columns in Python