I'm searching for an easier way to categorize my data. Based on two columns, I'm trying to find the best combination of ranges for the two, i.e. the one that includes the most cases. For each of the two columns I've set a deviation of .25 above and below the value.
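For a single candidate row, this criterion boils down to two strict range checks combined with "and". The sketch below shows that check with `Series.between(..., inclusive='neither')`, which needs pandas 1.3 or later. The helper name `count_in_window` is hypothetical, and the small frame is just the first five rows of the example data below.

```python
import pandas as pd

# first five rows of the example data (col2/col3 only)
df = pd.DataFrame({'col2': [1, 1, 1, 1.2, 1.2],
                   'col3': [2, 2.2, 2.3, 4.2, 1.6]})

def count_in_window(df, c2, c3, dev=0.25):
    """Hypothetical helper: count rows whose col2 and col3 both lie
    strictly within +/- dev of (c2, c3)."""
    # inclusive='neither' gives strict bounds, like min < x < max
    in_c2 = df['col2'].between(c2 - dev, c2 + dev, inclusive='neither')
    in_c3 = df['col3'].between(c3 - dev, c3 + dev, inclusive='neither')
    return int((in_c2 & in_c3).sum())

# window centred on the first row (col2=1, col3=2)
print(count_in_window(df, 1, 2))  # → 2
```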
To illustrate my problem, you can run the following script. It does exactly what it should, but it feels clumsy and unclean.
```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
                   'col2': [1, 1, 1, 1.2, 1.2, 1.3, 1.5, 2, 2, 2.3, 2.5, 2.8, 3, 3, 3],
                   'col3': [2, 2.2, 2.3, 4.2, 1.6, 1.6, 1.3, 1.4, 1.5, 1.7, 2.4, 2.8, 2.9, 3, 4]})

rows = []  # one result row per candidate window
for i, r in df.iterrows():
    # window of +/- .25 around the current row's values
    range_min_c2, range_max_c2 = r['col2'] - .25, r['col2'] + .25
    range_min_c3, range_max_c3 = r['col3'] - .25, r['col3'] + .25
    # flag every row that lies strictly inside both windows
    test = df.apply(lambda x: 1 if range_min_c2 < x['col2'] < range_max_c2 and
                    range_min_c3 < x['col3'] < range_max_c3 else 0, axis=1)
    rows.append({'min_c2': range_min_c2, 'max_c2': range_max_c2,
                 'min_c3': range_min_c3, 'max_c3': range_max_c3,
                 'sum_test': test.sum()})

# DataFrame.append was removed in pandas 2.0, so build the frame once at the end
df_combinations = pd.DataFrame(rows)
```
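One way to get rid of both the `iterrows` loop and the row-wise `apply` is NumPy broadcasting: build every window at once and compare all rows against all windows in a single boolean matrix. This is a sketch that assumes NumPy is available; it is not a predefined pandas function. It keeps the same strict `<` bounds as the script, but it allocates an n×n matrix, so memory grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
                   'col2': [1, 1, 1, 1.2, 1.2, 1.3, 1.5, 2, 2, 2.3, 2.5, 2.8, 3, 3, 3],
                   'col3': [2, 2.2, 2.3, 4.2, 1.6, 1.6, 1.3, 1.4, 1.5, 1.7, 2.4, 2.8, 2.9, 3, 4]})

dev = 0.25
c2 = df['col2'].to_numpy()
c3 = df['col3'].to_numpy()

# one window per row, exactly as in the loop
lo2, hi2 = c2 - dev, c2 + dev
lo3, hi3 = c3 - dev, c3 + dev

# inside[i, j] is True when row j lies strictly inside the window built around row i;
# lo2[:, None] has shape (n, 1) and broadcasts against c2 with shape (n,)
inside = ((lo2[:, None] < c2) & (c2 < hi2[:, None]) &
          (lo3[:, None] < c3) & (c3 < hi3[:, None]))

df_combinations = pd.DataFrame({
    'min_c2': lo2, 'max_c2': hi2,
    'min_c3': lo3, 'max_c3': hi3,
    'sum_test': inside.sum(axis=1),  # number of rows covered by each window
})
```

To pick the window that covers the most cases, `df_combinations.loc[df_combinations['sum_test'].idxmax()]` returns the first row with the highest `sum_test`.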
My question is pretty simple: is there an easier (prettier) way to get this output? Perhaps a built-in function? Anyway, thanks in advance!