I have a pandas dataframe as follows:

df = pd.DataFrame(data = [[1,0.56],[1,0.59],[1,0.62],[1,0.83],[2,0.85],[2,0.01],[2,0.79],[3,0.37],[3,0.99],[3,0.48],[3,0.55],[3,0.06]],columns=['polyid','value'])

    polyid  value
0        1   0.56
1        1   0.59
2        1   0.62
3        1   0.83
4        2   0.85
5        2   0.01
6        2   0.79
7        3   0.37
8        3   0.99
9        3   0.48
10       3   0.55
11       3   0.06

I need to reclassify the 'value' column separately for each 'polyid'. For this I have two dictionaries. The first holds the bin edges used to cut 'value' for each 'polyid':

bins_dic = {1:[0,0.6,0.8,1], 2:[0,0.2,0.9,1], 3:[0,0.5,0.6,1]}

The second holds the ids used to label the resulting bins:

ids_dic = {1:[1,2,3], 2:[1,2,3], 3:[1,2,3]}

I tried to adapt this answer to my use case, but the best I could come up with was to apply pd.cut to each 'polyid' subset and then pd.concat the subsets back into one dataframe:

import pandas as pd

def reclass_df_dic(df, bins_dic, names_dic, bin_key_col, val_col, name_col):
    df_lst = []
    for key in df[bin_key_col].unique():
        bins = bins_dic[key]
        names = names_dic[key]
        sub_df = df[df[bin_key_col] == key]
        sub_df[name_col] = pd.cut(df[val_col], bins, labels=names)
        df_lst.append(sub_df)
    return(pd.concat(df_lst))

df = pd.DataFrame(data = [[1,0.56],[1,0.59],[1,0.62],[1,0.83],[2,0.85],[2,0.01],[2,0.79],[3,0.37],[3,0.99],[3,0.48],[3,0.55],[3,0.06]],columns=['polyid','value'])
bins_dic = {1:[0,0.6,0.8,1], 2:[0,0.2,0.9,1], 3:[0,0.5,0.6,1]}
ids_dic = {1:[1,2,3], 2:[1,2,3], 3:[1,2,3]}

df = reclass_df_dic(df, bins_dic, ids_dic, 'polyid', 'value', 'id')

This results in my desired output:

    polyid  value  id
0        1   0.56   1
1        1   0.59   1
2        1   0.62   2
3        1   0.83   3
4        2   0.85   2
5        2   0.01   1
6        2   0.79   2
7        3   0.37   1
8        3   0.99   3
9        3   0.48   1
10       3   0.55   2
11       3   0.06   1

However, the line:

sub_df[name_col] = pd.cut(df[val_col], bins, labels=names)

raises the warning:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

which I have not been able to resolve using .loc. Also, I assume there is a more efficient way to do this without looping over each category?

Shaido
openwater

1 Answer

A simpler solution would be to use groupby and apply a custom function on each group. In this case, we can define a function reclass that obtains the correct bins and ids and then uses pd.cut:

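def reclass(group, name):
    # Look up the bin edges and labels for this polyid (the group key).
    bins = bins_dic[name]
    ids = ids_dic[name]
    return pd.cut(group, bins, labels=ids)

# group_keys=False keeps the original row index on the result so it aligns with df
# (pandas 2.0 and later would otherwise prepend the group key to the index).
df['id'] = df.groupby('polyid', group_keys=False)['value'].apply(lambda x: reclass(x, x.name))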

Result:

    polyid  value  id
0        1   0.56   1
1        1   0.59   1
2        1   0.62   2
3        1   0.83   3
4        2   0.85   2
5        2   0.01   1
6        2   0.79   2
7        3   0.37   1
8        3   0.99   3
9        3   0.48   1
10       3   0.55   2
11       3   0.06   1
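
As for the SettingWithCopyWarning in the question: sub_df comes from a boolean-mask selection of df, so pandas cannot tell whether assigning a new column to it was also meant to change df. One common way around this, sketched below, is to take an explicit .copy() of each subset and to cut only that subset's values rather than the whole column. The function name reclass_df_copy is made up for this illustration; everything else follows the question's code and sample data.

```python
import pandas as pd

def reclass_df_copy(df, bins_dic, names_dic, bin_key_col, val_col, name_col):
    # Hypothetical variant of the question's reclass_df_dic.
    df_lst = []
    for key in df[bin_key_col].unique():
        # .copy() makes sub_df an independent frame, so adding a column
        # to it no longer triggers SettingWithCopyWarning.
        sub_df = df[df[bin_key_col] == key].copy()
        # Cut only this subset's values with this key's bins and labels.
        sub_df[name_col] = pd.cut(sub_df[val_col], bins_dic[key], labels=names_dic[key])
        df_lst.append(sub_df)
    return pd.concat(df_lst)

df = pd.DataFrame(
    data=[[1, 0.56], [1, 0.59], [1, 0.62], [1, 0.83],
          [2, 0.85], [2, 0.01], [2, 0.79],
          [3, 0.37], [3, 0.99], [3, 0.48], [3, 0.55], [3, 0.06]],
    columns=['polyid', 'value'],
)
bins_dic = {1: [0, 0.6, 0.8, 1], 2: [0, 0.2, 0.9, 1], 3: [0, 0.5, 0.6, 1]}
ids_dic = {1: [1, 2, 3], 2: [1, 2, 3], 3: [1, 2, 3]}

result = reclass_df_copy(df, bins_dic, ids_dic, 'polyid', 'value', 'id')
print(result)
```

With the sample data this should give the same id column as the groupby version. The loop is kept only to stay close to the question's code; the groupby approach avoids building the subsets by hand.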
Shaido