I have three dataframes: the first contains user data, the second defines the bins, and the third maps each category id to a name, as in:
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'],
                           [7, 'Regular'],
                           [13, 'Happy'],
                           [42, 'Magical']],
                          columns=['klass', 'mood'])

bins_df = pd.DataFrame([[0.0, 3.0, 1],
                        [3.0, 6.0, 7],
                        [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])

person_df = pd.DataFrame([['John', 1.5],
                          ['Mary', 3.6],
                          ['Paul', 7.2],
                          ['Josh', 5.7],
                          ['Phil', 9.9]],
                         columns=['name', 'feeling'])
I would like to extend person_df (or create a new dataframe) so that each row also carries the correct klass and mood. For example, in the first row of person_df, John's feeling is 1.5; checking bins_df, we can see that this falls in the first range, [0, 3), hence klass 1. Looking at klasses_df, we find that klass 1 is Sad. The final/new row for John would therefore be John, 1.5, 1, 'Sad'.
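For the sample data, the result I am after can be written out by hand as below. This is only an illustration of the target: expected_df is a name made up for this example, and representing Phil's out-of-range 9.9 as klass 0 with no mood is an assumption that mirrors what my helper functions return.

```python
import pandas as pd

# Hand-written target for the sample data (not computed).
# Assumption: a feeling outside every bin gets klass 0 and mood None.
expected_df = pd.DataFrame([['John', 1.5, 1, 'Sad'],
                            ['Mary', 3.6, 7, 'Regular'],
                            ['Paul', 7.2, 13, 'Happy'],
                            ['Josh', 5.7, 7, 'Regular'],
                            ['Phil', 9.9, 0, None]],
                           columns=['name', 'feeling', 'klass', 'mood'])
print(expected_df)
```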
To achieve that, I have created two auxiliary functions:
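def find_klass_from_feeling(feeling, bin_data):
    # Columns of bin_data are: lower, upper, klass.
    values = bin_data.values
    # Keep the klass of every bin whose range [lower, upper) contains feeling.
    klass = values[(values[:, 0] <= feeling) & (feeling < values[:, 1])][:, 2]
    if len(klass) == 0:
        return 0  # 0 means "no matching bin"
    else:
        return int(klass.flatten()[0])


def find_mood_from_class(klass, klasses_data):
    if klass == 0:
        return None
    # Use the klasses_data argument rather than the global klasses_df.
    retval = klasses_data[klasses_data['klass'] == klass]['mood'].iloc[0]
    return retval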
And I call them as:
final_df = person_df.copy()
klss = []
moods = []
for idx, row in person_df.iterrows():
    kls = find_klass_from_feeling(row['feeling'], bins_df)
    mood = find_mood_from_class(kls, klasses_df)
    klss.append(kls)
    moods.append(mood)
final_df['klass'] = klss
final_df['mood'] = moods
It works, but it seems completely wrong, since I believe pandas has a more idiomatic way to handle this. I tried to use apply and applymap without success.
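One vectorized direction that might fit is sketched below. It is a sketch, not a verified solution: it assumes the bins never overlap and are half-open, [lower, upper), like in my helper function. It builds a pd.IntervalIndex from bins_df, looks up each feeling's bin with get_indexer, and then attaches the mood with a left merge. The names intervals, pos and klass are introduced only for this example.

```python
import numpy as np
import pandas as pd

klasses_df = pd.DataFrame([[1, 'Sad'], [7, 'Regular'], [13, 'Happy'], [42, 'Magical']],
                          columns=['klass', 'mood'])
bins_df = pd.DataFrame([[0.0, 3.0, 1], [3.0, 6.0, 7], [6.0, 8.0, 13]],
                       columns=['lower', 'upper', 'klass'])
person_df = pd.DataFrame([['John', 1.5], ['Mary', 3.6], ['Paul', 7.2],
                          ['Josh', 5.7], ['Phil', 9.9]],
                         columns=['name', 'feeling'])

# Assumption: bins are non-overlapping and half-open, [lower, upper).
intervals = pd.IntervalIndex.from_arrays(bins_df['lower'], bins_df['upper'],
                                         closed='left')

# Position of the bin containing each feeling, or -1 when no bin matches.
pos = intervals.get_indexer(person_df['feeling'].to_numpy())

# Translate positions to klass values; 0 marks "no matching bin".
klass = np.where(pos >= 0, bins_df['klass'].to_numpy()[pos], 0)

# A left merge keeps every person; unmatched klass values get a missing mood.
final_df = person_df.assign(klass=klass).merge(klasses_df, on='klass', how='left')
print(final_df)
```

With this approach a person outside every bin ends up with a missing mood (NaN) rather than None, and get_indexer raises an error if the intervals overlap.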
Any hints are welcome.