I added a 'name' column as a key to identify each person.
Approach
The approach is to score each attribute and then use those scores to filter and rank the candidate pairs according to the given conditions.
Scoring for Kitchen
For kitchen scores we used:
- kitchen = 0 --> person has no kitchen : 0
- kitchen = 1 --> person has a kitchen : 1
- kitchen = 2 --> person has a kitchen for emergency use only : 0.5
if Condition Logic for Kitchen
We check that [kitchen score of record 1] + [kitchen score of record 2] is greater than zero, so that at least one member of the pair has a usable kitchen. The possible cases are:
- Both members have no kitchen (sum is 0) [EXCLUDED by the > 0 condition]
- Both members have a kitchen (sum is 2)
- One member has a kitchen and the other has none (sum is 1)
- Both members have an emergency kitchen (sum is 1)
- One member has an emergency kitchen and the other has a kitchen (sum is 1.5)
- One member has an emergency kitchen and the other has none (sum is 0.5)
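The kitchen cases above can be enumerated with a minimal sketch. The names `KITCHEN_SCORE` and `kitchen_ok` are hypothetical helpers introduced only for illustration; the raw codes 0, 1 and 2 are the ones used in the 'kitchen' column of the data.

```python
from itertools import combinations_with_replacement

# Raw kitchen code -> score (0 = no kitchen, 1 = kitchen, 2 = emergency only).
# KITCHEN_SCORE and kitchen_ok are illustrative names, not part of the solution code.
KITCHEN_SCORE = {0: 0.0, 1: 1.0, 2: 0.5}


def kitchen_ok(code_a, code_b):
    """Return True when the summed kitchen score of the pair is above zero."""
    return KITCHEN_SCORE[code_a] + KITCHEN_SCORE[code_b] > 0


# Print every unordered combination of kitchen codes with its sum and verdict.
for a, b in combinations_with_replacement(KITCHEN_SCORE, 2):
    total = KITCHEN_SCORE[a] + KITCHEN_SCORE[b]
    print(a, b, total, 'INCLUDED' if kitchen_ok(a, b) else 'EXCLUDED')
```

Only the combination of two members without a kitchen is rejected; every other combination passes.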
Scoring for Food
For food scores we used:
- food = 0 --> meat eater : -1
- food = 1 --> does not matter : 0
- food = 2 --> vegan : 1
- food = 3 --> vegetarian : 1
if Condition Logic for Food
We check that [food score of record 1] * [food score of record 2] is greater than or equal to zero. The possible cases are:
- Both members are meat eaters : -1 x -1 = 1 [INCLUDED]
- One member is a meat eater and the other is vegan or vegetarian : -1 x 1 = -1 [EXCLUDED]
- One member is a meat eater and the other does not mind : -1 x 0 = 0 [INCLUDED]
- One member is vegan or vegetarian and the other does not mind : 1 x 0 = 0 [INCLUDED]
- Both members are vegan or vegetarian : 1 x 1 = 1 [INCLUDED]
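As a small illustration of the product rule above, the following sketch checks a few pairs of raw food codes. `FOOD_SCORE` and `food_ok` are hypothetical names used only for this example.

```python
# Raw food code -> score (0 = meat eater, 1 = does not matter, 2 = vegan, 3 = vegetarian).
# FOOD_SCORE and food_ok are illustrative names, not part of the solution code.
FOOD_SCORE = {0: -1, 1: 0, 2: 1, 3: 1}


def food_ok(food_a, food_b):
    """Return True when the product of the two food scores is not negative."""
    return FOOD_SCORE[food_a] * FOOD_SCORE[food_b] >= 0


print(food_ok(0, 0))  # two meat eaters
print(food_ok(0, 2))  # meat eater with vegan
print(food_ok(1, 3))  # does not matter with vegetarian
```

The product is negative only when a meat eater (-1) meets a vegan or vegetarian (1), which is exactly the combination that should be rejected.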
Scoring for Age Groups
For scoring age groups, we assigned an ordinal value to each group:
- 10-18 : 1
- 18-22 : 2
- 22-26 : 3
- 26-29 : 4
- 29-34 : 5
- 34-40 : 6
- 40-45 : 7
- 45-55 : 8
- 55-75 : 9
Age Score Calculation
The age score is calculated with the following formula:
age_score = round((1 - (abs(Age Group Value Person 1 - Age Group Value of Person 2) / 10)), 2)
The calculation in the above formula works as follows:
- First we calculate the absolute value of the difference between the age group values of the two persons.
- Then we divide it by 10 to normalize it.
- Finally we subtract this value from 1 to invert the distance, so that persons in the same or nearby age groups get a higher value and persons in distant age groups get a lower value.
Some example cases:
- 18-22 and 18-22 :
round(1 - (abs(2 - 2) / 10), 2) = 1.0
- 45-55 and 45-55 :
round(1 - (abs(8 - 8) / 10), 2) = 1.0
- 18-22 and 45-55 :
round(1 - (abs(2 - 8) / 10), 2) = 0.4
- 10-18 and 55-75 :
round(1 - (abs(1 - 9) / 10), 2) = 0.2
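The cases above can be reproduced with a short sketch of the formula. `AGE_GROUP_VALUE` and `age_score` are hypothetical names chosen for this example; the group values are the ones listed earlier.

```python
# Age group label -> ordinal value, as listed in the scoring table.
# AGE_GROUP_VALUE and age_score are illustrative names, not part of the solution code.
AGE_GROUP_VALUE = {
    '10-18': 1, '18-22': 2, '22-26': 3, '26-29': 4, '29-34': 5,
    '34-40': 6, '40-45': 7, '45-55': 8, '55-75': 9,
}


def age_score(group_a, group_b):
    """Return 1 minus the normalized distance between two age groups."""
    distance = abs(AGE_GROUP_VALUE[group_a] - AGE_GROUP_VALUE[group_b])
    return round(1 - distance / 10, 2)


print(age_score('18-22', '18-22'))  # 1.0
print(age_score('18-22', '45-55'))  # 0.4
print(age_score('10-18', '55-75'))  # 0.2
```

Because the largest possible distance is 8 (groups 1 and 9), the score stays between 0.2 and 1.0.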
Final Score Calculation
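For calculating the final score we used:
Final Score = Food Score + Kitchen Score + Age Score
Here Food Score is the product of the two food scores and Kitchen Score is the sum of the two kitchen scores, the same quantities that are tested in the two conditions above. We then sort the data by Final Score in descending order to obtain the best pairs.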
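To make the combination concrete, here is a minimal sketch that computes the final score for two pairs. The function name `final_score` and the dictionary layout are assumptions made for this example; the score values are taken from the intermediate DataFrame shown further down.

```python
def final_score(p1, p2):
    """Combine food, kitchen and age terms for one pair (illustrative helper)."""
    food = p1['f_scr'] * p2['f_scr']        # product of the food scores
    kitchen = p1['k_scr'] + p2['k_scr']     # sum of the kitchen scores
    age = round(1 - abs(p1['a_scr'] - p2['a_scr']) / 10, 2)
    return food + kitchen + age


# Score values copied from the intermediate DataFrame.
sabastein = {'f_scr': 1, 'k_scr': 1.0, 'a_scr': 2}
smith = {'f_scr': 1, 'k_scr': 1.0, 'a_scr': 2}
mary = {'f_scr': -1, 'k_scr': 1.0, 'a_scr': 3}
allen = {'f_scr': -1, 'k_scr': 1.0, 'a_scr': 8}

print(final_score(sabastein, smith))  # 4.0
print(final_score(mary, allen))       # 3.5
```

These two values match the top two rows of the output table.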
Solution Code
import pandas as pd
import numpy as np
# Creating the DataFrame, here I have added the attribute 'name' for identifying the record.
df = pd.DataFrame({
    'name' : ['jacob', 'mary', 'rick', 'emily', 'sabastein', 'anna',
              'christina', 'allen', 'jolly', 'rock', 'smith', 'waterman',
              'mimi', 'katie', 'john', 'rose', 'leonardo', 'cinthy', 'jim',
              'paul'],
    'sex' : ['m', 'f', 'm', 'f', 'm', 'f', 'f', 'm', 'f', 'm', 'm', 'm', 'f',
             'f', 'm', 'f', 'm', 'f', 'm', 'm'],
    'food' : [0, 0, 1, 3, 2, 3, 1, 0, 0, 3, 3, 2, 1, 2, 1, 0, 1, 0, 3, 1],
    'age' : ['10-18', '22-26', '29-34', '40-45', '18-22', '34-40', '55-75',
             '45-55', '26-29', '26-29', '18-22', '55-75', '22-26', '45-55',
             '10-18', '22-26', '40-45', '45-55', '10-18', '29-34'],
    'kitchen' : [0, 1, 2, 0, 1, 2, 2, 1, 0, 0, 1, 0, 1, 1, 1, 0, 2, 0, 2, 1],
})
# Adding a normalized field 'k_scr' for kitchen
df['k_scr'] = np.where((df['kitchen'] == 2), 0.5, df['kitchen'])
# Adding a normalized field 'f_scr' for food
df['f_scr'] = np.where((df['food'] == 1), 0, df['food'])
df['f_scr'] = np.where((df['food'] == 0), -1, df['f_scr'])
df['f_scr'] = np.where((df['food'] == 2), 1, df['f_scr'])
df['f_scr'] = np.where((df['food'] == 3), 1, df['f_scr'])
# Adding a normalized field 'a_scr' for age
df['a_scr'] = np.where((df['age'] == '10-18'), 1, df['age'])
df['a_scr'] = np.where((df['age'] == '18-22'), 2, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '22-26'), 3, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '26-29'), 4, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '29-34'), 5, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '34-40'), 6, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '40-45'), 7, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '45-55'), 8, df['a_scr'])
df['a_scr'] = np.where((df['age'] == '55-75'), 9, df['a_scr'])
# Printing DataFrame after adding normalized score values
print(df)
commonarr = []  # Empty list for our output
dfarr = np.array(df)  # Converting DataFrame to NumPy array
# Column positions in dfarr: 0 = name, 5 = k_scr, 6 = f_scr, 7 = a_scr
for i in range(len(dfarr) - 1):  # Iterating over each row
    for j in range(i + 1, len(dfarr)):  # Iterating over the rows after row i
        # Check for Food Condition to include relevant records
        if dfarr[i][6] * dfarr[j][6] >= 0:
            # Check for Kitchen Condition to include relevant records
            if dfarr[i][5] + dfarr[j][5] > 0:
                row = []
                # Appending the names
                row.append(dfarr[i][0])
                row.append(dfarr[j][0])
                # Appending the final score
                row.append((dfarr[i][6] * dfarr[j][6]) +
                           (dfarr[i][5] + dfarr[j][5]) +
                           (round((1 - (abs(dfarr[i][7] -
                                            dfarr[j][7]) / 10)), 2)))
                # Appending the row to the final list
                commonarr.append(row)
# Converting the list to a DataFrame
ndf = pd.DataFrame(commonarr)
# Sorting the DataFrame on Final Score
ndf = ndf.sort_values(by=[2], ascending=False)
print(ndf)
Input / Intermediate DataFrame with Scores
         name  sex  food    age  kitchen  k_scr  f_scr  a_scr
 0      jacob    m     0  10-18        0    0.0     -1      1
 1       mary    f     0  22-26        1    1.0     -1      3
 2       rick    m     1  29-34        2    0.5      0      5
 3      emily    f     3  40-45        0    0.0      1      7
 4  sabastein    m     2  18-22        1    1.0      1      2
 5       anna    f     3  34-40        2    0.5      1      6
 6  christina    f     1  55-75        2    0.5      0      9
 7      allen    m     0  45-55        1    1.0     -1      8
 8      jolly    f     0  26-29        0    0.0     -1      4
 9       rock    m     3  26-29        0    0.0      1      4
10      smith    m     3  18-22        1    1.0      1      2
11   waterman    m     2  55-75        0    0.0      1      9
12       mimi    f     1  22-26        1    1.0      0      3
13      katie    f     2  45-55        1    1.0      1      8
14       john    m     1  10-18        1    1.0      0      1
15       rose    f     0  22-26        0    0.0     -1      3
16   leonardo    m     1  40-45        2    0.5      0      7
17     cinthy    f     0  45-55        0    0.0     -1      8
18        jim    m     3  10-18        2    0.5      1      1
19       paul    m     1  29-34        1    1.0      0      5
Output
             0       1    2
 48  sabastein   smith  4.0
 10       mary   allen  3.5
 51  sabastein   katie  3.4
102      smith     jim  3.4
 54  sabastein     jim  3.4
 99      smith   katie  3.4
 61       anna   katie  3.3
 45  sabastein    anna  3.1
 58       anna   smith  3.1
 14       mary    rose  3.0
 12       mary    mimi  3.0
 84      allen  cinthy  3.0
 98      smith    mimi  2.9
105   waterman   katie  2.9
 11       mary   jolly  2.9
 50  sabastein    mimi  2.9
 40      emily   katie  2.9
 52  sabastein    john  2.9
100      smith    john  2.9
 90       rock   smith  2.8
 47  sabastein    rock  2.8
  0      jacob    mary  2.8
 17       mary    paul  2.8
 13       mary    john  2.8
119      katie     jim  2.8
116       mimi    paul  2.8
111       mimi    john  2.8
103      smith    paul  2.7
 85      allen    paul  2.7
120      katie    paul  2.7
 ..        ...     ...  ...
This solution can be optimized further.