I have 2 panda data frames
df1=
student courses
0 student1 Math, Science, Gym
1 student2 gym, Geography, Art
2 student3 Chemistry
3 student4 Art, math, physics
4 student5 biology, history
5 student6 Robotics
df2=
class weight
0 math 9.00
1 science 8.00
2 gym 4.00
3 geography 6.00
4 art 7.00
5 chemistry 8.50
6 physics 10.00
7 biology 9.00
8 history 7.50
9 Philosophy 9.00
10 Physical Chemistry 10.00
11 Computer Science 8.75
I would like to search the the 'courses' column in df1 using the possible values in df2 'class' column and then return the appropriate summed up weights.
Here's what I have so far
#extract courses
class_list = df2['class'].values.tolist()
class_list_regex= ['\\b%s\\b' % x for x in class_list]
class_list_pattern = '|'.join(class_list_regex)
final_df = df1['courses'].str.findall('('+class_list_pattern+ ')',flags=re.IGNORECASE)
resulting in
0 [Math, Science, Gym]
1 [gym, Geography, Art]
2 [Chemistry]
3 [Art, math, physics]
4 [biology, history]
Name: courses, dtype: object
how do I get something like this
student matched_courses total weight
0 student1 [Math, Science, Gym] 21.0
1 student2 [gym, Geography, Art] 17.0
2 student3 [Chemistry] 8.5
3 student4 [Art, math, physics] 26.0
4 student5 [biology, history] 16.5
5 student6 [] 0.0
any help would be appreciated. Thanks