I have two dataframes containing pretest and posttest data. I need to match rows between them based on string similarity of the 'name' and 'school & school address' columns.
import pandas as pd
A = pd.DataFrame({ 'name':['herlina sumarsi','ryan cooper','dudung surudung','asep berlian','Jade hedger','ryan prakasa'],
'school & school address':['Thomas Jefferson High School Alexandria ','Academic Magnet High School North Charleston','Signature School Evansville','Payton College Preparatory High School Chicago','SMA Al Fallah Tarogong','SMA 22 Bandung'] ,
'pretest':[50,60,50,50,70,80]
})
B = pd.DataFrame({ 'name':['herlina sumarsi PhD','Mr. ryan cooper ','dudung surudung MT.','asep berlian','dr. ryan prakasa'],
'school & school address':['Thomas Jefferson High School ','Academic Magnet High School','Signature School Evansville','Payton College Preparatory High School Chicago','SMA 22 Bandung'],
'postest':[70,80,80,90,90]
})
But the challenges are:
- Not every participant in A has answered the posttest in B.
- Names in A and B differ slightly: in B, some people also wrote their title.
- The same applies to the school column: in A, some people also wrote the school address.
I have tried merge, but it only keeps rows whose 'name' is exactly the same in both dataframes.
pd.merge(A,B,how='inner',on='name')
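To illustrate the kind of matching I am after, here is a minimal sketch using only the standard library's difflib on the two name lists. The helper names `similarity` and `best_match` and the 0.6 cutoff are my own assumptions for illustration, not an established recipe, and the cutoff would probably need tuning on real data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a similarity ratio between 0 and 1, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def best_match(value, candidates, cutoff=0.6):
    """Return the candidate most similar to value, or None if none reaches cutoff.

    The cutoff of 0.6 is an assumed starting point, not a tuned value.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        score = similarity(value, cand)
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= cutoff else None

# Names copied from dataframes A and B above.
names_a = ['herlina sumarsi', 'ryan cooper', 'dudung surudung',
           'asep berlian', 'Jade hedger', 'ryan prakasa']
names_b = ['herlina sumarsi PhD', 'Mr. ryan cooper ', 'dudung surudung MT.',
           'asep berlian', 'dr. ryan prakasa']

# Map each name in A to its closest name in B (None when nothing is close enough).
matches = {name: best_match(name, names_b) for name in names_a}
for name, match in matches.items():
    print(repr(name), '->', repr(match))
```

If something like this is reasonable, the resulting mapping could presumably be stored as a new key column in A and then used in an ordinary merge against B's 'name' column, with the same comparison repeated on the school column as a second check.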
I would appreciate any advice on this. Thank you in advance.