I have two dataframes containing pretest and posttest data. I need to conditionally filter (match) their rows based on string similarity of the 'name' and 'school & school address' columns.

    import pandas as pd

    # A: pretest results; some school entries also include the address
    A = pd.DataFrame({
        'name': ['herlina sumarsi', 'ryan cooper', 'dudung surudung', 'asep berlian', 'Jade hedger', 'ryan prakasa'],
        'school & school address': ['Thomas Jefferson High School Alexandria ', 'Academic Magnet High School North Charleston', 'Signature School Evansville', 'Payton College Preparatory High School Chicago', 'SMA Al Fallah Tarogong', 'SMA 22 Bandung'],
        'pretest': [50, 60, 50, 50, 70, 80],
    })

    # B: posttest results; some names also include a title
    B = pd.DataFrame({
        'name': ['herlina sumarsi PhD', 'Mr. ryan cooper ', 'dudung surudung MT.', 'asep berlian', 'dr. ryan prakasa'],
        'school & school address': ['Thomas Jefferson High School ', 'Academic Magnet High School', 'Signature School Evansville', 'Payton College Preparatory High School Chicago', 'SMA 22 Bandung'],
        'postest': [70, 80, 80, 90, 90],
    })

But the challenges are:

  1. Not all participants in data A have answered the posttest in data B.
  2. The names in dataframes A and B differ slightly; in B some people wrote their title as well.
  3. The same goes for the school data: in A some people also wrote the school address.
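
To make the second challenge concrete, here is a rough sketch of the kind of name matching I have in mind. It uses `difflib.get_close_matches` from the Python standard library. The `lookup` and `closest` names, the lower-casing and stripping, and the `cutoff=0.6` threshold are my own assumptions for illustration, not a tested solution, and I am not sure this is the right tool for the job.

```python
import difflib

names_a = ['herlina sumarsi', 'ryan cooper', 'dudung surudung',
           'asep berlian', 'Jade hedger', 'ryan prakasa']
names_b = ['herlina sumarsi PhD', 'Mr. ryan cooper ', 'dudung surudung MT.',
           'asep berlian', 'dr. ryan prakasa']

# Map a normalised (lower-cased, stripped) version of each A name back to the original.
lookup = {name.strip().lower(): name for name in names_a}

# For every name in B, keep the single most similar name from A,
# or None when nothing reaches the (assumed) similarity cutoff.
closest = {}
for name_b in names_b:
    hits = difflib.get_close_matches(name_b.strip().lower(), list(lookup), n=1, cutoff=0.6)
    closest[name_b] = lookup[hits[0]] if hits else None

for name_b, name_a in closest.items():
    print(repr(name_b), '->', repr(name_a))
```

On this sample every name in B is paired with its counterpart in A despite the titles, and 'Jade hedger' is not paired with anyone. What I do not know is how to apply this to both the name and school columns and turn the result into a proper merge.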

I have tried `merge`, but it only keeps rows whose names are exactly identical in both dataframes.

    pd.merge(A, B, how='inner', on='name')
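
To show what I mean, here is a cut-down version of the same merge that keeps only the name and score columns (the `A_small`, `B_small` and `merged` names are just for this illustration). The only name that is spelled identically in both frames is 'asep berlian', so that is the only row that comes back.

```python
import pandas as pd

A_small = pd.DataFrame({
    'name': ['herlina sumarsi', 'ryan cooper', 'dudung surudung',
             'asep berlian', 'Jade hedger', 'ryan prakasa'],
    'pretest': [50, 60, 50, 50, 70, 80],
})
B_small = pd.DataFrame({
    'name': ['herlina sumarsi PhD', 'Mr. ryan cooper ', 'dudung surudung MT.',
             'asep berlian', 'dr. ryan prakasa'],
    'postest': [70, 80, 80, 90, 90],
})

# An inner merge on 'name' compares the strings exactly, so names that
# carry a title or stray whitespace in B do not match their A counterpart.
merged = pd.merge(A_small, B_small, how='inner', on='name')
print(merged)
```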

I would appreciate any advice on this. Thank you in advance.

Nadi