The simplest way to do this is to apply a function that measures the similarity between the two sentences. Plenty of similarity measures could be used in this context, such as the Hamming distance, but they are all fairly limited, and in production you may eventually need a machine learning model for this task.
import pandas as pd
def hamming_distance(chaine1, chaine2):
    """Count the positions at which two strings differ (lower means more similar).

    Note: this is very limited, as it only compares the letters at matching
    positions, and zip stops at the end of the shorter string.
    """
    return sum(c1 != c2 for c1, c2 in zip(chaine1, chaine2))
OCCUPATIONS = ["Occupation", "a-levels", "University student", "Full time employment"]
def get_most_similar(ocup, OCCUPATIONS):
    """Return the entry of OCCUPATIONS that is most similar to ocup (case-insensitive)."""
    # min() keeps the first entry with the smallest distance, so ties go to the earliest occupation.
    return min(OCCUPATIONS, key=lambda oc: hamming_distance(ocup.lower(), oc.lower()))
column = ["Occupation", "a-level student", "a level", "alavls", "university physics student", "physics student", "6th form student", "builder"]
# This only reconstructs your DataFrame; you probably don't need these two lines.
df = pd.DataFrame(column, columns=['occupation'])
# Replace every entry with the closest known occupation.
df['occupation'] = df['occupation'].apply(lambda ocup: get_most_similar(ocup, OCCUPATIONS))
df.head(100)
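
Since the Hamming distance only compares letters position by position, one possible alternative is the standard library's `difflib`, which scores strings by their matching blocks rather than by position. The sketch below is only an illustration under that assumption, not a definitive solution: `get_closest_occupation` and its `cutoff` parameter are hypothetical names introduced here, and the scores `difflib` produces are still a heuristic.

```python
import difflib

# Same reference categories as the OCCUPATIONS list used in the answer.
OCCUPATIONS = ["Occupation", "a-levels", "University student", "Full time employment"]

def get_closest_occupation(ocup, occupations=OCCUPATIONS, cutoff=0.0):
    """Return the entry of `occupations` closest to `ocup`, or None.

    Hypothetical helper: the comparison is case-insensitive, and `cutoff`
    (between 0 and 1) is the minimum similarity ratio a candidate must reach.
    With the default of 0.0 the best candidate is always returned.
    """
    # Map lowercased names back to their original spelling.
    lowered = {oc.lower(): oc for oc in occupations}
    matches = difflib.get_close_matches(ocup.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

if __name__ == "__main__":
    for entry in ["a-levels", "university physics student", "builder"]:
        print(entry, "->", get_closest_occupation(entry))
```

With a DataFrame like the one above, this could be applied as `df['occupation'].apply(get_closest_occupation)`. Raising `cutoff` makes the helper return `None` for entries that resemble no known occupation, instead of forcing them into the nearest category.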