How to identify all the variations of a word in one column, and then fill a value in another column whenever a variation of that word is found?
For example, fill the column `value` with "P" whenever a variation of "PHILADELPHIA" is found, and with "I" whenever a variation of "ILLINOIS" is found.
place | value |
---|---|
PHIADELPHIA | |
PHIALDELPHIA | |
PHIDELPHIA | |
illinois | |
PHIELADELPHIA | |
PHIILADELPHIA | |
illinoi | |
PHILA | |
PHILA. | |
PHILAD | |
PHILADALPHIA | |
PHILADELPHIA | |
PHILADELAPHIA | |
PHILADELHIA | |
PHILADELHPIA | |
PHILADELLPHIA | |
PHILADELPHIA | |
PHILADELPH | |
PHILADELPHA | |
PHILADELPHAI | |
PHILADELPHI | |
PHILADELPHIA | |
Possibly relevant: fuzzy matching, Levenshtein distance, etc.
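For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. Below is a minimal pure-Python sketch using the usual two-row dynamic-programming table. The function name `levenshtein` is just illustrative, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (classic DP, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("PHIADELPHIA", "PHILADELPHIA"))
print(levenshtein("PHILA", "PHILADELPHIA"))
```

On upper-cased strings, "PHIADELPHIA" is only 1 edit away from "PHILADELPHIA", but the abbreviation "PHILA" is 7 edits away, so a plain distance threshold on its own would likely miss truncated forms such as PHILA and PHILAD.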
Code to build the input DataFrame:
```python
import pandas as pd
import numpy as np

place = ['PHIADELPHIA','PHIALDELPHIA','PHIDELPHIA','illinois','PHIELADELPHIA','PHIILADELPHIA','illinoi','PHILA','PHILA.','PHILAD','PHILADALPHIA','PHILADELPHIA','PHILADELAPHIA','PHILADELHIA','PHILADELHPIA','PHILADELLPHIA','PHILADELPHIA','PHILADELPH','PHILADELPHA','PHILADELPHAI','PHILADELPHI','PHILADELPHIA']
# `value` starts out empty (NaN) and should be filled with "P" or "I"
df = pd.DataFrame({"place": place, "value": np.nan})
df
```
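One possible approach, sketched under a few assumptions. Names are upper-cased and stripped of non-letters. A truncation of at least 4 letters (e.g. "PHILA", "PHILAD") counts as a match. Anything else is compared against each canonical spelling with the standard library's `difflib.SequenceMatcher.ratio()`, and the best canonical spelling is accepted if it scores at least 0.8. The `CANONICAL` mapping, the helper names, the 4-letter prefix rule and the 0.8 threshold are all hypothetical choices that would need tuning on real data. Note that `SequenceMatcher` is a swapped-in similarity measure, not Levenshtein distance itself.

```python
import re
from difflib import SequenceMatcher

import numpy as np
import pandas as pd

# Hypothetical mapping: canonical spelling -> code to write into `value`.
CANONICAL = {"PHILADELPHIA": "P", "ILLINOIS": "I"}


def normalize(text) -> str:
    """Upper-case and keep ASCII letters only, e.g. 'PHILA.' -> 'PHILA'."""
    return re.sub(r"[^A-Z]", "", str(text).upper())


def score(name: str, canonical: str, min_prefix: int = 4) -> float:
    """Similarity in [0, 1]; a truncation such as 'PHILAD' scores 1.0."""
    if len(name) >= min_prefix and canonical.startswith(name):
        return 1.0
    return SequenceMatcher(None, name, canonical).ratio()


def classify(text, threshold: float = 0.8):
    """Code of the closest canonical name, or NaN if nothing is close enough."""
    name = normalize(text)
    if not name:
        return np.nan
    best = max(CANONICAL, key=lambda canonical: score(name, canonical))
    return CANONICAL[best] if score(name, best) >= threshold else np.nan


place = ['PHIADELPHIA','PHIALDELPHIA','PHIDELPHIA','illinois','PHIELADELPHIA','PHIILADELPHIA','illinoi','PHILA','PHILA.','PHILAD','PHILADALPHIA','PHILADELPHIA','PHILADELAPHIA','PHILADELHIA','PHILADELHPIA','PHILADELLPHIA','PHILADELPHIA','PHILADELPH','PHILADELPHA','PHILADELPHAI','PHILADELPHI','PHILADELPHIA']
df = pd.DataFrame({"place": place, "value": np.nan})
df["value"] = df["place"].map(classify)  # overwrite the empty column
print(df)
```

The prefix rule exists because both edit distance and `ratio()` penalize abbreviations heavily. A name that resembles neither canonical spelling (say "BOSTON") is left as NaN rather than forced into "P" or "I".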