Find All Variations Of A Word In A Column

Question

How can I identify all the variations of a word in one column, column_one, and then fill in a value in another column, column_two, whenever a variation of that word is found?

E.g. fill the value column with P whenever a variation of "PHILADELPHIA" is found, and with I whenever a variation of "ILLINOIS" is found.

place	value
PHIADELPHIA
PHIALDELPHIA
PHIDELPHIA
illinois
PHIELADELPHIA
PHIILADELPHIA
illinoi
PHILA
PHILA.
PHILAD
PHILADALPHIA
PHILADELPHIA
PHILADELAPHIA
PHILADELHIA
PHILADELHPIA
PHILADELLPHIA
PHILADELPHIA
PHILADELPH
PHILADELPHA
PHILADELPHAI
PHILADELPHI
PHILADELPHIA

Fuzzy Matching, Levenshtein distance, etc

Input String:

import pandas as pd
import numpy as np

place = ['PHIADELPHIA','PHIALDELPHIA','PHIDELPHIA','illinois','PHIELADELPHIA','PHIILADELPHIA','illinoi','PHILA','PHILA.','PHILAD','PHILADALPHIA','PHILADELPHIA','PHILADELAPHIA','PHILADELHIA','PHILADELHPIA','PHILADELLPHIA','PHILADELPHIA','PHILADELPH','PHILADELPHA','PHILADELPHAI','PHILADELPHI','PHILADELPHIA']
value=[np.nan]*len(place)
df = pd.DataFrame(zip(place,value), columns=["place", "value"])
df

I have checked `fuzzywuzzy`; however, I need help filling the values in the `value` column whenever a variation of word1 or word2 is encountered. How to implement that logic is my main concern. @CaptainCaveman — fast_crawler, May 17 '23 at 21:21
Does something like this help? `df.loc[df["place"].isin(["PHIADELPHIA", "PHILA"]), "value"] = "Philadelphia"`. The list should have all possibilities you found for Philadelphia. Also, you can refer to [here](https://stackoverflow.com/questions/60987641/check-if-there-is-a-similar-string-in-the-same-column) — Paulo Marques, May 17 '23 at 22:02

score 0 · Answer 1 · answered May 17 '23 at 22:27

A solution using fuzzywuzzy

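from fuzzywuzzy import fuzz

threshold = 50  # minimum similarity score (0-100) to count as a variation
# Philadelphia is checked first, then Illinois; anything else stays None.
df['value'] = df['place'].apply(
    lambda x: 'P' if fuzz.token_set_ratio(x, 'Philadelphia') >= threshold
    else 'I' if fuzz.token_set_ratio(x, 'ILLINOIS') >= threshold else None
)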
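
One possible variation on the approach above, shown only as a sketch: instead of testing the words in a fixed order, score each entry against every canonical word and keep the label of the best match. The example below swaps in the standard library's `difflib.SequenceMatcher` for `fuzzywuzzy`, so its scores will not be identical to `fuzz.token_set_ratio`. The `CANONICAL` mapping, the helper names `similarity` and `label_for`, and the `TEXAS` entry are assumptions made for illustration; the threshold of 50 is carried over from the answer.

```python
from difflib import SequenceMatcher

# Hypothetical mapping of canonical spellings to the labels to fill in.
CANONICAL = {"philadelphia": "P", "illinois": "I"}


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and surrounding whitespace."""
    return 100 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def label_for(place: str, threshold: float = 50):
    """Return the label of the closest canonical word, or None if no score reaches the threshold."""
    best_word = max(CANONICAL, key=lambda word: similarity(place, word))
    if similarity(place, best_word) >= threshold:
        return CANONICAL[best_word]
    return None


# A few entries from the question, plus one unrelated value (TEXAS) added for illustration.
places = ["PHIADELPHIA", "illinoi", "PHILA.", "PHILADELHPIA", "TEXAS"]
labels = [label_for(p) for p in places]
print(labels)
```

With the DataFrame from the question, the same helper could be applied as `df['value'] = df['place'].apply(label_for)`. Picking the best-scoring word rather than the first word that clears the threshold may matter once more canonical words are added, since a low threshold can let an entry match several of them.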

PARAK