The aim is to replace SMS abbreviations in a text with their expansions. I do this by reading the abbreviation table from an .xlsx file into a pandas DataFrame and comparing the text against the stored column values.

word    expansion
fyi     for your information
gtg     got to go
brb     be right back
gtg2    got to go too
fyii    sample test

Efforts so far (courtesy of the question "Replace words by checking from pandas dataframe"):

import re
import pandas as pd
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion)) #convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())  # escape regex metacharacters in the keys
pattern = re.compile("|".join(rep.keys()))
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)

Output:

for your information got to go got to go2 for your informationi really 

Expected output:

 for your information got to go got to go too sample test really 

How can I match whole words only, so that "fyi" is not replaced inside "fyii" and "gtg" is not replaced inside "gtg2"?

Programmer_nltk

1 Answer


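I don't know whether this exactly matches your requirement, but you can try putting a word boundary (\b) at the end of each word in your pattern, so that only whole words are matched. Without it, "fyi" also matches the first three characters of "fyii", and "gtg" matches the start of "gtg2":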

import re
import pandas as pd
sdf = pd.read_excel('expansion.xlsx')
rep = dict(zip(sdf.word, sdf.expansion)) #convert into dictionary
words = "fyi gtg gtg2 fyii really "
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile(r"\b|".join(rep.keys())+r"\b") # This line changes
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], words)
print(rep)

Output:

for your information got to go got to go too sample test really
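
If you also want to avoid matches that start in the middle of a longer word (for example "fyi" inside "xfyi"), a slightly more defensive variant puts \b on both sides of a single group. This is only a sketch: the dictionary below is a hypothetical stand-in for the contents of expansion.xlsx, and expand is a helper name I made up for illustration.

```python
import re

# Hypothetical stand-in for the spreadsheet contents (word -> expansion).
rep = {
    "fyi": "for your information",
    "gtg": "got to go",
    "brb": "be right back",
    "gtg2": "got to go too",
    "fyii": "sample test",
}

def expand(text, mapping):
    # Try longer keys first, so "gtg2" is attempted before "gtg".
    keys = sorted(mapping, key=len, reverse=True)
    # \b on both sides of the group: only whole words can match.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b"
    )
    # m.group(0) is the literal matched word, so it indexes the
    # unescaped dictionary directly.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

print(expand("fyi gtg gtg2 fyii really ", rep))
```

Keeping the dictionary keys unescaped and escaping only while building the pattern avoids the re.escape round trip in the lookup. Note that \b only behaves as expected when the keys begin and end with word characters (letters, digits, underscore), which is an assumption about your data.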
migjimen