Isn't it simply:
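def convert_message(msg, conversion):
    # conversion is a dict such as {'Drive': 'Dr'}
    tokens = msg.split()  # assuming the tokens come from splitting on whitespace
    for index, token in enumerate(tokens):
        if token in conversion:
            tokens[index] = conversion[token]
    return ' '.join(tokens)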
However, this wouldn't work on sentences like "Obstruction on Cowlishaw Street.", since the last token would then be "Street." (with the trailing period), which is not a key in the dictionary. Perhaps you should use a regular expression with re.sub instead:
import re

def convert_message(msg, conversion):
    def translate(match):
        # Called once per match; returns the replacement text
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+', translate, msg)
Here re.sub finds every run of one or more (+) consecutive word characters (\w, that is letters, digits and the underscore) and calls the given function for each match, passing the match object as the argument; the matched word can be retrieved with match.group(0). The function should return the replacement for the given match: here, if the word is found in the dictionary we return its abbreviation, otherwise we return the original word unchanged.
Thus:
>>> msg = "Cowlishaw Street & Athllon Drive, Greenway now free of obstruction."
>>> convert_message(msg, {'Drive': 'Dr', 'Street': 'St'})
'Cowlishaw St & Athllon Dr, Greenway now free of obstruction.'
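If it helps to see what re.sub actually passes to the callback, here is a small sketch. The names tracing_translate and seen are made up for illustration, and the shorter convert_message is just an equivalent way of writing the function above with dict.get, not a different technique:

```python
import re

def convert_message(msg, conversion):
    # dict.get(key, default) returns the abbreviation if the word is a key,
    # otherwise the word itself, so unknown words pass through unchanged.
    return re.sub(r'\w+', lambda m: conversion.get(m.group(0), m.group(0)), msg)

seen = []  # hypothetical helper list, only used to record the matches

def tracing_translate(match):
    # Record each matched word and return it unchanged.
    seen.append(match.group(0))
    return match.group(0)

re.sub(r'\w+', tracing_translate, "Obstruction on Cowlishaw Street.")
print(seen)  # the trailing period is never part of a match

print(convert_message("Obstruction on Cowlishaw Street.", {'Street': 'St'}))
```

The point is that the period never reaches the callback, so "Street" is looked up on its own and the punctuation is left where it was.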
As for the &amp;, on Python 3.4+ you should use html.unescape to decode HTML entities:
>>> import html
>>> html.unescape('Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction.')
'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'
This takes care of all named and numeric HTML character references. For older Python versions, see the alternatives on this question.
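As a quick check of that claim, the sketch below feeds html.unescape the same ampersand written three ways. The sample strings are invented for illustration; only html.unescape itself comes from the standard library:

```python
import html

samples = [
    'Cowlishaw Street &amp; Athllon Drive',   # named entity
    'Cowlishaw Street &#38; Athllon Drive',   # decimal character reference
    'Cowlishaw Street &#x26; Athllon Drive',  # hexadecimal character reference
]

decoded = [html.unescape(s) for s in samples]
for text in decoded:
    print(text)

# A string without any entities is returned unchanged.
plain = html.unescape('Cowlishaw Street & Athllon Drive')
```

All three spellings decode to a plain &, so it should be safe to call html.unescape first and only then apply the word replacements.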
The regular expression \w+ does not match the & character; if you want to replace it too, you can use the regular expression \w+|. instead, which means: "any consecutive run of word characters, or else any single character that is not part of such a run" (note that . does not match a newline unless re.DOTALL is given, so newlines are left untouched):
import re
import html

def convert_message(msg, conversion):
    # Decode HTML entities first so that '&amp;' becomes a plain '&'
    msg = html.unescape(msg)

    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+|.', translate, msg)
Then you can do:
>>> msg = 'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'
>>> convert_message(msg, {'Drive': 'Dr', '&': 'and',
...                       'Street': 'St', '.': '', ',': ''})
'Cowlishaw St and Athllon Dr Greenway now free of obstruction'
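One caveat worth spelling out: with \w+|. every match is either a whole word or a single character, so a dictionary key that mixes the two (such as 'Drive.') can never match. A minimal sketch to illustrate this, using re.findall to list the pieces; the sample strings and the key 'Drive.' are made up for the demonstration:

```python
import re

def convert_message(msg, conversion):
    # Same idea as above, written compactly with dict.get.
    return re.sub(r'\w+|.', lambda m: conversion.get(m.group(0), m.group(0)), msg)

# Each piece is either a run of word characters or exactly one other character.
pieces = re.findall(r'\w+|.', 'Street & Drive.')
print(pieces)

# 'Drive.' is never produced as a single match, so this key has no effect.
unchanged = convert_message('Athllon Drive.', {'Drive.': 'Dr'})
print(unchanged)

# Separate keys for the word and the punctuation do work.
changed = convert_message('Athllon Drive.', {'Drive': 'Dr', '.': ''})
print(changed)
```

So keep the keys to whole words or single punctuation characters if you go with this pattern.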