
I have the following message:

msg = "Cowlishaw Street & Athllon Drive, Greenway now free of obstruction."

I want to abbreviate words such as "Drive" to "Dr" and "Street" to "St":

expected_msg = "Cowlishaw St and Athllon Dr Greenway now free of obstruction"

I also have a dictionary called conversion, in which a word such as "Drive" is a key and its abbreviation "Dr" is the value.

How do I check whether each word of the message is a key in conversion and, if so, replace it with the corresponding value?

This is what I have so far:

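def convert_message(msg, conversion):
    # Strip full stops and commas, then split the message on spaces.
    msg = msg.translate({ord(i): None for i in ".,"})
    tokens = msg.strip().split(" ")
    for x in tokens:
        if x in conversion:
            pass  # stuck here: x should become conversion[x]

    return " ".join(tokens)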
Yusuf Ning
  • Can you please try to get the formatting of your example code correct? – Alex Aug 18 '16 at 07:58
  • Can't you just use `msg.replace("Drive","Dr")` etc. ? – Chris_Rands Aug 18 '16 at 07:59
  • `for "Drive" in msg` is not proper Python at all. Since you have a dictionary, you should include it in the question. – Antti Haapala -- Слава Україні Aug 18 '16 at 07:59
  • You might want to look at nltk for tokenizing your string, btw. Handles punctuation and all that. – Björn Kristinsson Aug 18 '16 at 08:08
  • @Chris_Rands Yes, I can just do that, but my tutor said it is not allowed because it's sort of hard coding. What we are meant to do is iterate through the words of the message and, if we find a word that is also a key in the dictionary "conversion", replace it with its corresponding value in that dictionary. – Yusuf Ning Aug 18 '16 at 08:08

1 Answer


Isn't it simply:

def convert_message(msg, conversion):
    tokens = msg.split()
    # Replace every token that is a key in the dictionary with its value.
    for index, token in enumerate(tokens):
        if token in conversion:
            tokens[index] = conversion[token]

    return ' '.join(tokens)

However, this wouldn't work on sentences like "Obstruction on Cowlishaw Street.", since the last token would then be "Street." (with the trailing full stop), which is not a key in the dictionary. Perhaps you should use a regular expression with re.sub:

import re
def convert_message(msg, conversion):
    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+', translate, msg)

Here re.sub finds every run of one or more (+) consecutive word characters (\w: letters, digits and the underscore). For each match it calls the given function with the match object as its argument; the matched word can be retrieved with match.group(0). The function must return the replacement for that match: if the word is found in the dictionary we return its value, otherwise we return the original word.

Thus:

>>> msg = "Cowlishaw Street & Athllon Drive, Greenway now free of obstruction."
>>> convert_message(msg, {'Drive': 'Dr', 'Street': 'St'})
'Cowlishaw St & Athllon Dr, Greenway now free of obstruction.'
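
To illustrate why matching whole words matters, here is a small sketch contrasting the re.sub version with the plain str.replace approach suggested in the comments. The sample sentence containing "Driveway" is made up for illustration and does not come from the question.

```python
import re


def convert_message(msg, conversion):
    # Same idea as the function above: swap each whole word found in the dict.
    def translate(match):
        word = match.group(0)
        return conversion.get(word, word)

    return re.sub(r'\w+', translate, msg)


conversion = {'Drive': 'Dr', 'Street': 'St'}
sample = "Driveway blocked on Athllon Drive."  # hypothetical input

# str.replace works on substrings, so it also rewrites part of "Driveway".
naive = sample
for long_form, short_form in conversion.items():
    naive = naive.replace(long_form, short_form)

print(naive)                                # Drway blocked on Athllon Dr.
print(convert_message(sample, conversion))  # Driveway blocked on Athllon Dr.
```

Because \w+ always consumes the whole word, "Driveway" is looked up as a single token and left alone.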

As for the ampersand: if it arrives in your input as the HTML entity &amp;, then on Python 3.4+ you should use html.unescape to decode HTML entities:

>>> import html
>>> html.unescape('Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction.')
'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'

This takes care of all known HTML entities. For older Python versions you can see alternatives on this question.

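The regular expression \w+ does not match the & character. If you want to replace it too, use the regular expression \w+|. instead, which means: "any consecutive run of word characters, or otherwise any single character that is not part of such a run". Every remaining character then becomes a match of its own, so punctuation such as "." and "," can be listed in the dictionary as well: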

import re
import html


def convert_message(msg, conversion):
    msg = html.unescape(msg)

    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+|.', translate, msg)

Then you can do:

>>> msg = 'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'
>>> convert_message(msg, {'Drive': 'Dr', '&': 'and',
...                       'Street': 'St', '.': '', ',': ''})
'Cowlishaw St and Athllon Dr Greenway now free of obstruction'
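
If the exercise calls for an explicit loop over the message rather than a callback, the same idea can be sketched with re.findall, which returns the tokens as a list. This is only an illustration under that assumption; the name convert_message_loop is made up here and is not part of the question.

```python
import html
import re


def convert_message_loop(msg, conversion):
    # Decode entities such as &amp; first, as in the function above.
    msg = html.unescape(msg)
    # Split into runs of word characters and single other characters.
    # re.DOTALL lets "." match newlines too, so no character is dropped.
    tokens = re.findall(r'\w+|.', msg, flags=re.DOTALL)
    for index, token in enumerate(tokens):
        if token in conversion:
            tokens[index] = conversion[token]
    # The tokens still contain the original spaces, so join with no separator.
    return ''.join(tokens)


conversion = {'Drive': 'Dr', 'Street': 'St', '&': 'and', '.': '', ',': ''}
msg = 'Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction.'
print(convert_message_loop(msg, conversion))
# Cowlishaw St and Athllon Dr Greenway now free of obstruction
```

Keeping the spaces as tokens means the original spacing survives unchanged, which a split on " " followed by " ".join would not guarantee.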
Community
  • The first one works because the translation that you define there is already defined for each case of the word, and the testing is done separately each time. Thanks. – Yusuf Ning Aug 18 '16 at 08:16
  • OP apparently wants `&amp;` -> `and` - but I'm sure they can work that one out with the translations :) – Jon Clements Aug 18 '16 at 08:16