0

I am merging a few datasets covering over 200 countries. While cleaning the data, I need to convert each country's three-letter code into the country's full name.

The three-letter codes and full country names come from a separate CSV file, which covers a slightly different set of countries.

My question is: Is there a better way to write this?

str.replace("USA", "United States of America")
str.replace("CAN", "Canada")
str.replace("BHM", "Bahamas")
str.replace("CUB", "Cuba")
str.replace("HAI", "Haiti")
str.replace("DOM", "Dominican Republic")
str.replace("JAM", "Jamaica")

and so on. It goes on for another 200 rows. Thank you!

Bach
Coloane
  • 2
    possible duplicate of [Easiest way to replace a string using a dictionary of replacements?](http://stackoverflow.com/questions/2400504/easiest-way-to-replace-a-string-using-a-dictionary-of-replacements) – thefourtheye Apr 08 '14 at 06:06
  • Just out of curiosity, don't you need to check word boundaries? "USAIN BOLT" will be replaced to "United States of AmericaIN BOLT". – AkiRoss Apr 08 '14 at 06:13
  • @AkiRoss Just word boundaries? What about "The USA."? – Scorpion_God Apr 08 '14 at 06:17
  • I'm not sure about what you mean by that: are you questioning the semantic value of word boundary or its syntactic form? (Or something else?) – AkiRoss Apr 08 '14 at 08:56

3 Answers

1

Since the number of substitutions is high, I would instead iterate over the words in the string and replace each one based on a dictionary lookup.

mapofcodes = {'USA': 'United States of America'}  # ... and the remaining codes
# Look up each whitespace-separated word; words that are not codes pass through unchanged
finalstr = ' '.join(mapofcodes.get(word, word) for word in mystring.split())
spicavigo
0

Try reading the CSV file into a dictionary or a 2D array; you can then look up whichever value you need.

That is, if I understand your question correctly.
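A minimal sketch of the dictionary approach, assuming the CSV has two columns (code, then full name) and no header row. The sample data and the helper name `load_country_map` are made up for illustration:

```python
import csv
import io

def load_country_map(csv_file):
    """Build a {code: full name} dict from rows of (code, name)."""
    return {
        row[0].strip(): row[1].strip()
        for row in csv.reader(csv_file)
        if len(row) >= 2  # skip blank or malformed rows
    }

# Stand-in for the real file; for a file on disk, pass open(path, newline='') instead
sample = io.StringIO("USA,United States of America\nCAN,Canada\nBHM,Bahamas\n")
country_map = load_country_map(sample)

# Codes missing from the CSV fall back to the code itself
codes = ["USA", "BHM", "XYZ"]
names = [country_map.get(code, code) for code in codes]
```

Using `.get(code, code)` leaves unknown codes untouched, which matters here because the CSV covers a slightly different set of countries than the datasets.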

MrHaze
0

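Here's a regular expression solution:

import re

COUNTRIES = {'USA': 'United States of America', 'CAN': 'Canada'}

def repl(m):
    # Replace a matched code with its full name; leave unknown codes as they are
    country_code = m.group(1)
    return COUNTRIES.get(country_code, country_code)

# Matches any run of three uppercase letters, including inside longer words
p = re.compile(r'([A-Z]{3})')
my_string = p.sub(repl, my_string)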
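The comments on the question point out that a bare three-letter match can fire inside a longer word such as "USAIN". One possible variant, offered as a sketch rather than as this answer's method, builds the pattern from the dictionary keys and wraps it in `\b` word boundaries. The function name `expand_codes` is made up for illustration:

```python
import re

COUNTRIES = {'USA': 'United States of America', 'CAN': 'Canada'}

# Match only known codes, and only when they stand alone as whole words
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, COUNTRIES)) + r')\b')

def expand_codes(text):
    """Replace each standalone country code in text with its full name."""
    return pattern.sub(lambda m: COUNTRIES[m.group(1)], text)

result = expand_codes("USAIN BOLT visited the USA.")
```

Here "USAIN" is left alone because the "A" is followed by another word character, while "USA." is still replaced because a period counts as a word boundary.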
Scorpion_God