charmap = [
  (u"\u201c\u201d", "\""),
  (u"\u2018\u2019", "'")
  ]

_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
print fixed

I was looking to write a similar script to replace smart quotes and curly apostrophes in text, as answered here. Would someone be kind enough to explain the two lines:

_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)

and possibly rewrite them in a longer-winded format with comments to explain exactly what is going on - I'm a little confused whether it's an inner/outer loop combo or sequential checking over items in a dictionary.

Ghoul Fool
  • What precisely don't you understand about those two lines? – jonrsharpe Oct 30 '16 at 16:08
  • first line creates a dict by generating tuples (char => replacement), second line applies the dict transformation on each char, defaulting to original value if not in dict. – Jean-François Fabre Oct 30 '16 at 16:09
  • @jonrsharpe I'm unsure if _map is a new dictionary which looks up items in charmap concurrently or in an inner loop with c as list of chars. The second line is a bit easier but I'm unfamiliar with .get (c,c) – Ghoul Fool Oct 31 '16 at 19:52
  • So did you try printing `_map`? Reading the docs on `dict.get`? – jonrsharpe Oct 31 '16 at 19:53
  • @jonrsharpe No, that's the sensible thing to do. I got super confused as to what could actually be printed at that stage - I assumed that most things would kick out errors in printing, _map possibly being unprintable unicode or unicode not printing to a DOS terminal at that point. :) – Ghoul Fool Oct 31 '16 at 21:55

2 Answers

_map = dict((c, r) for chars, r in charmap for c in list(chars))

means:

_map = {}                     # an empty dictionary
for chars, r in charmap:      # chars - string of symbols to be replaced, r - replacement
    for c in list(chars):     # c - individual symbol from chars
        _map[c] = r           # adding entry replaced:replacement to the dictionary

and

fixed = "".join(_map.get(c, c) for c in s)

means

fixed = ""                          # an empty string   
for c in s:
    fixed = fixed + _map.get(c, c)  # first "c" is key, second is default for "not found"

as the method .join simply concatenates the elements of a sequence, with the given string as a separator between them (in this case "", i.e. without a separator).
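To check that the long form really matches the compact one, both can be run side by side. This is a sketch assuming Python 3, and the sample string `s` is made up for illustration (the question never defines it). The `for` clauses of a generator expression read left to right as outer loop to inner loop, so it is a nested loop rather than two sequential passes.

```python
charmap = [("\u201c\u201d", '"'), ("\u2018\u2019", "'")]
s = "\u201cYou\u2019re smart!\u201d"  # hypothetical sample input

# Compact form, as in the question.
map_short = dict((c, r) for chars, r in charmap for c in chars)
fixed_short = "".join(map_short.get(c, c) for c in s)

# Long form: the first "for" is the outer loop, the second the inner loop.
map_long = {}
for chars, r in charmap:
    for c in chars:
        map_long[c] = r

# Collect each (possibly replaced) character, then join with no separator.
pieces = []
for c in s:
    pieces.append(map_long.get(c, c))
fixed_long = "".join(pieces)

assert map_short == map_long
assert fixed_short == fixed_long
print(fixed_long)
```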

MarianD

It's faster and more straightforward to use the built-in Unicode string method translate:

#!python2
#coding: utf8

# Start with a Unicode string.
# Your codecs.open() will read the text in Unicode
text = u'''\
"Don't be dumb"
“You’re smart!”
'''

# Build a translation dictionary.
# Keys are Unicode ordinal numbers.
# Values can be ordinals, Unicode strings, or None (to delete)
charmap = { 0x201c : u'"',
            0x201d : u'"',
            0x2018 : u"'",
            0x2019 : u"'" }

print text.translate(charmap)

Output:

"Don't be dumb"
"You're smart!"
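For readers on Python 3, where every `str` is already Unicode, a rough equivalent might look like the sketch below. It uses `str.maketrans` to build the table from single-character keys; the sample text is shortened from the one above and the variable names are only illustrative.

```python
# Python 3 sketch of the same idea (an assumption; the code above is Python 2).
text = "\u201cYou\u2019re smart!\u201d"

# str.maketrans accepts a dict whose keys are single characters (or ordinals)
# and converts the keys to the ordinals that str.translate expects.
table = str.maketrans({
    "\u201c": '"',
    "\u201d": '"',
    "\u2018": "'",
    "\u2019": "'",
})

fixed = text.translate(table)
print(fixed)
```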
Mark Tolonen