136

I want to take the string 0.71331, 52.25378 and return 0.71331,52.25378 - i.e. just look for a digit, a comma, a space and a digit, and strip out the space.

This is my current code:

coords = '0.71331, 52.25378'
coord_re = re.sub("(\d), (\d)", "\1,\2", coords)
print coord_re

But this gives me 0.7133,2.25378. What am I doing wrong?

Wiktor Stribiżew
Richard
  • 5
    Since you don't actually want to capture the digits, it may make more sense to use look-arounds, i.e.: `re.sub(r'(?<=\d), (?=\d)', ',', coords)`. – ig0774 Nov 16 '11 at 19:18
  • 6
    This particular question doesn't need regex, use replace: `coords.replace(' ', '')` – Gringo Suave Dec 28 '18 at 18:36
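The two suggestions in the comments above can be sketched as follows. This is only an illustration in Python 3 syntax, and the names `with_lookarounds` and `with_replace` are made up here. Note that `str.replace` removes every space, so it is only equivalent when the input has no other spaces that need to be kept.

```python
import re

coords = '0.71331, 52.25378'

# Look-arounds assert a digit on each side without consuming it,
# so only the ", " itself is matched and replaced.
with_lookarounds = re.sub(r'(?<=\d), (?=\d)', ',', coords)

# Plain string replacement: drops all spaces, no regex involved.
with_replace = coords.replace(' ', '')

print(with_lookarounds)  # 0.71331,52.25378
print(with_replace)      # 0.71331,52.25378
```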

2 Answers

187

You should be using raw strings for regex. Try the following:

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

With your current code, the backslashes in your replacement string are escaping the digits, so you are replacing every match with the equivalent of chr(1) + "," + chr(2):

>>> '\1,\2'
'\x01,\x02'
>>> print '\1,\2'
,
>>> print r'\1,\2'   # this is what you actually want
\1,\2

Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
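A minimal sketch of the three variants side by side, written for Python 3 (the `print` statements above are Python 2). The names `broken`, `fixed`, and `also_fixed` are illustrative only:

```python
import re

coords = '0.71331, 52.25378'

# Non-raw replacement: '\1' and '\2' are octal escapes, i.e. chr(1) and chr(2),
# so re.sub never sees any group references.
broken = re.sub(r"(\d), (\d)", "\1,\2", coords)

# Raw replacement: the backslashes survive, so re.sub sees \1 and \2.
fixed = re.sub(r"(\d), (\d)", r"\1,\2", coords)

# Escaping each backslash by hand produces the same replacement string.
also_fixed = re.sub(r"(\d), (\d)", "\\1,\\2", coords)

print(repr(broken))  # '0.7133\x01,\x022.25378'
print(fixed)         # 0.71331,52.25378
print(also_fixed)    # 0.71331,52.25378
```

The control characters in `broken` are invisible when printed normally, which is why the output in the question looks like 0.7133,2.25378.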

Andrew Clark
  • 2
    Thanks, that did the trick. http://docs.python.org/library/re.html#raw-string-notation for anyone reading this. – Richard Nov 16 '11 at 19:16
  • 1
    Also http://stackoverflow.com/questions/2081640/what-exactly-do-u-and-rstring-flags-in-python-and-what-are-raw-string-litte/2081708#2081708 for a better explanation of what raw strings are. – Richard Nov 16 '11 at 19:17
  • How would you actually print the group name in the example above? Say, if group `\1` were called *xCoord*, is it possible to instruct `re.sub` to replace the substrings with group names, such that `re.sub(r"(\d), (\d)", r"\1,\2", coords)` resulted in the string literal `xCoord,52.25378`? – zelusp Apr 29 '16 at 17:03
  • 2
    This doesn't work in Python3. Using `\1` replaces it with some bizarre unicode character. – Cerin May 25 '17 at 22:59
30

Python interprets the \1 as a character with ASCII value 1, and passes that to sub.

Use raw strings, in which Python doesn't interpret the \.

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

This is covered right at the beginning of the re documentation, should you need more info.

Petr Viktorin