The following regular expression

\s*([\w_]*)\s*(,\s*|=\s*(\d*)\s*,)\n

matches the following line (with appended newline)

  _FIRST_ELEMENT_      = 10000,

on Windows but not on Mac. The environment I'm using it in is the Python implementation of Cinema 4D (3D Software) which uses the CPython 2.6 interpreter.

I do not own a Mac, so someone ran a quick test for me, but he does not have time to run more.

On both Platforms (Win/Mac) the same code has been tested in the Scripting Window of Cinema 4D.

import re
enum_match = re.compile('\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)\n')
line = '  _FIRST_ELEMENT_      = 10000,\n'
match = enum_match.match(line)

if not match:
    print "Regex did not match."
else:
    print match.groups()

Output on Windows:

('_FIRST_ELEMENT_', '= 10000,', '10000')

Output on Mac:

Regex did not match.

The only thing I can think of is that the underscore (_) is not included in \w on Mac.

Do you know why the regular expression matches on Windows but not on Mac?

Niklas R
  • Works fine on Snow Leopard, with bundled Python 2.5/2.6 as well as with MacPorts's Python 2.6/2.7. Note that Windows uses `\r\n` for newlines, whereas OSX uses `\n` only (by default), but that doesn't seem related to this particular example, since you're using `\n` explicitly. It may depend on what your input files use. – Bruno May 15 '12 at 13:52
  • [Regular expression to match cross-platform newline characters](http://stackoverflow.com/questions/1331815/regular-expression-to-match-cross-platform-newline-characters) – VenkatH May 15 '12 at 13:55

2 Answers

Use this instead:

 enum_match = re.compile('\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)$')

Mac OS X and Windows use different characters to mark the end of a line in text files; it appears that your file uses the Windows variety. '\n', I believe, matches the character(s) used by the operating system the code is running under, which may not be the characters used in the file. Using '$' instead of '\n' in your regular expression should work under either operating system (even if this explanation isn't quite correct).
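
One way to check how '$' behaves here is to run the suggested pattern against the same line with different endings. The following is only a sketch and not part of the original answer: the `body`, `endings`, and `results` names are made up for illustration, the pattern is written as a raw string, and the code is meant to run unchanged on Python 2.6 and Python 3.

```python
import re

# Pattern suggested in this answer, written as a raw string.
enum_match = re.compile(r'\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)$')

# The line from the question, without any line ending (illustrative name).
body = '  _FIRST_ELEMENT_      = 10000,'

# Hypothetical set of endings to try.
endings = {'LF': '\n', 'CRLF': '\r\n', 'none': ''}

# Record, per ending, whether the pattern matches the whole line.
results = {}
for name, ending in sorted(endings.items()):
    results[name] = enum_match.match(body + ending) is not None

print(results)
```

The `LF` and `none` cases match, while `CRLF` does not: without `re.MULTILINE`, `$` matches only at the very end of the string or just before a final `\n`, so a `\r` sitting in front of that `\n` still blocks the match. If the input may contain `\r\n`, putting `\s*` in front of the `$` is one possible tweak, since `\s` also matches `\r`.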

chepner
  • Small correction: `\n` is *not* interpreted differently on different platforms. `\n` always means the literal line-feed character (ASCII character `0x0A`.) Similarly, `\r` always means the literal carriage-return character (ASCII `0x0d`.) To see this, try `re.compile('\r',re.DEBUG)` in Python. – Li-aung Yip May 15 '12 at 14:10
  • @Li-aungYip: thanks for the correction. I think I'm confusing this with newline handling when reading/writing to a file. – chepner May 15 '12 at 14:48
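
The two points in these comments can be checked with a short sketch: `\n` and `\r` are fixed characters, and any platform-dependent behaviour comes from newline translation when a file is read. The temporary file and the variable names below are illustrative, not from the original post. `io.open` is used because it is available from Python 2.6 onward and behaves like Python 3's `open`.

```python
import io
import os
import tempfile

# '\n' and '\r' are single, fixed characters on every platform.
assert ord('\n') == 0x0A
assert ord('\r') == 0x0D

# Write one line with a Windows-style ending as raw bytes.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(b'_FIRST_ELEMENT_ = 10000,\r\n')

    # Default text mode (newline=None): '\r\n' is translated to '\n'.
    with io.open(path, 'r', encoding='ascii') as f:
        translated = f.read()

    # newline='' turns translation off, so the '\r' is preserved.
    with io.open(path, 'r', encoding='ascii', newline='') as f:
        untranslated = f.read()
finally:
    os.remove(path)

print(repr(translated))
print(repr(untranslated))
```

A regular expression ending in `,\n` matches the translated text but not the untranslated one. Note that in Python 2 the built-in `open` only applies this universal-newline translation when `'U'` is part of the mode string, so how the lines were read can matter more than the platform itself.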

I assume the newline character \n is the problem, since the line-ending sequence is not the same on all systems.

You can do something more general like

\s*([\w_]*)\s*(,\s*|=\s*(\d*)\s*,)(?:\r\n?|\n)

This matches \r with an optional \n following, or \n alone. I think this covers all of the combinations that are used as newline sequences nowadays.
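
As a rough check of the pattern above, the sketch below runs it against the line from the question with each of the three endings. The `body` and `groups_by_ending` names are made up for illustration, the pattern is written as a raw string, and the code is meant to run on both Python 2.6 and Python 3.

```python
import re

# Pattern from this answer as a raw string; [\w_] is equivalent to \w,
# because \w already includes the underscore.
enum_match = re.compile(r'\s*([\w_]*)\s*(,\s*|=\s*(\d*)\s*,)(?:\r\n?|\n)')

# The line from the question, without a line ending (illustrative name).
body = '  _FIRST_ELEMENT_      = 10000,'

# Map each ending to the captured groups, or None if there is no match.
groups_by_ending = {}
for ending in ('\n', '\r\n', '\r'):
    match = enum_match.match(body + ending)
    groups_by_ending[ending] = match.groups() if match else None
    print('%r -> %r' % (ending, groups_by_ending[ending]))
```

All three endings yield the same groups, because the line ending is consumed by the non-capturing group at the end and never shows up in a capture.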

stema
  • You'll catch two line returns with `[\n\r]{1,2}`: `\n\n`. – Bruno May 15 '12 at 13:53
  • @Bruno that's true if there is only one newline character per line and the following line consists only of a newline, like `\n\n`. – stema May 15 '12 at 13:55
  • @jadkik94 that would not match `\r` alone. – stema May 15 '12 at 13:59
  • @stema, `\r` was last used for Mac OS 9, which is quite ancient (apparently discontinued in 2002). – Bruno May 15 '12 at 14:01
  • @stema Right, I didn't know it was used somewhere. Seems it's used in some old Mac, from the link in the OP comment. – jadkik94 May 15 '12 at 14:01