Stripping start/end characters on regular expression

Question

I have the following regular expression:

>>> re.findall(r'\r\n\d+\r\n',contents)[-1]
'\r\n1621\r\n'
>>> re.findall(r'\r\n\d+\r\n',contents)[-1].replace('\r','').replace('\n','')
'1621'

How can I improve the regular expression so that I don't need the Python replace calls?

Note that the digits must be surrounded by those characters, so I can't just use a plain \d+.

You want a non-capturing group: http://stackoverflow.com/questions/3512471/non-capturing-group — Jon Egeland, Jan 27 '15 at 22:43
Do you care about not being able to match adjacent numbers that are each surrounded by \r\n? — , Jan 29 '15 at 07:11

score 2 · Accepted Answer · answered Jan 27 '15 at 22:45

Simply use parentheses (a capturing group):

re.findall(r'\r\n(\d+)\r\n',contents)[-1]

That way the whole pattern still has to match, but findall returns only the content of the parentheses.
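
As a minimal sketch of the capturing-group approach: the question does not show `contents`, so the sample string below is a made-up stand-in, and the variable names `matches` and `last` are only for illustration.

```python
import re

# Hypothetical sample input; the real `contents` is not shown in the question.
contents = "header\r\n42\r\nbody text\r\n1621\r\nfooter"

# With exactly one capturing group, findall returns the group's text
# for each match instead of the full matched string.
matches = re.findall(r'\r\n(\d+)\r\n', contents)

# Take the last number that was surrounded by \r\n on both sides.
last = matches[-1]
print(matches, last)
```

The surrounding \r\n is still required for a match; it is just left out of the returned strings.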

answered Jan 27 '15 at 22:45

user

score 0 · Answer 2 · edited May 23 '17 at 11:49

user 5061's answer is great.
Alternatively, you can keep the original pattern and call .strip() on the match to remove the surrounding "\r\n" characters.

re.findall(r'\r\n\d+\r\n',contents)[-1].strip()
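
A small sketch of what .strip() does here, using a made-up sample string in place of the real `contents` (the names `raw`, `cleaned` and `explicit` are illustrative only). Called without arguments, str.strip() removes any leading and trailing whitespace, which includes \r and \n; passing '\r\n' restricts it to those two characters.

```python
import re

# Hypothetical sample input; the real `contents` is not shown in the question.
contents = "header\r\n42\r\nbody text\r\n1621\r\nfooter"

# The full match still includes the surrounding \r\n.
raw = re.findall(r'\r\n\d+\r\n', contents)[-1]

# strip() with no argument trims all leading/trailing whitespace.
cleaned = raw.strip()

# strip('\r\n') trims only carriage returns and newlines.
explicit = raw.strip('\r\n')
print(repr(raw), cleaned, explicit)
```

Because the match contains only digits between the line endings, both calls give the same result in this case.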

edited May 23 '17 at 11:49

Community

answered Jan 27 '15 at 22:57

Vagner Guedes

score 0 · Answer 3 · edited Jan 28 '15 at 00:30

You could use lookahead and lookbehind assertions, which check for the surrounding \r\n without including it in the match:

re.findall(r'(?<=\r\n)\d+(?=\r\n)',contents)[-1]
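
One practical difference from a pattern that consumes the \r\n shows up when two numbers share a line ending, which is the case raised in the comments on the question. The sketch below uses a made-up input to illustrate it; the names `consuming` and `lookaround` are only for this example.

```python
import re

# Hypothetical input: two numbers separated by a single shared \r\n.
contents = "\r\n12\r\n34\r\n"

# Here the trailing \r\n of the first match is consumed, so it is no
# longer available as the leading \r\n of the second number.
consuming = re.findall(r'\r\n(\d+)\r\n', contents)

# Lookbehind/lookahead assert the \r\n without consuming it, so the
# same \r\n can follow one number and precede the next.
lookaround = re.findall(r'(?<=\r\n)\d+(?=\r\n)', contents)
print(consuming, lookaround)
```

Python's re module requires lookbehind patterns to have a fixed width, which \r\n satisfies.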

edited Jan 28 '15 at 00:30

tbodt

answered Jan 27 '15 at 23:30
