Python 2.7 RE Search by condition

Question

I am having a problem using re.search.

For example:

a = '<span class="chapternum">1 </span>abc，def.</span>'

How can I search for the number '1'? In other words, how can I match digits that come right after ">" and are followed by whitespace?

I tried:

test = re.search('(^>)(\d+)(\s$)', a)
print test
>> []

This fails to get the number "1".

Have you considered using an actual HTML parser? Using regex is [notoriously unwise](https://stackoverflow.com/a/1732454/3001761). — jonrsharpe, Jul 09 '17 at 07:47
As funny as the answer ( @jonrsharpe linked) may have been, I will never forget it. Do not use regex to parse html, ever. — Marcel Wilson, Jul 11 '17 at 19:08
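
For anyone who wants to follow the parser suggestion in the comments above, here is a minimal sketch using only the standard library's HTMLParser. The class name ChapterNumParser and its attributes are made up for illustration, and the sample string uses an ASCII comma so it also runs under Python 2.7 without an encoding declaration. It is one possible approach, not the only one.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class ChapterNumParser(HTMLParser):
    """Hypothetical helper: collects the text inside <span class="chapternum">."""

    def __init__(self):
        # HTMLParser is an old-style class in Python 2.7, so call __init__ directly.
        HTMLParser.__init__(self)
        self.in_chapternum = False
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == 'span' and ('class', 'chapternum') in attrs:
            self.in_chapternum = True

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_chapternum = False

    def handle_data(self, data):
        if self.in_chapternum:
            self.numbers.append(data.strip())


a = '<span class="chapternum">1 </span>abc, def.</span>'
parser = ChapterNumParser()
parser.feed(a)
parser.close()
print(parser.numbers)
```

The stray closing </span> at the end of the sample is simply reported as another end tag, so it does not affect the collected numbers.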

fredtantini · Accepted Answer · 2017-07-09T08:14:27.790

0

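^ and $ anchor the pattern to the beginning and the end of the string. In your string, ">" is not the first character and the whitespace is not the last one, so the anchored pattern cannot match. If you get rid of them, you have your answer: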

>>> test = re.search('(>)(\d+)(\s)', a)
>>> test.groups()
('>', '1', ' ')

I'm not sure you need the first and last groups, though (parentheses create capturing groups):

>>> a = '<span class="chapternum">23 </span>abc，def.</span>' 
>>> test = re.search('>(\d+)\s', a)
>>> test.group(1)
'23'
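
To make the difference explicit, here is a small self-contained sketch that runs the anchored and the unanchored pattern side by side. The variable names anchored and unanchored are only for illustration, the sample uses an ASCII comma, and raw strings (r'...') are used so the backslashes reach the regex engine untouched.

```python
import re

a = '<span class="chapternum">1 </span>abc, def.</span>'

# Anchored: '^>' needs '>' to be the very first character and '\s$' needs
# whitespace to be the very last one, so nothing in this string matches.
anchored = re.search(r'(^>)(\d+)(\s$)', a)
print(anchored)

# Unanchored: '>' followed by digits and one whitespace, anywhere in the string.
unanchored = re.search(r'>(\d+)\s', a)
if unanchored is not None:
    print(repr(unanchored.group(0)))  # the whole match, including '>' and the space
    print(repr(unanchored.group(1)))  # only the digits captured by (\d+)
```

Note that re.search returns None when there is no match, so it is worth checking the result before calling group() on it.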

edited Jul 09 '17 at 08:14

answered Jul 09 '17 at 07:36


Thanks for your quick response. Although this finds '1', I can't be sure that all the digits will match group(1) in my whole program. Therefore, I am looking for a way to search only for results that match this pattern: >\d+\s. – Enoch Jul 09 '17 at 07:52
`test.group(1)` means 'the first group'; the `(\d+)` part captures all the digits. – fredtantini Jul 09 '17 at 08:52
Oh. I got it. Thanks a lot. – Enoch Jul 10 '17 at 01:48
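
As a follow-up to the group(1) discussion, here is a hedged sketch of two ways to get only the digits when the input contains several such spans. The sample text and the name digits_only are invented for illustration. The first variant uses lookarounds, so the whole match is just the digits. The second keeps the capturing group and lets re.findall return the group contents.

```python
import re

# Hypothetical input with two chapter numbers.
text = ('<span class="chapternum">1 </span>abc'
        '<span class="chapternum">23 </span>def')

# (?<=>) requires '>' right before the digits and (?=\s) requires whitespace
# right after them; neither is included in the match itself.
digits_only = re.compile(r'(?<=>)\d+(?=\s)')
first = digits_only.search(text)
print(first.group(0))
print(digits_only.findall(text))

# With exactly one capturing group, findall returns what the group captured.
print(re.findall(r'>(\d+)\s', text))
```

Both variants return the same list here; the capturing-group version is usually easier to read.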
