How to use backslash in regex patterns?

Question

Simple regex task: find an ID (and language) within a string.

import re

txt = '<OB02 ID="1099367" LANG="FR">'
pattern = r'\\ID="(.*?)\\"'

result = re.findall(pattern, txt)

This gives an empty list as result. Leading to the questions:

related: https://stackoverflow.com/a/1732454/2828611 – inetphantom May 08 '20 at 09:06 — inetphantom, May 08 '20 at 09:06

inetphantom · Answer 1 · 2020-05-08T15:16:27.680

0

Use an xmlParser to parse xml and not regex.

As a workaround you can use the following regex:

import re

txt = '<OB02 ID="1099367" LANG="FR">'
pattern = 'ID="([^"]*)'

result = re.findall(pattern, txt)

As said, this is a bad idea, caus if someone now starts using single quotes or add comments, this will break.

edited May 08 '20 at 15:16

answered May 08 '20 at 09:04

inetphantom

That's exactly my first choice, too. But didn't work for this string (IMHO no proper XML). But very welcome if you find a way xml.etree.ElementTree can handle this. – VengaVenga May 08 '20 at 09:18