Simple regex task: find an ID (and language) within a string.

import re

txt = '<OB02 ID="1099367" LANG="FR">'
pattern = r'\\ID="(.*?)\\"'

result = re.findall(pattern, txt)

This returns an empty list, which leads to two questions:

  • How do I correctly escape \" in Python?
  • How do I extract ID and LANG from txt?
Wiktor Stribiżew
VengaVenga

1 Answer

Use an XML parser, not a regex, to parse XML.

As a workaround you can use the following regex:

import re

txt = '<OB02 ID="1099367" LANG="FR">'
pattern = 'ID="([^"]*)'

result = re.findall(pattern, txt)

As said, this is a bad idea, because it will break as soon as someone starts using single quotes or adds comments.
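The workaround above only captures ID, while the question also asks for LANG and for the reason the original pattern fails. The following is a minimal sketch, assuming the attributes always use double quotes as in the sample string; the variable names `item_id`, `lang` and `attrs` are illustrative only.

```python
import re

txt = '<OB02 ID="1099367" LANG="FR">'

# Inside a single-quoted Python string a double quote needs no backslash.
# The original r'\\ID="(.*?)\\"' requires a literal backslash before ID
# and before the closing quote; txt contains neither, so findall returns [].
pattern = r'ID="([^"]*)"\s+LANG="([^"]*)"'

match = re.search(pattern, txt)
if match:
    item_id, lang = match.groups()
    print(item_id, lang)  # 1099367 FR

# Order-independent alternative: collect every NAME="value" pair into a dict.
attrs = dict(re.findall(r'(\w+)="([^"]*)"', txt))
print(attrs["ID"], attrs["LANG"])
```

The second form does not depend on the order of the attributes, but it shares the fragility described above: single-quoted values or comments would still break it.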

inetphantom
  • That was my first choice, too, but it didn't work for this string (IMHO it is not proper XML). A way to make xml.etree.ElementTree handle this would be very welcome. – VengaVenga May 08 '20 at 09:18
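Regarding the comment: the sample is a lone start tag without a closing tag, which is presumably why a strict XML parser rejects it. One possible alternative, sketched here and not taken from the answer, is the lenient `html.parser` module from the standard library, which reports start tags without requiring a well-formed document. The class name `TagAttrs` is a hypothetical helper, and the sketch assumes the input looks like the sample string.

```python
from html.parser import HTMLParser


class TagAttrs(HTMLParser):
    """Collect the attributes of every start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values are kept as-is.
        self.found.append((tag, dict(attrs)))


txt = '<OB02 ID="1099367" LANG="FR">'

parser = TagAttrs()
parser.feed(txt)
parser.close()

tag, attrs = parser.found[0]
print(tag, attrs["id"], attrs["lang"])
```

Note that the keys come back as `id` and `lang` rather than `ID` and `LANG`, because this parser normalizes names to lowercase.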