Regex sometimes doesn't work as expected

Question

I'm parsing TrackMania's .Gbx replay files. Each file mixes binary data with an XML header, and the header is the part I'm interested in, so I'm trying to extract it with a regex. For most replays this works just fine, but I ran into one specific replay that breaks the regex.

import re

string = r'''
<header type="replay" exever="3.3.0" exebuild="2018-02-09_15_48" 
title="TMStadium"><map uid="Y48WnfHlw9SkYptpMIVkd0PUpRm" 
name="$fffTM$09FProLeague$fff xtasis -$09F GWF$fff2018
" author="w_1r" authorzone="World|Europe|Netherlands|Gelderland"/><desc 
envir="Stadium" mood="Day" maptype="TrackMania\Race"
mapstyle="" displaycost="2149" mod="" /><playermodel id="StadiumCar"/><times 
best="92373" respawns="1" stuntscore="7"
validable="1"/><checkpoints cur="13" onelap="13"/></header>
'''
header = r'(<header)(.*)(</header>)'
print(re.findall(header, string))

The other parts of the file don't seem to matter: even with the header copied in by hand, the regex doesn't match.

Could anyone help me find what I'm missing?

Try: `(<header)((?s).*)(</header>)`. This matched your string for me, but note that using inline modifiers in the middle of a regex is considered bad practice, I believe. — user3483203, Feb 19 '18 at 17:29
In Python, it does not matter where an inline modifier is used, it will be applied to the whole regex pattern. `(<header)((?s).*)(</header>)` = `(?s)(<header)(.*)(</header>)`. Besides, you may just use the `re.S` or `re.DOTALL` flag passed to `re.compile`. — Wiktor Stribiżew, Feb 19 '18 at 17:50
Thanks, I didn't know that. Also, sorry for the duplicate post. I didn't realize the problem was about matching across multiple lines, so I couldn't find that post while searching. — SoulJam, Feb 19 '18 at 17:53
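
The point made in these comments can be checked with a small sketch. The `sample` string is a shortened, made-up stand-in for the real header (it only needs to contain a line break), and the variable names are illustrative:

```python
import re

# Shortened, hypothetical stand-in for the header; note the embedded newline.
sample = '<header type="replay"\ntitle="TMStadium"></header>'

pattern = r'(<header)(.*)(</header>)'

# By default "." does not match a newline, so nothing is found.
no_flag = re.findall(pattern, sample)

# Either form below enables DOTALL for the whole pattern.
with_flag = re.findall(pattern, sample, re.DOTALL)
with_inline = re.findall('(?s)' + pattern, sample)

print(no_flag)
print(with_flag == with_inline)
```

Both the flag argument and the leading `(?s)` should give the same single match here, while the unflagged call finds nothing.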

score 0 · Answer 1 · answered Feb 19 '18 at 17:36

Thank you, chrisz, for suggesting that I add the (?s) mode modifier to my regex. It enables "single line mode", which makes the dot match all characters, including line breaks.

Correct regex:

(<header)((?s).*)(</header>)
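
Note that Python 3.11 and later reject an inline flag placed in the middle of a pattern with an error, so on current versions the flag has to go at the very start or be passed as `re.DOTALL`. Since the replay file is binary, one possible way to apply this is a bytes pattern, sketched below. The `extract_header` helper, the UTF-8 decoding, and the `fake_replay` bytes are assumptions for illustration, not part of the actual .Gbx format description:

```python
import re

# Bytes pattern; non-greedy so the match ends at the first closing tag.
HEADER_RE = re.compile(rb'<header.*?</header>', re.DOTALL)


def extract_header(data: bytes):
    """Return the XML header as text, or None if no header is found."""
    match = HEADER_RE.search(data)
    if match is None:
        return None
    # Assumption: the header is UTF-8; undecodable bytes are replaced.
    return match.group(0).decode('utf-8', errors='replace')


# Made-up bytes standing in for the contents of a replay file.
fake_replay = (
    b'\x00\x01GBX\x06'
    b'<header type="replay"\ntitle="TMStadium"></header>'
    b'\xff\x00'
)
print(extract_header(fake_replay))
```

With a real file, the bytes would come from something like `open(path, 'rb').read()`, where `path` is whatever replay you are inspecting.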

SoulJam
