For some reason, I need to extract the fields in an xml doc with python re.
here is an eg. of the string I'll be applying the regex on:
payload2 = '<CheckEventRequest><EventList count="1"><Event event="0x80" path="\\c2_emcvnx.ntaplion.prv\\CHECK$\\demoshare1\\Engineering\\Benchmarking\\Thumbs.db" flag="0x2" protocol="0" server="C2_EMCVNX" share="demoshare1" clientIP="172.26.64.233" serverIP="172.26.64.225" timeStamp="0x536C47EF0003C836" userSid="S-1-5-21-665520413-3518186362-2792099713-500" ownerSid="S-1-5-21-665520413-3518186362-2792099713-500" fileSize="0x11000" desiredAccess="0x0" createDispo="0x0" ntStatus="0x0" relativePath="\\C2_EMCVNX\\demoshare1\\Engineering\\Benchmarking\\Thumbs.db"/></EventList></CheckEventRequest>'
Some of the fields you see above like 'clientIP' may not always be present.
The regex I have come up with is:
PAT3 = re.compile(r'.+(event="(?P<event_code>\S*?)"){1}[\S\s]+?(path="(?P<path>[\s\S]+?)"){0,1}[\S\s]+(clientIP="(?P<client_ip>[\S\s]+?)"){0,1}.*', re.UNICODE)
m1 = PAT3.search(payload2)
print m1.groupdict()
output:
{'path': '\\c2_emcvnx.ntaplion.prv\\CHECK$\\demoshare1\\Engineering\\Benchmarking\\Thumbs.db', 'client_ip': None, 'event_code': '0x80'}
But when I put {1}
instead of {0, 1}
after (?P<client_ip>[\S\s]+?)")
it works. However this defeats the case when the clientIP is absent.
Any idea on how can make the regex work in both cases where a field is present or not present?