python - regex parsing log

Question

I have the following log line that I want to parse:

<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>

I am using Python's re module; this is what I have so far:

regexParse = re.match(".*pos_start=(\d+).*end_position=(\d+).*log_type=(\d+).*length=(\d+).*block_id=(\d+).*block_position=(\d+).*",StatelogLine)
start_position = regexParse.group(1)
end_position = regexParse.group(2)

I am getting the following error:

AttributeError: 'NoneType' object has no attribute 'group'

Does anyone have an idea what the problem is?

Pretty simple: [your regex does not match your string](https://regex101.com/r/aW9aR7/1). As a side note, the dot-star soup (`.*`) is also very inefficient. Additionally, why not use a parser instead? – Jan, Apr 18 '16 at 06:44

Jan · Answer 1 · 2016-04-18T06:56:04.590

3

Pretty simple: your regex does not match your string because it does not account for the double quotes around the attribute values. If you add them, your regex works.
As a side note, the dot-star soup (.*) is very inefficient. Why not use a parser instead?

Consider the following code with BeautifulSoup:

from bs4 import BeautifulSoup

string = """<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>"""
# Name the parser explicitly; html.parser ships with Python
xml = BeautifulSoup(string, "html.parser")
print(xml.log["pos_start"])
# 40652288

You can access the element's attributes like a dictionary afterwards, no druidic regex needed. Have a look at the BeautifulSoup homepage and documentation.
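If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's xml.etree.ElementTree. This is an alternative to the code above, not part of the original answer, and it assumes that every log line is a single well-formed XML element, which the question does not confirm. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

line = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

# Assumption: the line is one well-formed XML element.
# ET.fromstring() raises ET.ParseError if it is not.
element = ET.fromstring(line)

# .attrib is a plain dict mapping attribute names to string values.
attrs = element.attrib
print(attrs["pos_start"], attrs["end_position"])
```

If real log lines can be truncated or malformed, wrapping the call in a try/except for ET.ParseError would be the natural next step.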

edited Apr 18 '16 at 06:56

answered Apr 18 '16 at 06:46

Jan

What do you mean by using a parser? – Dan The Man Apr 18 '16 at 06:49
@danieltheman: See http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python – Jan Apr 18 '16 at 06:50
@danieltheman: I have added some example code, see the updated answer. – Jan Apr 18 '16 at 06:57

AKS · Accepted Answer · 2016-04-18T07:16:44.633

You could parse the line so that you get both the key and the value (Regex Demo):

(\w+)="(\d+)"

If you need to, you can also create a dict out of the matches:

import re

s = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

matches = re.findall(r'(\w+)="(\d+)"', s)
#[('pos_start', '40652288'),
# ('end_position', '40689664'),
# ('log_type', '1'),
# ('length', '37376'),
# ('block_id', '4024'),
# ('block_position', '18')]

d = dict(matches)
#{'block_id': '4024',
# 'block_position': '18',
# 'end_position': '40689664',
# 'length': '37376',
# 'log_type': '1',
# 'pos_start': '40652288'}
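Note that the values captured by findall are strings. If numbers are needed, a dict comprehension can convert them. The following is a minimal sketch, assuming every attribute value is a non-negative integer as in the sample line; the names fields and span are illustrative.

```python
import re

s = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

# Each match is a (key, value) tuple; int() converts the digit string.
fields = {key: int(value) for key, value in re.findall(r'(\w+)="(\d+)"', s)}

# With integers, arithmetic on the fields works directly.
span = fields["end_position"] - fields["pos_start"]
print(span, fields["length"])
```

For the sample line, the difference between end_position and pos_start happens to equal length, which is a handy sanity check on the parsed values.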

I like this one (+1): straightforward. Better to use a parser, though. – Jan, Apr 18 '16 at 13:44

score 1 · Answer 3 · answered Apr 18 '16 at 06:47

Your regex is not correct. It has to include the double quotes around each value (escaped if the pattern is written inside a double-quoted Python string) for the match to succeed.

.*pos_start=\"(\d+)\" +end_position=\"(\d+)\" +log_type=\"(\d+)\" +length=\"(\d+)\" +block_id=\"(\d+)\" +block_position=\"(\d+)\"
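To use this pattern without running into the AttributeError from the question, the result should be checked before calling group() or groups(), since re.match returns None when nothing matches. Here is a sketch; parse_log_line and LOG_PATTERN are hypothetical names, not part of the original answer, and the pattern is written in single-quoted raw strings so the double quotes need no backslashes.

```python
import re

# Same pattern as above, split over two raw strings for readability.
LOG_PATTERN = re.compile(
    r'.*pos_start="(\d+)" +end_position="(\d+)" +log_type="(\d+)"'
    r' +length="(\d+)" +block_id="(\d+)" +block_position="(\d+)"'
)


def parse_log_line(line):
    """Return the six captured values as a tuple of strings, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        # No match: report it to the caller instead of raising AttributeError.
        return None
    return match.groups()


sample = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'
print(parse_log_line(sample))
print(parse_log_line("not a log line"))
```

Returning None keeps the helper small; raising a ValueError with the offending line would be an equally reasonable choice.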

answered Apr 18 '16 at 06:47

Ashish

score 0 · Answer 4 · edited Apr 18 '16 at 06:50

You forgot the double quotes:

regexParse = re.match(r".*pos_start=\"(\d+)\".*end_position=\"(\d+)\".*log_type=\"(\d+)\".*length=\"(\d+)\".*block_id=\"(\d+)\".*block_position=\"(\d+)\".*",s)
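The difference is easy to check side by side. In the following sketch both patterns are shortened to the first two fields for readability (an illustrative simplification, not the full pattern above): the version without quotes returns None on the sample line, while the version with quotes matches.

```python
import re

s = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

# Without quotes: 'pos_start=' is followed by '"', not a digit, so \d+ fails.
without_quotes = re.match(r'.*pos_start=(\d+).*end_position=(\d+)', s)

# With quotes: the pattern now mirrors the text of the log line.
with_quotes = re.match(r'.*pos_start="(\d+)".*end_position="(\d+)"', s)

print(without_quotes)
print(with_quotes.group(1), with_quotes.group(2))
```

Calling .group() on without_quotes would reproduce the AttributeError from the question, because the object is None.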

edited Apr 18 '16 at 06:50

Avión

answered Apr 18 '16 at 06:48

Zhenhao Chen
