I have the following log line I want to parse:

<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>

I am using Python regex; this is what I did so far:

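import re

StatelogLine = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'
regexParse = re.match(r".*pos_start=(\d+).*end_position=(\d+).*log_type=(\d+).*length=(\d+).*block_id=(\d+).*block_position=(\d+).*", StatelogLine)
start_position = regexParse.group(1)
end_position = regexParse.group(2)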

I am getting the following error:

AttributeError: 'NoneType' object has no attribute 'group'

Does anyone have an idea what the problem is?

Dan The Man
    Pretty simple: [your regex does not match your string](https://regex101.com/r/aW9aR7/1). As a side note, the dot-star soup (`.*`) would be very inefficient as well. Additionally, why not use a parser instead? – Jan Apr 18 '16 at 06:44

4 Answers


Pretty simple: your regex does not match your string because it does not account for the double quotes around each attribute value. If you add them, your regex works.
As a side note, the dot-star soup (.*) is very inefficient, since every greedy .* first runs to the end of the line and then has to backtrack. Why not use a parser instead?
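A quick way to check the diagnosis: `re.match` returns `None` when the pattern fails, and calling `.group()` on `None` is exactly what raises the `AttributeError`. The sketch below runs the question's pattern and a version with the quotes added, and checks for `None` before reading the groups. The names `line`, `original` and `quoted` are only for illustration.

```python
import re

line = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

# Pattern from the question: "=" is followed by '"', not a digit, so it fails.
original = r".*pos_start=(\d+).*end_position=(\d+).*log_type=(\d+).*length=(\d+).*block_id=(\d+).*block_position=(\d+).*"

# Same pattern with the double quotes around each value added.
quoted = r'.*pos_start="(\d+)".*end_position="(\d+)".*log_type="(\d+)".*length="(\d+)".*block_id="(\d+)".*block_position="(\d+)".*'

print(re.match(original, line))  # None

m = re.match(quoted, line)
if m is None:
    # Checking for None first avoids the AttributeError from the question.
    raise ValueError("line does not look like a <log .../> entry")
start_position, end_position = m.group(1), m.group(2)
print(start_position, end_position)  # 40652288 40689664
```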

Consider the following code with BeautifulSoup:

from bs4 import BeautifulSoup

string = """<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>"""
# "html.parser" ships with Python, so no extra parser library is needed
xml = BeautifulSoup(string, "html.parser")
print(xml.log["pos_start"])
# 40652288

You can then access the element's attributes like a dictionary, no druidic regex needed. Have a look at the BeautifulSoup homepage and documentation.
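If adding a third-party dependency is not an option, the line is also well-formed XML, so the standard library's `xml.etree.ElementTree` can read it in much the same way. This swaps BeautifulSoup for ElementTree; it is a sketch that assumes each log line is one complete element like the sample and that every attribute value is an integer.

```python
import xml.etree.ElementTree as ET

line = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

# fromstring parses the text and returns the root element (<log/> here)
element = ET.fromstring(line)

# .attrib is a plain dict mapping attribute names to string values
print(element.tag)                  # log
print(element.attrib["pos_start"])  # 40652288

# Assumes every attribute is numeric, as in the sample line
fields = {name: int(value) for name, value in element.attrib.items()}
print(fields["block_position"])     # 18
```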

Jan

You could parse the line so that you get both the key and the value of every attribute: Regex Demo

(\w+)="(\d+)"

And if you need to, you could also create a dict out of it:

import re

s = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'

matches = re.findall(r'(\w+)="(\d+)"', s)
#[('pos_start', '40652288'),
# ('end_position', '40689664'),
# ('log_type', '1'),
# ('length', '37376'),
# ('block_id', '4024'),
# ('block_position', '18')]

d = dict(matches)
#{'block_id': '4024',
# 'block_position': '18',
# 'end_position': '40689664',
# 'length': '37376',
# 'log_type': '1',
# 'pos_start': '40652288'}
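`findall` returns every value as a string. If the numbers are needed for arithmetic, a dict comprehension can convert them in the same step. A small sketch; `parse_log_line` is a hypothetical helper name, and because the pattern requires digits, any attribute with a non-numeric value would simply be skipped.

```python
import re

# Same pattern as above: a name, '="', one or more digits, '"'
ATTR_RE = re.compile(r'(\w+)="(\d+)"')

def parse_log_line(line):
    """Return the numeric attributes of a <log .../> line as a dict of ints."""
    return {key: int(value) for key, value in ATTR_RE.findall(line)}

s = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'
d = parse_log_line(s)
print(d["end_position"] - d["pos_start"])  # 37376
```

In this sample the difference equals the `length` attribute, which is easy to check once the values are ints.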
AKS

Your regex is not correct. You need to include the double quotes around each value (escaped, if the pattern sits in a double-quoted string) to get a successful match.

.*pos_start=\"(\d+)\" +end_position=\"(\d+)\" +log_type=\"(\d+)\" +length=\"(\d+)\" +block_id=\"(\d+)\" +block_position=\"(\d+)\"
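In Python, this pattern can be used unchanged inside a raw string: the backslash before each quote is kept, and `re` treats `\"` as a literal `"`. Because it uses ` +` between the attributes instead of `.*`, it is stricter than the original, since it only matches when the attributes appear in exactly this order, separated by spaces. A minimal sketch (the reordered line is made up to show the failure case):

```python
import re

# The pattern above, unchanged, in a raw string; \" is a literal quote to re.
pattern = re.compile(r".*pos_start=\"(\d+)\" +end_position=\"(\d+)\" +log_type=\"(\d+)\" +length=\"(\d+)\" +block_id=\"(\d+)\" +block_position=\"(\d+)\"")

line = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'
m = pattern.match(line)
print(m.groups())
# ('40652288', '40689664', '1', '37376', '4024', '18')

# Hypothetical line with two attributes swapped: each attribute must
# directly follow the previous one, so this no longer matches.
reordered = '<log end_position="40689664" pos_start="40652288" log_type="1" length="37376" block_id="4024" block_position="18"/>'
print(pattern.match(reordered))  # None
```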
Ashish

You forgot the double quotes:

regexParse = re.match(r'.*pos_start="(\d+)".*end_position="(\d+)".*log_type="(\d+)".*length="(\d+)".*block_id="(\d+)".*block_position="(\d+)".*', s)
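Since `re.match` only matches at the start of the string, the leading `.*` is there just to skip past `<log `. `re.search` scans for the first match anywhere, so that padding can be dropped. The sketch below does that for the two fields the question actually reads, and uses named groups (`(?P<name>...)`) so they can be read by name; the group names `start` and `end` are chosen for illustration.

```python
import re

# re.search finds the pattern anywhere, so no leading or trailing .* is
# needed. [^>]*? lazily skips other text without leaving the tag.
pattern = re.compile(
    r'pos_start="(?P<start>\d+)"[^>]*?'
    r'end_position="(?P<end>\d+)"'
)

s = '<log pos_start="40652288" end_position="40689664" log_type="1" length="37376" block_id="4024" block_position="18"/>'
m = pattern.search(s)
if m:
    print(m.group("start"), m.group("end"))  # 40652288 40689664
```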
Avión
Zhenhao Chen