Python Regex convert matches to a dictionary

Question

import re

text = "Bob|19|01012017"
pat = re.compile("(?P<name>.+)|.*|(?P<bday>.+)")  # hopefully this regex is correct
result = pat.match(text)
d = result.groupdict()
print(d)

What I get for d is:

{'bday': None, 'name': 'Bob|19|01012017'}

What I want is:

{'bday': '01012017', 'name': 'Bob'}

Can someone point out what I am doing wrong? I only need two fields in the dict, so I didn't capture the age part.

Psidom · Accepted Answer · 2017-10-28T19:20:57.490

score 13

You need to escape `|` to match it literally; otherwise the pattern treats it as alternation ("or"):

import re

text = "Bob|19|01012017"
pat = re.compile(r"(?P<name>.+)\|.*\|(?P<bday>.+)")  # raw string keeps \| intact
result = pat.match(text)
d = result.groupdict()

d
# {'bday': '01012017', 'name': 'Bob'}
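One caveat: `.+` is greedy, so if a row ever has an extra pipe-separated field, the `name` group can swallow more than the first field. A minimal sketch, assuming no field ever contains `|`, uses the negated class `[^|]+` (which cannot cross a delimiter) and `re.fullmatch` (so the whole string must fit). The helper `parse_record` and the four-field sample row are hypothetical, for illustration only.

```python
import re

text = "Bob|19|x|01012017"  # hypothetical row with one extra field

# With the greedy .+ pattern, 'name' runs across a delimiter here.
greedy = re.match(r"(?P<name>.+)\|.*\|(?P<bday>.+)", text)
print(greedy.group("name"))   # Bob|19

# [^|]+ cannot cross a '|', and fullmatch requires the whole string to fit.
RECORD = re.compile(r"(?P<name>[^|]+)\|[^|]*\|(?P<bday>[^|]+)")

def parse_record(row):
    """Return {'name': ..., 'bday': ...}, or None unless row has exactly 3 fields."""
    m = RECORD.fullmatch(row)
    return m.groupdict() if m else None

print(parse_record("Bob|19|01012017"))   # {'name': 'Bob', 'bday': '01012017'}
print(parse_record(text))                # None
```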

A quick speed comparison against the `split` method:

import re

text = "Bob|19|01012017"
pat = re.compile(r"(?P<name>.+)\|.*\|(?P<bday>.+)")

def regex_to_dict(texts, pat):
    return [pat.match(text).groupdict() for text in texts]

regex_to_dict([text], pat)
# [{'bday': '01012017', 'name': 'Bob'}]

def split_to_dict(texts):
    dd = []
    for text in texts:
        name, _, bday = text.split('|')
        dd.append({'bday': bday, 'name': name})
    return dd

split_to_dict([text])
# [{'bday': '01012017', 'name': 'Bob'}]

texts = [text] * 100000

%timeit regex_to_dict(texts, pat)
# 10 loops, best of 3: 119 ms per loop

%timeit split_to_dict(texts)
# 10 loops, best of 3: 58.6 ms per loop
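`%timeit` is an IPython/Jupyter magic, so the test above won't run in a plain `.py` file. A rough standard-library equivalent using the `timeit` module is sketched below; it redefines the two functions so it runs on its own and checks that both produce the same dicts before timing them. No figures are claimed here, since absolute numbers depend on the machine and Python version.

```python
import re
import timeit

pat = re.compile(r"(?P<name>.+)\|.*\|(?P<bday>.+)")

def regex_to_dict(texts, pat):
    return [pat.match(text).groupdict() for text in texts]

def split_to_dict(texts):
    dd = []
    for text in texts:
        name, _, bday = text.split('|')
        dd.append({'bday': bday, 'name': name})
    return dd

texts = ["Bob|19|01012017"] * 100000

# Both approaches must agree before their speed is worth comparing.
assert regex_to_dict(texts, pat) == split_to_dict(texts)

for label, fn in [("regex", lambda: regex_to_dict(texts, pat)),
                  ("split", lambda: split_to_dict(texts))]:
    # 5 timed runs of one call each; the minimum is the least noisy estimate.
    best = min(timeit.repeat(fn, number=1, repeat=5))
    print(f"{label}: {best * 1000:.1f} ms per call")
```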

edited Oct 28 '17 at 19:20

answered Oct 28 '17 at 19:05


OMG thank you. I didn't know | was taken. What does | mean in regex? – Hanming Zeng Oct 28 '17 at 19:06

It means `or`, `blah|foo` matches `blah` or `foo` for instance. – Psidom Oct 28 '17 at 19:07
Also a side question: in terms of time complexity, is this faster, or is using string.split and then constructing the dict faster? – Hanming Zeng Oct 28 '17 at 19:11
I don't have much experience with the speed. But my first guess is `split`, if your string is as regular as you've shown. – Psidom Oct 28 '17 at 19:13
I added a rough speed test. `split` is about twice as fast as regex in this case. – Psidom Oct 28 '17 at 19:22
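To make the alternation point from the comments concrete: unescaped, the question's pattern is three alternatives, `(?P<name>.+)`, `.*` and `(?P<bday>.+)`. `re.match` tries them left to right, the first already consumes the whole string, and so `bday` is never set. `re.escape` is one way to produce the escaped, literal form of the delimiter; the `sep` variable below is just an illustrative name.

```python
import re

# '|' is alternation: 'blah|foo' matches either word.
print(re.fullmatch(r"blah|foo", "foo") is not None)   # True

text = "Bob|19|01012017"

# Unescaped: three alternatives; the first, (?P<name>.+), matches everything.
bad = re.match(r"(?P<name>.+)|.*|(?P<bday>.+)", text)
print(bad.groupdict())    # {'name': 'Bob|19|01012017', 'bday': None}

# re.escape turns the delimiter into a literal backslash-pipe.
sep = re.escape("|")
good = re.match(rf"(?P<name>.+){sep}.*{sep}(?P<bday>.+)", text)
print(good.groupdict())   # {'name': 'Bob', 'bday': '01012017'}
```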

score 1 · Answer 2 · answered Oct 28 '17 at 19:12


For such a simple case you can use the plain `str.split()` approach:

text = "Bob|19|01012017"
items = text.split('|')
d = {'name': items[0], 'bday': items[-1]}

print(d)

The output:

{'name': 'Bob', 'bday': '01012017'}
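If all three fields are wanted, or the same layout is reused across many rows, pairing a tuple of field names with the split result via `zip` builds the dict in one step. A minimal sketch: the field names `name`, `age`, `bday` and the helper `split_record` are illustrative assumptions, and the length check guards against rows with a different number of fields, which `zip` would otherwise silently truncate.

```python
FIELDS = ("name", "age", "bday")   # assumed column order: name|age|bday

def split_record(text, fields=FIELDS, sep="|"):
    parts = text.split(sep)
    # zip() stops at the shorter input, so reject mismatched rows explicitly.
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}: {text!r}")
    return dict(zip(fields, parts))

record = split_record("Bob|19|01012017")
print(record)   # {'name': 'Bob', 'age': '19', 'bday': '01012017'}

# Keep only the two fields the question asks for.
d = {k: record[k] for k in ("name", "bday")}
print(d)        # {'name': 'Bob', 'bday': '01012017'}
```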

RomanPerekhrest
