Python Regex - Parse string and extract key=value pairs

Question

I have some text that I would like to extract key=value pairs from (see below). I've attempted to use a regex; however, the formatting of the key=value pairs is not consistent. For example, many values are enclosed in quotes, but some are not.

This is the regex that nearly worked, but there are a couple of outliers:

(\w*)=([\w,\",:,\-,(,\.,\+,\)]*)

Message meets Alert condition date=2020-08-20 time=00:33:57 devname=FGT3HD3999906624 devid=FGT3HD3999906624 logid="0100032003" type="event" subtype="system" level="information" vd="root" eventtime=1597847637407862934 tz="+1000" logdesc="Admin logout successful" sn="159999794" user="admin" ui="https(10.198.199.105)" method="https" srcip=10.198.199.105 dstip=192.168.23.254 action="logout" status="success" duration=4843 reason="timeout" msg="Administrator admin timed out on https(10.198.199.105)" Administrator IT Administrator Ph:
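One of the outliers can be reproduced with a minimal sketch. It runs the regex above against a shortened copy of the sample line, trimmed for brevity (an assumption). The character class has no space in it, so any quoted value that contains a space is cut off after its first word.

```python
import re

# Shortened copy of the sample line above (assumption: trimmed for brevity)
sample = ('date=2020-08-20 logdesc="Admin logout successful" '
          'ui="https(10.198.199.105)" msg="Administrator admin timed out"')

# The regex from the question; its character class contains no space
original = r'(\w*)=([\w,\",:,\-,(,\.,\+,\)]*)'

pairs = dict(re.findall(original, sample))
print(repr(pairs['logdesc']))  # '"Admin' : truncated at the first space
print(repr(pairs['ui']))       # '"https(10.198.199.105)"' : no space, so it survives
```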

It doesn't look like you need regex for this. What makes you think you do? — MisterMiyagi, Aug 23 '20 at 07:56
Does this answer your question? [Splitting a semicolon-separated string to a dictionary, in Python](https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python) — MisterMiyagi, Aug 23 '20 at 08:00
My post did not show it, but the text is buried in the body of an email message which includes the "Message meets Alert condition" and "Administrator IT..." Also the fields are dynamic, hence the need for a Regex. — John Greenfield, Aug 23 '20 at 09:03
Does this answer your question? [Splitting a semicolon-separated string to a dictionary, in Python](https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python) — Ryszard Czech, Aug 23 '20 at 18:39

jdaz · Accepted Answer · 2020-08-24T23:06:58.293

You have a few ways to do this. First, since you said your key-value pairs are embedded in a larger email, you need to extract them. You can do that with this regex, which checks for a line starting with a word and an equals sign:

import re

text = " ... Full email text ... "
# Grab the first line that starts with key= (re.MULTILINE makes ^ and $ match at line boundaries)
dataPoints = re.search(r"^\w+=.*$", text, re.MULTILINE).group(0)
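The snippet above raises AttributeError when no line matches, because re.search returns None. The following is a minimal sketch that guards against that case. The email body and the helper name extract_pairs_line are hypothetical, and the sketch assumes the pairs sit on their own line, as the question's comments suggest.

```python
import re

# Hypothetical email body: the key=value pairs are on their own line
text = (
    "Message meets Alert condition\n"
    'date=2020-08-20 time=00:33:57 logdesc="Admin logout successful" status="success"\n'
    "Administrator IT Administrator Ph:\n"
)

def extract_pairs_line(body):
    """Hypothetical helper: return the first line starting with key=, or None."""
    # With re.MULTILINE, ^ and $ anchor at every line, not just the string ends
    match = re.search(r"^\w+=.*$", body, re.MULTILINE)
    return match.group(0) if match else None

dataPoints = extract_pairs_line(text)
print(dataPoints)  # date=2020-08-20 time=00:33:57 logdesc="Admin logout successful" status="success"
```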

Then you need to create your dictionary.

Option 1: Simplest

Use the following regex find:

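# Each value is either a double-quoted string (matched lazily, up to the first closing quote) or a run of non-whitespace characters
result = dict(re.findall(r'(\w*)=(\".*?\"|\S*)', dataPoints))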
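Here is a small sketch of Option 1 applied to a subset of the sample pairs. The quoted values keep their surrounding double quotes. The final quote-stripping step is an optional extra and is not part of the answer.

```python
import re

# Subset of the pairs from the question
dataPoints = ('date=2020-08-20 time=00:33:57 logid="0100032003" '
              'logdesc="Admin logout successful" srcip=10.198.199.105 duration=4843')

# Option 1: a quoted value (lazy, stops at the first closing quote) or a non-space run
result = dict(re.findall(r'(\w*)=(\".*?\"|\S*)', dataPoints))

# Optional extra step (assumption: you want the bare values without quotes)
cleaned = {key: value.strip('"') for key, value in result.items()}
print(cleaned["logdesc"])  # Admin logout successful
```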

Option 2: Typical split

Follow the typical method for this problem: split the string into key-value combinations, and then split each combination into a separate key and value. However, your pairs are separated by spaces rather than semicolons, ampersands, or a similar delimiter, and some of your values contain spaces themselves, so we can't simply split on spaces. Instead, use a regex lookahead that splits only on whitespace immediately followed by another key=:

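# Split on whitespace only where the next token is key=; maxsplit=1 keeps any "=" inside a value intact
regexSplit = dict(i.split("=", 1) for i in re.split(r"\s(?=\w+=)", dataPoints))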
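The sketch below shows what the lookahead split produces. The note="a=b" pair is hypothetical. It is included to show why splitting each chunk with maxsplit=1 matters: a plain split("=") would yield three pieces, and dict() would raise ValueError. The sketch also assumes that no quoted value contains a space followed by word=. Such a value would be split in the wrong place.

```python
import re

dataPoints = ('type="event" level="information" '
              'msg="Administrator admin timed out on https(10.198.199.105)" '
              'note="a=b"')  # hypothetical pair with "=" inside the value

# The lookahead (?=\w+=) is zero-width, so the next key is not consumed by the split
chunks = re.split(r"\s(?=\w+=)", dataPoints)

# maxsplit=1: split only at the first "=", which separates key from value
regexSplit = dict(chunk.split("=", 1) for chunk in chunks)
print(regexSplit["note"])  # "a=b"
```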

Option 3: No regex

If you want to avoid using regex altogether for whatever reason, you can use the following, which splits on equals signs and then recombines the keys and values into the proper arrangement for creating a dictionary:

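# Splitting on "=" leaves middle chunks shaped like '<previous value> <next key>'
allSplits = dataPoints.split("=")
# Split each middle chunk once from the right to separate the value from the next key
splitList = [allSplits[0]] + [i for a in allSplits[1:-1]
    for i in a.rsplit(" ", 1)] + [allSplits[-1]]

# Keys sit at even positions and values at odd positions
splitDict = dict(zip(splitList[::2], splitList[1::2]))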

The code above assumes that dataPoints contains at least one key=value pair and that no value contains an equals sign of its own.
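For reuse, Option 3 can be wrapped in a function. The sketch below does that, and the function name is hypothetical. The logic is the same as in the answer: split on "=" and then peel the next key off the end of each middle chunk.

```python
def split_pairs_no_regex(dataPoints):
    """Hypothetical wrapper around Option 3; uses no regex."""
    allSplits = dataPoints.split("=")
    # rsplit(" ", 1) splits only at the last space, so spaces inside a value survive
    splitList = ([allSplits[0]]
                 + [part for chunk in allSplits[1:-1] for part in chunk.rsplit(" ", 1)]
                 + [allSplits[-1]])
    # Pair alternating keys and values
    return dict(zip(splitList[::2], splitList[1::2]))

print(split_pairs_no_regex('action="logout" status="success" duration=4843'))
# {'action': '"logout"', 'status': '"success"', 'duration': '4843'}
```

Splitting from the right is the key design choice here. A key never contains a space, so the last space in a chunk always separates the previous value from the next key.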

Thanks, this works perfectly and also caters for the dictionary creation. — John Greenfield, Aug 23 '20 at 09:29

Stefan · Answer 2 · 2020-08-24T05:13:43.060

What about adding an OR (|) to your regex, e.g.

(\w*)=(\"[\w\s\+()\.]*\"|[\w\-\:\.]*)

matches the string you gave.
Note

\"[\w\s\+()\.]*\" matches all the values enclosed in ""
[\w\-\:\.]* matches the ones without
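The alternation can be checked against a few of the sample pairs with a quick sketch.

```python
import re

dataPoints = ('date=2020-08-20 time=00:33:57 tz="+1000" '
              'ui="https(10.198.199.105)" logdesc="Admin logout successful"')

# Quoted values may contain word chars, whitespace, "+", parentheses and dots;
# unquoted values may contain word chars, "-", ":" and dots
pattern = r'(\w*)=(\"[\w\s\+()\.]*\"|[\w\-\:\.]*)'
pairs = dict(re.findall(pattern, dataPoints))
print(pairs["logdesc"])  # "Admin logout successful"
```

Both character classes are whitelists. A quoted value containing any other character, such as a comma, fails the first alternative. The second alternative then matches an empty string, so the pair comes back with an empty value.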

edited Aug 24 '20 at 05:13

answered Aug 23 '20 at 07:46

Thanks, the addition of pipe symbol catered for the outliers :) – John Greenfield Aug 23 '20 at 09:08

`(\w*)=(\".*?\"|\S*)` is much simpler: https://regex101.com/r/m4o3LO/1 – jdaz Aug 23 '20 at 09:19

`\d` is already included in `\w`, it doesn't make sense to put both in a character class. – Toto Aug 23 '20 at 09:25
@Toto You are right, of course `\w` matches all alphanumeric characters (plus the underscore). I updated the answer. – Stefan Aug 24 '20 at 05:10
@jdaz Yes, it looks also way cleaner. – Stefan Aug 24 '20 at 05:11
