
I have some text from which I would like to extract key=value pairs (see below). I've attempted to use a regex; however, the formatting of the key=value pairs is not consistent: for example, many values are enclosed in quotes, but some are not.

This is the regex that nearly worked, but there are a couple of outliers:

(\w*)=([\w,\",:,\-,(,\.,\+,\)]*)

Message meets Alert condition date=2020-08-20 time=00:33:57 devname=FGT3HD3999906624 devid=FGT3HD3999906624 logid="0100032003" type="event" subtype="system" level="information" vd="root" eventtime=1597847637407862934 tz="+1000" logdesc="Admin logout successful" sn="159999794" user="admin" ui="https(10.198.199.105)" method="https" srcip=10.198.199.105 dstip=192.168.23.254 action="logout" status="success" duration=4843 reason="timeout" msg="Administrator admin timed out on https(10.198.199.105)" Administrator IT Administrator Ph:
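A quick way to see the outliers: the character class in the pattern above contains no space, so any quoted value that has a space in it is cut off at the first space. The sketch below runs the question's pattern through re.findall on a shortened excerpt of the log line (the excerpt is chosen only for illustration).

```python
import re

# The pattern from the question; the commas inside [...] are literal characters.
pattern = r'(\w*)=([\w,\",:,\-,(,\.,\+,\)]*)'

# Shortened excerpt of the log line above.
line = 'tz="+1000" logdesc="Admin logout successful" duration=4843'

pairs = re.findall(pattern, line)
print(pairs)
# [('tz', '"+1000"'), ('logdesc', '"Admin'), ('duration', '4843')]
```

The logdesc value stops at "Admin because the space before "logout" is not in the class, and the rest of the quoted text is silently skipped.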

John Greenfield
  • It doesn't look like you need regex for this. What makes you think you do? – MisterMiyagi Aug 23 '20 at 07:56
  • Does this answer your question? [Splitting a semicolon-separated string to a dictionary, in Python](https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python) – MisterMiyagi Aug 23 '20 at 08:00
  • My post did not show it, but the text is buried in the body of an email message that includes "Message meets Alert condition" and "Administrator IT...". Also, the fields are dynamic, hence the need for a regex. – John Greenfield Aug 23 '20 at 09:03
  • Does this answer your question? [Splitting a semicolon-separated string to a dictionary, in Python](https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python) – Ryszard Czech Aug 23 '20 at 18:39

2 Answers


You have a few ways to do this. First, since you said your key-value pairs are embedded in a larger email, you need to extract them. You can do that with this regex, which finds the first line that starts with a word followed by an equals sign:

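import re

text = " ... Full email text ... "
# With re.MULTILINE, ^ and $ match at every line boundary, so this returns the
# first line that starts with a word followed by "="
dataPoints = re.search(r"^\w*=.*$", text, re.MULTILINE).group(0)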
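Here is a runnable sketch of that extraction step. The email body is hypothetical and assumes, as this answer does, that the key=value pairs sit on their own line. It also guards against re.search returning None, which would otherwise make .group(0) raise an AttributeError when no line matches.

```python
import re

# Hypothetical email body: the key=value line sits between other lines of text.
text = (
    "Message meets Alert condition\n"
    'date=2020-08-20 time=00:33:57 logid="0100032003" logdesc="Admin logout successful"\n'
    "Administrator IT Administrator Ph:\n"
)

# ^ and $ apply per line because of re.MULTILINE; "." does not cross the newline.
match = re.search(r"^\w*=.*$", text, re.MULTILINE)
dataPoints = match.group(0) if match else ""
print(dataPoints)
```

The first line is skipped because "Message" is followed by a space rather than "=", and the third line has no "=" at all.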

Then you need to create your dictionary.

Option 1: Simplest

Use a single re.findall call:

result = dict(re.findall(r'(\w*)=(\".*?\"|\S*)', dataPoints))
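The sketch below runs Option 1 on an excerpt of the log line. The alternation tries a quoted value first (matched lazily up to the next quote) and otherwise takes everything up to the next whitespace. Note that the quotes are kept in the values; the final dict comprehension that strips them is an optional extra step, not part of the answer above.

```python
import re

# Excerpt of the log line from the question.
dataPoints = ('date=2020-08-20 time=00:33:57 tz="+1000" '
              'logdesc="Admin logout successful" duration=4843')

# Quoted value ("...") if there is one, otherwise a run of non-whitespace.
result = dict(re.findall(r'(\w*)=(\".*?\"|\S*)', dataPoints))

# Optional: drop the surrounding double quotes from every value.
cleaned = {k: v.strip('"') for k, v in result.items()}
print(cleaned)
```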


Option 2: Typical split

Follow the typical method for this problem: split the string into a list of key-value combinations, and then split each combination into its key and value. However, since your key-value pairs are separated by spaces rather than semicolons, ampersands, or something similar, and some of your values contain spaces, we can't simply split on spaces. Instead, we use a regex lookahead so that we only split on a space that is followed by a key and an equals sign:

regexSplit = dict([i.split("=", 1) for i in re.split(r"\s(?=\w+=)", dataPoints)])
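A runnable sketch of Option 2. The query field is hypothetical, added only to show why each chunk is split at its first "=" (maxsplit=1): without it, a value containing "=" would yield three parts and dict() would raise a ValueError.

```python
import re

# Excerpt of the log line; the query field is hypothetical.
dataPoints = ('date=2020-08-20 tz="+1000" '
              'msg="Administrator admin timed out on https(10.198.199.105)" '
              'query="next=home"')

# Split only on a whitespace character immediately followed by "key=";
# the spaces inside msg are not followed by "key=", so they are kept.
chunks = re.split(r"\s(?=\w+=)", dataPoints)

# Split each chunk at its first "=" only, so "next=home" stays intact.
regexSplit = dict(chunk.split("=", 1) for chunk in chunks)
print(regexSplit)
```

The lookahead has its own limit: a quoted value that contains a space followed by something like word= would still be split in the wrong place.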

Option 3: No regex

If you want to avoid using regex altogether for whatever reason, you can use the following, which splits on equals signs and then recombines the keys and values into the proper arrangement for creating a dictionary:

allSplits = dataPoints.split("=")
splitList = [allSplits[0]] + [i for a in allSplits[1:-1] 
    for i in a.rsplit(" ", 1)] + [allSplits[-1]]

splitDict = dict(zip(splitList[::2], splitList[1::2]))

The code above assumes dataPoints contains at least one key=value pair.
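Here is Option 3 wrapped in a function so it can be tried on its own; the function name split_pairs is hypothetical. The idea: after splitting on "=", the first piece is the first key, the last piece is the last value, and every piece in between has the form "<value> <next key>", so splitting it once from the right separates the two.

```python
def split_pairs(dataPoints):
    """Option 3 without regex (split_pairs is a hypothetical name)."""
    allSplits = dataPoints.split("=")
    # Middle pieces look like '<value> <next key>'; rsplit once to separate them.
    splitList = [allSplits[0]] + [i for a in allSplits[1:-1]
                                  for i in a.rsplit(" ", 1)] + [allSplits[-1]]
    # Even positions are keys, odd positions are values.
    return dict(zip(splitList[::2], splitList[1::2]))


print(split_pairs('tz="+1000" logdesc="Admin logout successful" duration=4843'))
```

One trade-off compared with Option 2: because it splits on every "=", a value that itself contains "=" is mangled. For example, split_pairs('a=1 q="x=y"') returns {'a': '1', 'q': '"x'}.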


jdaz

What about adding an alternation (|) to your regex? For example,

(\w*)=(\"[\w\s\+()\.]*\"|[\w\-\:\.]*)

matches the string you gave.
Note

  • \"[\w\s\+()\.]*\" matches all the values enclosed in ""
  • [\w\-\:\.]* matches the ones without
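A small sketch that runs this pattern with re.findall on an excerpt of the log line and builds a dict from the result. The last line shows a limitation worth knowing about: a quoted value containing a character outside the quoted class (here a hypothetical note="a:b") falls through to the second alternative, which matches zero characters, so the value comes back empty.

```python
import re

# Quoted value: word chars, whitespace, "+", "(", ")", "."; unquoted: word chars, "-", ":", ".".
pattern = r'(\w*)=(\"[\w\s\+()\.]*\"|[\w\-\:\.]*)'

# Excerpt of the log line from the question.
line = ('date=2020-08-20 time=00:33:57 tz="+1000" '
        'ui="https(10.198.199.105)" srcip=10.198.199.105 '
        'logdesc="Admin logout successful"')

pairs = dict(re.findall(pattern, line))
print(pairs)

# Hypothetical value with ":" inside quotes: the value is returned as ''.
print(re.findall(pattern, 'note="a:b"'))
```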
Stefan