Create a dictionary from colon separated key value string

Question

I am trying to create a dictionary from a string of the form

key1:value1 key2:value2

Extracting the values is the problem, because a value may contain

white space: key1: value1
quotes: key1: "value has space"

A key is identified by its trailing colon (something:).

I tried the following:

def tokenize(msg):
    legit_args = [i for i in msg if ":" in i]
    print(legit_args)
    dline = dict(item.split(":") for item in legit_args)
    return dline

The above only works for values without spaces.

Then I tried the following:

import shlex

def tokenize2(msg):
    try:
        #return {k: v for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', msg)}
        return dict(token.split(':') for token in shlex.split(msg))
    except:
        return {}

This works well with key:"something given like this", but it still fails when a space follows the colon or when an unquoted value contains spaces, as shown below:

>>> msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
>>> import shlex
>>> dict(token.split(':') for token in shlex.split(msg))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> shlex.split(msg)  # problem is here i think
['key1:', 'this is value1 ', 'key2:this', 'is', 'value2', 'key3:', 'this', 'is', 'value3']

What’s wrong with the second (regex) approach exactly? An example input and output would help — , Dec 01 '21 at 03:34
Beside the point, but [a bare `except` is bad practice](/q/54948548/4518341). Instead, use the specific exception you're expecting like `except ValueError`, or at least `except Exception`. — wjandrea, Dec 01 '21 at 03:35
added more details @KeoniGarner , i think this is just fine as values having space should be enclosed in quotes only — pythonRcpp, Dec 01 '21 at 03:47
Is it possible for a value to contain a colon inside the quotes? — wjandrea, Dec 01 '21 at 04:06

tshiono · Accepted Answer · 2021-12-30T11:36:17.577

Would you please try something like:

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3"
d = {}
for m in re.findall(r'\w+:\s*(?:\w+(?:\s+\w+)*(?=\s|$)|"[^"]+")', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.strip('"')
print(d)

Output:

{'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 '}

Explanation of the regex:

\w+:\s* matches a word followed by a colon and optional (zero or more) whitespace characters.
(?: ... ) forms a non-capturing group.
\w+(?:\s+\w+)*(?=\s|$) matches one or more words followed by whitespace or the end of the string.
The pipe character | separates the alternatives.
"[^"]+" matches a string enclosed in double quotes.
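
The pieces explained above can also be written out with re.VERBOSE so that each part carries its own comment. This is only a sketch: the names PAIR and parse_pairs are made up for illustration, and capturing groups are used instead of re.split and strip, which is a small departure from the code in the answer.

```python
import re

# PAIR and parse_pairs are hypothetical names, not part of the original answer.
PAIR = re.compile(r'''
    (\w+):\s*                      # key, colon, optional whitespace
    (?:
        (\w+(?:\s+\w+)*)(?=\s|$)   # bare words, followed by whitespace or end
      |
        "([^"]+)"                  # or a double-quoted value
    )
''', re.VERBOSE)

def parse_pairs(s):
    """Return a dict built from key:value pairs found in s."""
    result = {}
    for m in PAIR.finditer(s):
        key, bare, quoted = m.group(1), m.group(2), m.group(3)
        # Exactly one of the two value groups takes part in a match.
        result[key] = bare if bare is not None else quoted
    return result

s = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
print(parse_pairs(s))
```

With capturing groups the quotes never end up in the value, so no strip('"') is needed afterwards.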

[Edit]
If you want to handle fancy quotes (aka curly quotes or smart quotes), please try:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40"
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)

Output:

{'title': 'incorrect title', 'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 ', 'key4': '10.20.30.40', 'title2': 'incorrect title2'}

[Edit2]
The following code now allows colon(s) in the values:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40 key5:\"value having:colon\""
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m, 1)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)

Output:

{'title': 'incorrect title', 'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 ', 'key5': 'value having:colon', 'key4': '10.20.30.40', 'title2': 'incorrect title2'}

The modification is applied in the line:

key, val = re.split(r':\s*', m, 1)

Passing 1 as the third argument (maxsplit) limits the split to the first colon, so colons inside the value are left intact.
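
To see the effect of maxsplit in isolation, here is a small sketch on a single matched token. The variable names are illustrative only, and the keyword form maxsplit=1 is used because recent Python 3 releases deprecate passing it positionally.

```python
import re

# One token as re.findall would return it for key5 (illustrative input).
m = 'key5:"value having:colon"'

# Without maxsplit, every colon is a split point, so there are three pieces
# and unpacking into "key, val" would raise ValueError.
parts_all = re.split(r':\s*', m)

# With maxsplit=1, only the first colon separates key from value.
parts_once = re.split(r':\s*', m, maxsplit=1)

print(parts_all)
print(parts_once)
```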

This doesn't work as expected when the quotes are like title: “incorrect title” — pythonRcpp, Dec 20 '21 at 19:14
Naturally. I've updated my answer to support the “fancy quotes”. — tshiono, Dec 20 '21 at 20:44
key:10.20.30.40 , doesn't tokenize but key:"10.20.30.40" works fine, value can contain . — pythonRcpp, Dec 29 '21 at 05:18
Okay, I have updated my answer by modifying `\w` to `[\w.]` to match a dot as well. — tshiono, Dec 29 '21 at 05:56
Values can sometimes contain a colon as well, like key:"value having:colon". Can we handle this, please? — pythonRcpp, Dec 30 '21 at 09:56
If the value includes colon character(s), is it assured the value is enclosed with quotes? — tshiono, Dec 30 '21 at 10:24
Thank you for the info. Now the code is fixed to allow colons in the values. Cheers. — tshiono, Dec 30 '21 at 11:37
