Splitting a semicolon-separated string to a dictionary, in Python

Question

I have a string that looks like this:

"Name1=Value1;Name2=Value2;Name3=Value3"

Is there a built-in class/function in Python that will take that string and construct a dictionary, as though I had done this:

d = {
    "Name1": "Value1",
    "Name2": "Value2",
    "Name3": "Value3"
}

I have looked through the modules available but can't seem to find anything that matches.

Thanks. I do know how to write the relevant code myself, but such smallish solutions are usually mine-fields waiting to happen (e.g. someone writes Name1='Value1=2';), so I usually prefer some pre-tested function.

I'll do it myself then.

Does your question require support for `s = r'''Name1='Value=2';Name2=Value2;Name3=Value3;Name4="Va\"lue;\n3"'''` input (note: a semicolon inside a quoted string, a quote escaped using a backslash, a `\n` escape, and both single and double quotes)? – jfs, Dec 21 '14 at 20:42
This question of mine is over 6 years old, and the code that involved this has long since been replaced :) And no, it didn't require support for quotes. I just wanted a prebuilt function instead of writing something myself. However, the code is long gone. – Lasse V. Karlsen, Dec 21 '14 at 20:43

Brian · Accepted Answer · score 153 · 2008-10-09T12:37:41.760

There's no built-in, but you can accomplish this fairly simply with a generator expression:

s = "Name1=Value1;Name2=Value2;Name3=Value3"
# each "Name=Value" item is split into a [name, value] pair, which dict() consumes
dict(item.split("=") for item in s.split(";"))
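The one-liner raises ValueError when a value itself contains '=' (split returns three pieces) or when the string ends with ';' (the trailing empty item has no '=' at all). A slightly more defensive sketch, still without any quote handling, splits at the first '=' only and skips empty items. The function name parse_pairs and its separator parameters are illustrative, not part of any library:

```python
def parse_pairs(s, pair_sep=";", kv_sep="="):
    """Turn 'k1=v1;k2=v2' into {'k1': 'v1', 'k2': 'v2'} (no quoting support)."""
    result = {}
    for item in s.split(pair_sep):
        item = item.strip()
        if not item:
            # tolerate a trailing or doubled separator such as "a=1;;b=2;"
            continue
        key, sep, value = item.partition(kv_sep)  # split at the first '=' only
        if not sep:
            raise ValueError(f"no {kv_sep!r} in item {item!r}")
        result[key.strip()] = value.strip()
    return result


print(parse_pairs("Name1=Value1;Name2=Value2;Name3=Value3;"))
# {'Name1': 'Value1', 'Name2': 'Value2', 'Name3': 'Value3'}
print(parse_pairs("Name1=Value1=2"))
# {'Name1': 'Value1=2'}
```

Quotes are kept verbatim, so `Name1='Value1=2'` yields the value `'Value1=2'` including the quote characters, and a ';' inside quotes still splits the item.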

[Edit] From your update, it looks like you may need to handle quoting. This does complicate things, depending on the exact format you are looking for (which quote characters are accepted, which escape characters, etc.). You may want to look at the csv module to see if it can cover your format. Here's an example. (Note that the API is a little clunky for this example, as csv is designed to iterate through a sequence of records, hence the next() calls I'm making to look at just the first record. Adjust to suit your needs.)

>>> import csv
>>> # next(reader) works on Python 2.6+ and 3; reader.next() is Python 2 only
>>> s = "Name1='Value=2';Name2=Value2;Name3=Value3"
>>> dict(next(csv.reader([item], delimiter='=', quotechar="'"))
...      for item in next(csv.reader([s], delimiter=';', quotechar="'")))
{'Name1': 'Value=2', 'Name2': 'Value2', 'Name3': 'Value3'}

Depending on the exact structure of your format, you may need to write your own simple parser however.
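Python's standard library does have one tokenizer that already understands quoting: shlex in POSIX mode. A sketch, assuming shell-like quoting rules are acceptable for your format (single quotes are literal, inside double quotes a backslash escapes only `"` and `\`, and `\n` is not turned into a newline), is to treat ';' as the only whitespace character and then split each token at its first '='. The function name parse_quoted_pairs is illustrative:

```python
import shlex


def parse_quoted_pairs(s):
    """Split "k1=v1;k2='v;2'" into a dict, honouring shell-style quotes."""
    lexer = shlex.shlex(s, posix=True)
    lexer.whitespace = ";"         # only ';' separates items (spaces are kept)
    lexer.whitespace_split = True  # tokens end only at a separator
    lexer.commenters = ""          # don't treat '#' as the start of a comment
    result = {}
    for token in lexer:            # quotes have already been removed from each token
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"no '=' in item {token!r}")
        result[key] = value
    return result


print(parse_quoted_pairs("Name1='Value;2';Name2=Value2;Name3=Value3"))
# {'Name1': 'Value;2', 'Name2': 'Value2', 'Name3': 'Value3'}
```

An unterminated quote makes shlex raise ValueError ("No closing quotation"), which is usually preferable to silently producing a wrong dictionary.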

edited Oct 09 '08 at 12:37

answered Oct 09 '08 at 11:43

Brian


the code doesn't handle quoting, try: `s = "Name1='Value;2';Name2=Value2;Name3=Value3"` (note: semicolon in the quoted `Name1` value). – jfs Dec 21 '14 at 20:18

I have no idea why the second example throws `AttributeError: '_csv.reader' object has no attribute 'next'` for me. Of course I did `import csv`. – Youngjae Aug 08 '19 at 11:10
@Brian Is there any way to store the values as integer rather than string? – ChasedByDeath Jul 26 '20 at 07:20
How can I do the reverse of this? @Brian – Jamil Noyda Nov 25 '20 at 08:32
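On the question of storing the values as integers: one option is to convert each value while building the dictionary. A sketch, assuming values that look like integers should become int and everything else should stay a string (the helper name to_int_or_str is made up for this example):

```python
def to_int_or_str(value):
    """Return int(value) if value is an integer literal, else value unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


s = "Name1=1;Name2=42;Name3=Value3"
d = {key: to_int_or_str(value)
     for key, value in (item.split("=", 1) for item in s.split(";"))}
print(d)  # {'Name1': 1, 'Name2': 42, 'Name3': 'Value3'}
```

If every value is known to be an integer, calling int(value) directly in the comprehension is enough.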

score 6 · Answer 2 · answered Mar 01 '11 at 02:46


This comes close to doing what you wanted:

>>> import urlparse
>>> urlparse.parse_qs("Name1=Value1;Name2=Value2;Name3=Value3")
{'Name2': ['Value2'], 'Name3': ['Value3'], 'Name1': ['Value1']}
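The code above is Python 2; in Python 3 the urlparse module became urllib.parse. On current Python 3 releases the call also needs an explicit separator: since 3.9.2, parse_qs splits only on '&' by default, so without separator=";" everything after the first '=' ends up as the value of Name1. A sketch, assuming Python 3.9.2 or later:

```python
from urllib.parse import parse_qs, parse_qsl

s = "Name1=Value1;Name2=Value2;Name3=Value3"

# Only one separator is recognised ('&' by default),
# so ';' has to be requested explicitly.
multi = parse_qs(s, separator=";")           # each value is a list
single = dict(parse_qsl(s, separator=";"))   # one string per key (last one wins)

print(multi)   # {'Name1': ['Value1'], 'Name2': ['Value2'], 'Name3': ['Value3']}
print(single)  # {'Name1': 'Value1', 'Name2': 'Value2', 'Name3': 'Value3'}

# Caveat from the comments: query-string decoding still applies.
print(dict(parse_qsl("Name1=a+b%25", separator=";")))  # {'Name1': 'a b%'}
```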

answered Mar 01 '11 at 02:46

Kyle Gibson



it breaks if there is `&` or `%` in the input. – jfs Dec 21 '14 at 20:59
@jfs but the string does not contain either of those. – Vishal Singh Aug 23 '20 at 08:08

@VishalSingh: most visitors on StackOverflow are from google and therefore answers here are not only for the original poster who asked the question. If I came here looking for how to parse a "semicolon-separated string to a dictionary, in Python" then my strings might contain `&` or `%` -- at the very least, it is worth mentioning that the answer doesn't work for such strings. – jfs Aug 24 '20 at 16:28

score 4 · Answer 3 · edited Mar 04 '19 at 13:24


s1 = "Name1=Value1;Name2=Value2;Name3=Value3"

dict(map(lambda x: x.split('='), s1.split(';')))

edited Mar 04 '19 at 13:24

Petter Friberg


answered Mar 04 '19 at 13:23

D. Om


score 1 · Answer 4 · edited Aug 23 '20 at 08:07


Going the other way (dictionary back to a string) can be done simply with str.join and a list comprehension:

",".join(["%s=%s" % x for x in d.items()])

>>> d = {'a': 1, 'b': 2}
>>> ','.join(['%s=%s' % x for x in d.items()])
'a=1,b=2'
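To produce the semicolon-separated format from the question, the same idea works with ';' as the separator. A round-trip sketch (function names are illustrative) also shows the limits: values come back as strings, and nothing is escaped, so a key containing '=' or a value containing ';' would not survive the trip.

```python
def to_string(d, pair_sep=";", kv_sep="="):
    """Join a dict into 'k1=v1;k2=v2' (no escaping of separators)."""
    return pair_sep.join(f"{key}{kv_sep}{value}" for key, value in d.items())


def from_string(s, pair_sep=";", kv_sep="="):
    """Inverse of to_string for keys/values that contain no separators."""
    return dict(item.split(kv_sep, 1) for item in s.split(pair_sep))


text = to_string({"Name1": "Value1", "Name2": 2})
print(text)               # Name1=Value1;Name2=2
print(from_string(text))  # {'Name1': 'Value1', 'Name2': '2'}
```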

edited Aug 23 '20 at 08:07

Vishal Singh


answered Dec 23 '14 at 11:37

vijay


easytiger · Answer 5 · score -2 · 2013-03-27T00:28:07.170

Output of test.py (each timing is the total number of seconds for timeit's default 1,000,000 calls):

p_easytiger_quoting:1.84563302994
{'Name2': 'Value2', 'Name3': 'Value3', 'Name1': 'Value1'}
p_brian:2.30507516861
{'Name2': 'Value2', 'Name3': "'Value3'", 'Name1': 'Value1'}
p_kyle:7.22536420822
{'Name2': ['Value2'], 'Name3': ["'Value3'"], 'Name1': ['Value1']}

test.py:
import timeit
from urllib.parse import parse_qs  # Python 3 home of the old urlparse module

s = "Name1=Value1;Name2=Value2;Name3='Value3'"

def p_easytiger_quoting(s):
    d = {}
    s = s.replace("'", "")  # drop every single quote before splitting
    for x in s.split(';'):
        k, v = x.split('=')
        d[k] = v
    return d


def p_brian(s):
    return dict(item.split("=") for item in s.split(";"))

def p_kyle(s):
    # Python 3.9.2+ splits only on '&' unless a separator is given
    return parse_qs(s, separator=';')


print("p_easytiger_quoting:" + str(timeit.timeit(lambda: p_easytiger_quoting(s))))
print(p_easytiger_quoting(s))

print("p_brian:" + str(timeit.timeit(lambda: p_brian(s))))
print(p_brian(s))

print("p_kyle:" + str(timeit.timeit(lambda: p_kyle(s))))
print(p_kyle(s))

edited Mar 27 '13 at 00:28

answered Mar 26 '13 at 23:59

easytiger


This doesn't answer the question, because it doesn't handle quoting. Try `s = "Name1='Value1=2';Name2=Value2"`, and `csv` (as in Brian's accepted answer) or `parse_qs` (as in Kyle's) will get it right, while yours will raise a `ValueError`. The OP specifically says "such smallish solutions are usually mine-fields waiting to happen", which is why he wants a built-in or other well-tested solution, and he gives an example that will break your code. – abarnert Mar 27 '13 at 00:05
Ahh, I didn't see that. Still, it would be faster than all your solutions to preparse those in the main string before the iteration takes place, instead of calling the replace function thousands of times. I will update. – easytiger Mar 27 '13 at 00:13
I'm not sure how you're going to preparse it. But even if you do, this seems like exactly what the OP was afraid of in a simple solution. Are you sure there are no other mines ahead? Can you prove it to the OP's satisfaction? – abarnert Mar 27 '13 at 00:18
OK, now that I've seen your edit… First, `s.replace` doesn't do anything at all; it just returns a new string that you ignore. Second, even if you got it right (`s = s.replace…`), that doesn't fix the problem, it just adds a new one on top of it. Try it on either my example or the OP's. – abarnert Mar 27 '13 at 00:21
The specification clearly includes handling the sample input he mentioned in his question, `Name='Value1=2';`. And your code doesn't handle it. And I'm not sure how you'd sanitize that without parsing it in some way that will be just as slow as `urlparse` or `csv` in the first place. – abarnert Mar 27 '13 at 00:24
Your new attempt still doesn't fix the problem. Do this: `s = "Name1='Value1=2;';Name2=Value2;Name3='Value3'"`. Having an `=` or `;` inside the quotes is critical, because that's the whole point of quoting. – abarnert Mar 27 '13 at 00:29
Sorry, I was trying to fix it from my phone. I've added a new update. Also, please look at the output of both of your functions; they are both wrong. Brian, yours INCLUDES the quotes; in his specification he removes them from the map, so the value is a string without quotes. And Kyle's puts each element in a map with a list as the value. abarnert, I'm afraid that is incorrect: it will be faster to swap them out at the start. – easytiger Mar 27 '13 at 00:30
Ahh, I see what you mean... he didn't provide that in his example output (`"Name1='Value1=2;'"`). In that case, if I had to handle that, a regex split would work well. However, this is probably largely an issue of user input sanitisation (I know, bad answer etc.). – easytiger Mar 27 '13 at 00:34
I don't think this is a sanitisation issue; I think he really does need quoting. Since the question you're answering is 5 years old, it may not be easy to find out… but it's certainly reasonable. Formats very much like this are used in URL-encoded forms, config files, CSV, etc., and they all either have some kind of quoting or some kind of escaping instead. – abarnert Mar 27 '13 at 00:37
Also, you can trivially fix the other two answers. For Brian's… actually, it _doesn't_ include the quotes; it's already correct. But if it did, you'd just `….strip("'")` on the output. For Kyle's, do `{k:v[0] for k, v in …}`. But you can't trivially fix your original answer (or Brian's original one), because without a parser, handling quoting is hard. Anyway, the fact that it took this much effort just to _see_ the problem, much less solve it, should demonstrate what the OP meant by "minefield". – abarnert Mar 27 '13 at 00:39
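The regex idea mentioned above can be made to work if the pattern matches whole quoted values instead of splitting on ';'. A sketch under an assumed grammar (not something the OP specified): a name is any run of characters other than '=' and ';'; a value is a single-quoted string, a double-quoted string (in both, a backslash escapes the next character), or a bare run of characters other than ';'. The names PAIR, unquote and parse_quoted are illustrative:

```python
import re

# Assumed grammar: NAME '=' VALUE, pairs separated by ';'.
PAIR = re.compile(r"""
    \s* (?P<key>[^=;]+?) \s* = \s*      # name, then '='
    (?:
        (?P<sq>'(?:[^'\\]|\\.)*')       # single-quoted value
      | (?P<dq>"(?:[^"\\]|\\.)*")       # double-quoted value
      | (?P<bare>[^;]*?)                # unquoted value, up to the next ';'
    )
    \s* (?:;|\Z)                        # separator or end of input
""", re.VERBOSE | re.DOTALL)


def unquote(quoted):
    """Strip the surrounding quotes and drop the backslash before escaped chars."""
    return re.sub(r"\\(.)", r"\1", quoted[1:-1], flags=re.DOTALL)


def parse_quoted(s):
    result = {}
    pos = 0
    while pos < len(s):
        m = PAIR.match(s, pos)  # anchored at pos, so no input is silently skipped
        if m is None:
            raise ValueError(f"cannot parse {s[pos:]!r}")
        quoted = m.group("sq") or m.group("dq")
        result[m.group("key")] = unquote(quoted) if quoted else m.group("bare")
        pos = m.end()
    return result


print(parse_quoted("Name1='Value1=2;';Name2=Value2;Name3='Value3'"))
# {'Name1': 'Value1=2;', 'Name2': 'Value2', 'Name3': 'Value3'}
```

The sketch is lenient with malformed input: an unclosed quote falls back to the bare-value branch instead of raising an error.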

Rabarberski · Answer 6 · score -2 · 2013-04-24T10:28:34.060

If your Value1, Value2 are just placeholders for actual values, you can also use the dict() function in combination with eval().

>>> s = "Name1=1;Name2=2;Name3='string'"
>>> print(eval('dict(' + s.replace(';', ',') + ')'))
{'Name1': 1, 'Name2': 2, 'Name3': 'string'}

This is because the dict() function understands the syntax dict(Name1=1, Name2=2, Name3='string'). Spaces in the string (e.g. after each semicolon) are ignored. But note that the string values do require quoting, and the names must be valid Python identifiers.
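As jfs's comment on this answer points out, eval() is unnecessary here. A sketch of the same idea without eval, assuming each value is written as a Python literal (numbers, quoted strings, lists and so on): split the pairs yourself and hand only the value to ast.literal_eval, which accepts literals but not arbitrary expressions. The name parse_literals is illustrative. Unlike the dict() trick, the names need not be Python identifiers, but like the replace-based version it still breaks on a ';' inside a quoted value.

```python
import ast


def parse_literals(s):
    """Parse "Name1=1;Name2=2;Name3='string'" without calling eval()."""
    result = {}
    for item in s.split(";"):
        item = item.strip()
        if not item:
            continue  # allow a trailing ';'
        key, _, value = item.partition("=")
        # literal_eval accepts only literals; a call like __import__('os') raises ValueError
        result[key.strip()] = ast.literal_eval(value.strip())
    return result


print(parse_literals("Name1=1;Name2=2;Name3='string'"))
# {'Name1': 1, 'Name2': 2, 'Name3': 'string'}
```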

edited Apr 24 '13 at 10:28

answered Apr 24 '13 at 10:22

Rabarberski


Thanks, upvoted; string.replace worked well. I don't know why I couldn't split: I did i = textcontrol.GetValue() on the tc box, then o = i.split(';'), but it didn't output a string, it just complained about the format, unlike replace. – Iancovici Jun 13 '13 at 18:18

`s.replace(';'`-based solution breaks if there is `;` inside a quoted value. [eval is evil](http://stackoverflow.com/a/9558001/4279) and it is unnecessary in this case. – jfs Dec 21 '14 at 21:05
