Python split twice with strip in one line

Question

I have a string like this

PARAMS = 'TEST = xy; TEST2= klklk '

which I want to split twice, first at the ";" and then at the "=", and then put the result in a dict.

I can do it with this line:

dict(item.split("=") for item in PARAMS.split(";"))

and get:

{' TEST2': ' klklk ', 'TEST ': ' xy'}

I would now also like to strip the key and value before putting them in the dict. Is there an elegant way to do it in one line in Python?

score 5 · Answer 1 · answered Dec 16 '13 at 13:42

I don't know exactly what you call 'elegant', but this works:

dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))
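Here is a quick check of this generator-expression version. It assumes Python 3.7+, where dicts keep insertion order, so the printed order matches the input (the thread itself dates from the Python 2.7 era). Each inner generator yields exactly two stripped strings, and `dict()` consumes them as one key/value pair:

```python
PARAMS = 'TEST = xy; TEST2= klklk '

# Outer loop: one item per ";"-separated chunk.
# Inner generator: strip both halves of that chunk's "=" split.
result = dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))

print(result)  # {'TEST': 'xy', 'TEST2': 'klklk'}
```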

aIKid

thefourtheye · Accepted Answer · score 3 · 2013-12-16T14:02:35.200

dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))

This runs about twice as fast as @aIKid's solution :)

from timeit import timeit

PARAMS = 'TEST = xy; TEST2= klklk '
print(timeit('dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))', "from __main__ import PARAMS"))
print(timeit('dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))', "from __main__ import PARAMS"))

Output

18.7284784281
9.16360774723
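Below is a minimal sketch of the same comparison for Python 3. It assumes Python 3.6+ (f-strings, underscores in numeric literals) and uses the `globals` argument of `timeit.repeat` (added in 3.5) instead of `from __main__ import PARAMS`. The `repeat`/`number` values are arbitrary choices. No results are shown because they depend on the machine and interpreter.

```python
from timeit import repeat

PARAMS = 'TEST = xy; TEST2= klklk '
# Namespace the timed statements run in, passed via globals=.
ns = {"PARAMS": PARAMS}

genexp = 'dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))'
listcomp = 'dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))'

for label, stmt in (("generator expression", genexp), ("list comprehension", listcomp)):
    # repeat() returns one total per run; the minimum is the least noisy figure.
    best = min(repeat(stmt, globals=ns, repeat=5, number=10_000))
    print(f"{label}: {best:.4f} s for 10,000 calls")
```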

edited Dec 16 '13 at 14:02

answered Dec 16 '13 at 13:42

@aIKid: just as with [list comprehensions and `str.join()`](http://stackoverflow.com/a/9061024/100297), `dict()` *has* to have sequences for each `(key, value)` pair so that their length can be verified. Add to that that in Python 2.7 [list comprehensions don't use a new scope](http://stackoverflow.com/questions/4198906/python-list-comprehension-rebind-names-even-after-scope-of-comprehension-is-thi) but generator expressions do (making the fixed costs of a list comp lower), and you'll see why using a list comprehension here for the `(key, value)` pairs is faster. – Martijn Pieters Dec 16 '13 at 15:30
@MartijnPieters Woohoo, thanks for the explanation! Will keep that in mind. Thanks again! – aIKid Dec 16 '13 at 23:50
@Martijn Pieters: but in this case, not inlining at all is even faster than a list comprehension (because the second loop should not be a loop at all, but list comprehensions lack the necessary syntactic power). – kriss Dec 17 '13 at 09:16
@kris: indeed; avoiding loops is faster still. Sometimes a one-liner is just not worth it! – Martijn Pieters Dec 17 '13 at 09:24
@kriss How can we do this without the second loop? – thefourtheye Dec 17 '13 at 09:25
@thefourtheye: the second split should just return a key and a value; any other configuration is an error. We want to strip both, but it's still just two values. In a list comprehension, though, we have little choice but to turn that into a loop over two items (or to perform two splits of the same value, which is possibly even worse). – kriss Dec 17 '13 at 09:43
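The following sketch (Python 3) illustrates two points from the comments above. First, `dict()` checks that every element it receives has exactly two items. Second, unpacking into two names enforces the same two-value rule without looping over the halves. `BAD_PARAMS` and `build` are hypothetical names introduced only for this illustration:

```python
# Hypothetical malformed input: the second chunk contains a stray "=".
BAD_PARAMS = 'TEST = xy; TEST2 = a = b'


def build(params):
    # The accepted answer's one-liner: one [key, value] list per ";" chunk.
    return dict([i.strip() for i in item.split("=")] for item in params.split(";"))


# dict() rejects the 3-item element produced by the second chunk.
try:
    build(BAD_PARAMS)
except ValueError as exc:
    print(exc)  # dictionary update sequence element #1 has length 3; 2 is required

# Unpacking into exactly two names (as in kriss's loop) also raises ValueError
# for the 3-item chunk, without an inner loop over the halves.
for item in BAD_PARAMS.split(";"):
    try:
        key, value = item.split("=")
    except ValueError as exc:
        print(repr(item), "->", exc)
```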

kriss · Answer 3 · 2013-12-17T08:26:37.277

Maybe something like:

dict(map(lambda x: x.strip(), item.split("=")) for item in PARAMS.split(";"))

or another even more elegant version:

import re
dict((l[i].strip(), l[i+1].strip()) for l in [re.split(';|=', PARAMS)] for i in range(0, len(l), 2))
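`re.split(';|=', PARAMS)` returns one flat list that alternates key, value, key, value. A less cryptic way to pair it up is to slice out the even and odd positions and `zip` them. This slicing approach is a swapped-in technique, not kriss's code, and it assumes every chunk has exactly one "=":

```python
import re

PARAMS = 'TEST = xy; TEST2= klklk '

# Split on either separator, then strip each piece.
parts = [p.strip() for p in re.split(';|=', PARAMS)]
# parts == ['TEST', 'xy', 'TEST2', 'klklk']

# Even indices are keys, odd indices are values.
result = dict(zip(parts[::2], parts[1::2]))
print(result)  # {'TEST': 'xy', 'TEST2': 'klklk'}
```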

Of course, this is elegant only if you take "elegant" as a synonym for obfuscated, but when we are seeking one-liners, isn't that what we mean?

To solve this problem I would probably write:

d = dict()
for item in PARAMS.split(";"):
    key, value = item.split("=")
    d[key.strip()] = value.strip()
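The same loop can be wrapped in a reusable function. `parse_params` is a hypothetical helper name, and the sketch assumes Python 3.7+ for the printed key order:

```python
def parse_params(params):
    """Turn 'k = v; k2 = v2' into a dict with stripped keys and values.

    parse_params is a hypothetical name wrapping the plain loop above.
    """
    d = {}
    for item in params.split(";"):
        # Exactly one "=" per chunk is expected; otherwise unpacking raises ValueError.
        key, value = item.split("=")
        d[key.strip()] = value.strip()
    return d


print(parse_params('TEST = xy; TEST2= klklk '))  # {'TEST': 'xy', 'TEST2': 'klklk'}
```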

It is both easier to read and faster than all the answers proposed so far, and I didn't even bother to optimize it, so it is probably not the best possible solution.

Don't take my word for it; time the different solutions to check:

from timeit import timeit

PARAMS = 'TEST = xy; TEST2= klklk '

print('obfuscated:', timeit('dict((l[i].strip(), l[i+1].strip()) for l in [re.split(";|=", PARAMS)] for i in range(0, len(l), 2))', "from __main__ import PARAMS; import re"))
print('tuple:', timeit('dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))', "from __main__ import PARAMS"))
print('regex:', timeit(r'dict(re.findall(r"(\S+)\s*=\s*([^\s;]+)", PARAMS))', "from __main__ import PARAMS; import re"))
print('lambda:', timeit('dict(map(lambda x: x.strip(), item.split("=")) for item in PARAMS.split(";"))', "from __main__ import PARAMS"))
print('list comprehension:', timeit('dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))', "from __main__ import PARAMS"))
print('replace spaces:', timeit('dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))', "from __main__ import PARAMS"))

print('not one line:', timeit(
'''
d = dict()
for item in PARAMS.split(";"):
    key, value = item.split("=")
    d[key.strip()] = value.strip()
d
''',
"from __main__ import PARAMS"))

Below are the timing results:

obfuscated: 7.36826086044
tuple: 4.49374079704
regex: 3.61684799194
lambda: 3.51627087593
list comprehension: 2.90777206421
replace spaces: 2.46001887321
not one line: 1.71015286446

It speaks for itself.

PS: the not-one-line version is probably faster because it avoids creating an unnecessary list structure and stores the values directly in the dict. But that was a no-brainer; I didn't even do it on purpose.
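A dict comprehension (available since Python 2.7) is another one-line option that also stores each stripped pair directly in the dict. Tuple-unpacking into `k, v` sidesteps the inner loop over two items that kriss mentions in the comments. This variant is not one of the answers in the thread and was not part of the timings above, so no speed claim is made for it:

```python
PARAMS = 'TEST = xy; TEST2= klklk '

# The generator yields one [key, value] list per ";" chunk; unpacking into
# k, v replaces the inner loop, and each stripped pair goes straight into the dict.
result = {k.strip(): v.strip() for k, v in (item.split("=") for item in PARAMS.split(";"))}
print(result)  # {'TEST': 'xy', 'TEST2': 'klklk'}
```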

score 1 · Answer 4 · answered Dec 16 '13 at 13:51

Or, alternatively:

import re
text = 'TEST = xy; TEST2= klklk '
params = dict(re.findall(r'(\S+)\s*=\s*([^\s;]+)', text))
# {'TEST': 'xy', 'TEST2': 'klklk'}

Jon Clements

score 1 · Answer 5 · answered Dec 16 '13 at 13:58

If none of your keys or values have spaces inside them, then you're free to eliminate all spaces with a single replace method:

>>> dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))
{'TEST': 'xy', 'TEST2': 'klklk'}

This will eliminate more spaces than strip would, of course:

>>> PARAMS = 'TEST 3 = there should be spaces between these words '
>>> dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))
{'TEST3': 'thereshouldbespacesbetweenthesewords'}
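To contrast the two behaviours on the same input, the sketch below runs the list-comprehension `strip()` version and the `replace()` version side by side (Python 3.7+ assumed for dict ordering). `strip()` only trims the ends of each key and value, so inner spaces survive:

```python
PARAMS = 'TEST 3 = there should be spaces between these words '

# strip() trims only leading/trailing whitespace, so inner spaces are kept.
stripped = dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))
print(stripped)  # {'TEST 3': 'there should be spaces between these words'}

# replace(" ", "") removes every space, including those between words.
squashed = dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))
print(squashed)  # {'TEST3': 'thereshouldbespacesbetweenthesewords'}
```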