3

I have a string like this

PARAMS = 'TEST = xy; TEST2= klklk '

which I want to split twice, first at each ";" and then at each "=", and then put the resulting pairs in a dict.

I can do it with this line:

dict(item.split("=") for item in PARAMS.split(";"))

and get:

{' TEST2': ' klklk ', 'TEST ': ' xy'}

I would now also like to strip each key and value before putting them in the dict. Is there an elegant way to do this in one line in Python?

Fabian

5 Answers

5

I don't know exactly what you call 'elegant', but this works:

dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))
aIKid
3
dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))

This runs about twice as fast as @aIKid's solution in the timing below :)

PARAMS = 'TEST = xy; TEST2= klklk '
from timeit import timeit
print(timeit('dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))', "from __main__ import PARAMS"))
print(timeit('dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))', "from __main__ import PARAMS"))

Output

18.7284784281
9.16360774723
thefourtheye
  • @aIKid: just as with [list comprehensions and `str.join()`](http://stackoverflow.com/a/9061024/100297), `dict()` *has* to have sequences for each `(key, value)` pair so that their length can be verified. Add to that that in Python 2.7 [list comprehensions don't use a new scope](http://stackoverflow.com/questions/4198906/python-list-comprehension-rebind-names-even-after-scope-of-comprehension-is-thi) but generator expressions do (making the fixed costs of a list comp lower), and you'll see why using a list comprehension here for the `(key, value)` pairs is faster. – Martijn Pieters Dec 16 '13 at 15:30
  • @MartijnPieters Woohoo, thanks for the explanation! Will keep that in mind. Thanks again! – aIKid Dec 16 '13 at 23:50
  • @Martijn Pieters: but in this case not inlining at all is even faster than the list comprehension (because the second loop should not be a loop at all, but list comprehensions lack the necessary syntactic power). – kriss Dec 17 '13 at 09:16
  • @kriss: indeed; avoiding loops is faster still. Sometimes a one-liner is just not worth it! – Martijn Pieters Dec 17 '13 at 09:24
  • @kriss How can we do this without the second loop? – thefourtheye Dec 17 '13 at 09:25
  • @thefourtheye: the second split should just return a key and a value; any other result is an error. We want to strip both, but it's still just two values. In a list comprehension, though, we have little choice but to either turn that into a loop over two elements or split the same value twice (which is possibly even worse). – kriss Dec 17 '13 at 09:43
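
One way to read the last comment is that each item should unpack into exactly one key and one value, which removes the inner loop altogether. The following is a minimal sketch of that idea, assuming Python 3 and the `PARAMS` string from the question; the name `parsed` is only illustrative and does not come from the thread:

```python
PARAMS = 'TEST = xy; TEST2= klklk '

# Unpack each "key=value" item into exactly two names
# instead of looping over the parts of the split.
parsed = {
    key.strip(): value.strip()
    for key, value in (item.split("=") for item in PARAMS.split(";"))
}
print(parsed)  # {'TEST': 'xy', 'TEST2': 'klklk'}

# An item that does not split into exactly two parts fails at the unpacking step.
try:
    {k.strip(): v.strip() for k, v in (item.split("=") for item in "a=1;b".split(";"))}
except ValueError as exc:
    print("malformed item:", exc)
```

Whether this counts as more elegant is a matter of taste, but a malformed item raises `ValueError` instead of being silently accepted.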
2

Maybe something like:

dict(map(lambda x: x.strip(), item.split("=")) for item in PARAMS.split(";"))

or another even more elegant version:

dict((l[i].strip(), l[i+1].strip()) for l in [re.split(';|=', PARAMS)] for i in range(0, len(l), 2))

Of course, this is elegant only if you take the word as a synonym for obfuscated, but when we ask for one-liners, isn't that what we mean?

To solve this problem I would probably write:

d = dict()
for item in PARAMS.split(";"):
    key, value = item.split("=")
    d[key.strip()] = value.strip()

It is both easier to read and faster than all the answers proposed so far, and I didn't even bother to optimize it, so it is probably not the best possible solution.

Don't take my word for it; time the different solutions to check:

PARAMS = 'TEST = xy; TEST2= klklk '

from timeit import timeit

print 'obfuscated', timeit('dict((l[i].strip(), l[i+1].strip()) for i in range(2) for l in [re.split(";|=", PARAMS)])', "from __main__ import PARAMS; import re")
print 'tuple', timeit('dict((i.strip() for i in item.split("=")) for item in PARAMS.split(";"))', "from __main__ import PARAMS")
print 'regex', timeit('dict(re.findall(r"(\S+)\s*=\s*([^\s;]+)", PARAMS))', "from __main__ import PARAMS; import re")
print 'lambda', timeit('dict(map(lambda x: x.strip(), item.split("=")) for item in PARAMS.split(";"))', "from __main__ import PARAMS; import re")
print 'list comprehension', timeit('dict([i.strip() for i in item.split("=")] for item in PARAMS.split(";"))', "from __main__ import PARAMS")
print 'replace spaces', timeit('dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))', "from __main__ import PARAMS; import re")

print 'not one line', timeit(
'''
    d = dict(); 
    for item in PARAMS.split(";"):
        key, value = item.split("=")
        d[key.strip()] = value.strip()
    d
''',
"from __main__ import PARAMS")

Below are the timing results (in seconds, for timeit's default of 1,000,000 runs each):

  • obfuscated: 7.36826086044
  • tuple: 4.49374079704
  • regex: 3.61684799194
  • lambda: 3.51627087593
  • list comprehension: 2.90777206421
  • replace spaces: 2.46001887321
  • not one line: 1.71015286446

It speaks for itself.

PS: the not-one-line version is probably faster because it avoids creating an unnecessary list structure and stores each value directly in the dict. That was a no-brainer, though; it wasn't even deliberate.
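
To repeat the comparison on a current interpreter, something like the sketch below could be used. It assumes Python 3; the function names and the `number=100_000` repeat count are illustrative choices, not part of the original benchmark, and wrapping each variant in a function adds call overhead, so the absolute figures will differ from those above.

```python
from timeit import timeit

PARAMS = 'TEST = xy; TEST2= klklk '

def with_generator(params):
    # Inner generator expression for each (key, value) pair.
    return dict((part.strip() for part in item.split("=")) for item in params.split(";"))

def with_list(params):
    # Inner list comprehension for each (key, value) pair.
    return dict([part.strip() for part in item.split("=")] for item in params.split(";"))

def with_loop(params):
    # Plain loop that stores each stripped pair directly in the dict.
    result = {}
    for item in params.split(";"):
        key, value = item.split("=")
        result[key.strip()] = value.strip()
    return result

variants = [with_generator, with_list, with_loop]

# All variants should agree before their timings are compared.
expected = {'TEST': 'xy', 'TEST2': 'klklk'}
assert all(func(PARAMS) == expected for func in variants)

for func in variants:
    seconds = timeit(lambda: func(PARAMS), number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```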

kriss
1

Or, alternatively:

import re
text = 'TEST = xy; TEST2= klklk '
params = dict(re.findall(r'(\S+)\s*=\s*([^\s;]+)', text))
# {'TEST': 'xy', 'TEST2': 'klklk'}
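
One caveat worth checking: because the value group `[^\s;]+` stops at the first whitespace character, a value that contains a space is cut short. A small sketch, assuming Python 3; the second input string is made up for illustration:

```python
import re

PATTERN = r'(\S+)\s*=\s*([^\s;]+)'

# Values without inner whitespace come out already stripped.
simple = dict(re.findall(PATTERN, 'TEST = xy; TEST2= klklk '))
print(simple)  # {'TEST': 'xy', 'TEST2': 'klklk'}

# A value containing a space is cut at that space.
truncated = dict(re.findall(PATTERN, 'A = x y; B=z'))
print(truncated)  # {'A': 'x', 'B': 'z'}
```

If values may contain spaces, the split-and-strip approaches from the other answers keep them intact.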
Jon Clements
1

If none of your keys or values have spaces inside them, then you're free to eliminate all spaces with a single replace method:

>>> dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))
{'TEST': 'xy', 'TEST2': 'klklk'}

This will eliminate more spaces than strip would, of course:

>>> PARAMS = 'TEST 3 = there should be spaces between these words '
>>> dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))
{'TEST3': 'thereshouldbespacesbetweenthesewords'}
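
For comparison, here is the same input run through both approaches side by side. This is a sketch assuming Python 3; the names `squeezed` and `trimmed` are illustrative:

```python
PARAMS = 'TEST 3 = there should be spaces between these words '

# replace(" ", "") deletes every space, including the ones inside keys and values.
squeezed = dict(item.split("=") for item in PARAMS.replace(" ", "").split(";"))

# strip() only trims the ends of each part, so inner spaces survive.
trimmed = dict([part.strip() for part in item.split("=")] for item in PARAMS.split(";"))

print(squeezed)  # {'TEST3': 'thereshouldbespacesbetweenthesewords'}
print(trimmed)   # {'TEST 3': 'there should be spaces between these words'}
```

Note also that `replace(" ", "")` removes only space characters, while `strip()` without arguments trims tabs and newlines as well.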
Kevin