Is an intermediate list necessary in a multi-level list comprehension

Question

Here is a specific example:

my_dict={k:int(encoded_value) 
         for (k,encoded_value) in 
             [encoded_key_value.split('=') for encoded_key_value in 
              many_encoded_key_values.split(',')]}

The question is about the internal list []: can it be avoided, e.g.:

# This parses, but fails at runtime
my_dict={k:int(encoded_value) 
         for (k,encoded_value) in 
             encoded_key_value.split('=') for encoded_key_value in 
             many_encoded_key_values.split(',')}

..., which raises an error when evaluated:

NameError: name 'encoded_key_value' is not defined

Sample data: aa=1,bb=2,cc=3,dd=4,ee=-5
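
The NameError above is a runtime error rather than a parse error: the `for` clauses of a comprehension run left to right, so the first iterable is evaluated before `encoded_key_value` is bound. A minimal sketch of that, assuming the sample data above (the `source` string and the `error_name` variable are only for illustration):

```python
many_encoded_key_values = "aa=1,bb=2,cc=3,dd=4,ee=-5"

# The failing expression from the question, kept as a string so that
# parsing and evaluation can be checked separately.
source = (
    "{k: int(encoded_value) "
    "for (k, encoded_value) in encoded_key_value.split('=') "
    "for encoded_key_value in many_encoded_key_values.split(',')}"
)

# compile() succeeds, so the expression is syntactically valid.
code = compile(source, "<example>", "eval")

# Evaluation fails: the leftmost `for` clause runs first, and at that
# point `encoded_key_value` has not been bound yet.
try:
    eval(code, {"many_encoded_key_values": many_encoded_key_values})
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

print(error_name)  # NameError
```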

for this particular example maybe even `literal_eval` from `ast` could be helpful with some text manipulations. — Ma0, Aug 10 '17 at 15:13
@Ev.Kounis, I've also tried `result = ast.literal_eval('dict('+many_encoded_key_values+')')`. But I'm curious, it doesn't work: `... raise ValueError('malformed node or string: ' + repr(node))` — RomanPerekhrest, Aug 10 '17 at 15:35
@RomanPerekhrest I tried `res = ast.literal_eval('{"' + many_encoded_key_values.replace('=', '":').replace(',', ',"') + '}')` and it did but it looked too ugly to post. — Ma0, Aug 10 '17 at 15:36
@MichaelGoldshteyn The pain in the neck was quoting the `abc`s. But @Roman has a very valid point.. Why doesn't his `literal_eval` work?. — Ma0, Aug 10 '17 at 15:38

RomanPerekhrest · Answer 1 · 2017-08-10T16:25:50.660

5

As was mentioned, a generator expression improves your approach by avoiding the inner list. But there is a shorter way to obtain the needed result, using the re.findall() function:

import re

result = {k:int(v) for k,v in re.findall(r'(\w+)=([^,]+)', many_encoded_key_values)}
print(result)

The output:

{'dd': 4, 'aa': 1, 'bb': 2, 'ee': -5, 'cc': 3}

An alternative approach is the re.finditer() function, which returns an iterator (a 'callable_iterator' instance) of match objects instead of a list:

result = {m.group(1):int(m.group(2)) for m in re.finditer(r'(\w+)=([^,]+)', many_encoded_key_values)}
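
A self-contained sketch of both regex variants, assuming the sample data from the question. `findall()` returns a complete list of `(key, value)` tuples, while `finditer()` hands out match objects one at a time; the names `pairs_list`, `from_findall` and `from_finditer` are only for illustration:

```python
import re

many_encoded_key_values = "aa=1,bb=2,cc=3,dd=4,ee=-5"
pattern = re.compile(r"(\w+)=([^,]+)")

# findall() builds the whole list of (key, value) string tuples up front.
pairs_list = pattern.findall(many_encoded_key_values)

# finditer() returns an iterator that yields one match object per step.
matches = pattern.finditer(many_encoded_key_values)

from_findall = {k: int(v) for k, v in pairs_list}
from_finditer = {m.group(1): int(m.group(2)) for m in matches}

print(from_findall == from_finditer)  # True
```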

edited Aug 10 '17 at 16:25

answered Aug 10 '17 at 15:17

RomanPerekhrest


An interesting approach using a regex to, logically at least, "strength reduce" the expression to one level. – Michael Goldshteyn Aug 10 '17 at 15:29
@MichaelGoldshteyn, yes. It would be even easier if we have ampersand separated items `many_encoded_key_values = 'aa=1,7&bb=2&cc=3&dd=4&ee=-5'` – RomanPerekhrest Aug 10 '17 at 15:41
`findall()` **does** build a `list` though. – Ma0 Aug 10 '17 at 15:41
@Ev.Kounis, it does, but it also avoids all split operations (besides, my approach idea is **a shorter way**) – RomanPerekhrest Aug 10 '17 at 15:42
@MichaelGoldshteyn there is [`finditer`](https://docs.python.org/3/library/re.html?#re.finditer) which is a generator. – hiro protagonist Aug 10 '17 at 16:14
@MichaelGoldshteyn, see my update, alternative approach – RomanPerekhrest Aug 10 '17 at 16:26

hiro protagonist · Accepted Answer · 2017-08-10T15:42:37.967

3

you could avoid creating an intermediate list by using an intermediate generator expression:

my_dict={k:int(encoded_value)
         for (k,encoded_value) in
             (encoded_key_value.split('=') for encoded_key_value in
              many_encoded_key_values.split(','))}

syntax-wise this is almost the same; instead of generating an intermediate list first and then using the elements, the elements are consumed on the fly.
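
a small sketch of what 'consumed on the fly' means, assuming the sample data from the question. the `events` list and the `split_pair` helper are hypothetical and only there to record the order of operations:

```python
data = "aa=1,bb=2,cc=3,dd=4,ee=-5"
events = []  # records the order in which things happen


def split_pair(item):
    """Split one 'key=value' item and log that the split happened."""
    events.append("split " + item)
    return item.split("=")


# Creating the generator expression does not split any item yet.
pairs = (split_pair(item) for item in data.split(","))
events.append("generator created")

# Each pair is produced only when the loop asks for the next one.
my_dict = {}
for k, v in pairs:
    events.append("store " + k)
    my_dict[k] = int(v)

print(events[:3])  # ['generator created', 'split aa=1', 'store aa']
```

with a list comprehension in place of the generator expression, all the 'split' events would come before 'generator created'.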

making this overly verbose you could use a 'data pipeline' that consists of generators:

eq_statements = (item.strip() for item in many_encoded_key_values.split(','))
var_i = (var_i.split('=') for var_i in eq_statements)
my_dict = {var: int(i) for var, i in var_i}
print(my_dict)

(unfortunately .split returns a complete list rather than a generator, so in terms of saving space this is not of much use here... for handling large files a pipeline like this may come in handy.)

found this answer which has split as an iterator. just in case...
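
a minimal sketch of a lazy split, in case it helps. this is not necessarily what the linked answer does; `iter_split` is a hypothetical helper built on `str.find`, and it assumes a non-empty separator:

```python
def iter_split(text, sep):
    """Yield the pieces of text between occurrences of sep, one at a time."""
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            # No further separator: the rest of the string is the last piece.
            yield text[start:]
            return
        yield text[start:end]
        start = end + len(sep)


data = "aa=1,bb=2,cc=3,dd=4,ee=-5"

# No intermediate list of all items is built at any stage.
pairs = (item.split("=") for item in iter_split(data, ","))
my_dict = {k: int(v) for k, v in pairs}
print(my_dict)
```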

edited Aug 10 '17 at 15:42

answered Aug 10 '17 at 15:07

hiro protagonist


Interesting, this avoids temporary storage and iterates the items instead. – Michael Goldshteyn Aug 10 '17 at 15:14
yes, added an overly verbose variant that should illustrate the (lazy) generator evaluation of it... – hiro protagonist Aug 10 '17 at 15:21
thanks, this looks like a good approach for n-level expressions to make the code more readable. – Michael Goldshteyn Aug 10 '17 at 15:30

PM 2Ring · Answer 3 · 2017-08-10T15:41:06.543

1

FWIW, here's a functional approach:

def convert(s):
    k, v = s.split('=')
    return k, int(v)

data = 'aa=1,bb=2,cc=3,dd=4,ee=-5'  # sample data from the question
d = dict(map(convert, data.split(',')))
print(d)

output

{'aa': 1, 'bb': 2, 'cc': 3, 'dd': 4, 'ee': -5}
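
On Python 3, `map` returns a lazy iterator, so `dict()` pulls the converted pairs one at a time and no intermediate list of pairs is built. A quick sketch to check that, assuming the sample data from the question (the `is_lazy` and `leftover` names are only for illustration):

```python
from collections.abc import Iterator


def convert(s):
    k, v = s.split("=")
    return k, int(v)


data = "aa=1,bb=2,cc=3,dd=4,ee=-5"

pairs = map(convert, data.split(","))

# On Python 3 this is an iterator, not a list.
is_lazy = isinstance(pairs, Iterator) and not isinstance(pairs, list)

d = dict(pairs)         # consumes the iterator pair by pair
leftover = list(pairs)  # nothing is left once dict() has consumed it

print(is_lazy, d, leftover)
```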

edited Aug 10 '17 at 15:41

answered Aug 10 '17 at 15:25

PM 2Ring


I want integers for the values, though. Please update your answer. – Michael Goldshteyn Aug 10 '17 at 15:26
@MichaelGoldshteyn Ah, ok. – PM 2Ring Aug 10 '17 at 15:27
@MichaelGoldshteyn In that case, hiro protagonist's 1st method is probably the best way. However, I've added a functional version which avoids an intermediate list. In Python 2, `map` does build a list, but not in Python 3. – PM 2Ring Aug 10 '17 at 15:42

hiro protagonist · Answer 4 · 2017-08-10T19:58:58.080

0

a simple and compact variant that is very close to your original attempt:

d = {v.strip(): int(i) for s in data.split(',') for v, i in (s.split('='),)}

the only additional 'trick' is to wrap s.split('=') in a single-element tuple, (s.split('='),) (note the trailing comma), so that the second for clause iterates exactly once and unpacks both elements of the split into v and i. the rest is straightforward.
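
as a sketch, the same logic written out as nested loops. the sample string here has extra spaces (an assumption, not the question's data) to show what the .strip() is for:

```python
data = "aa=1, bb=2, cc=3"

# Nested-loop equivalent of the comprehension.
d_loop = {}
for s in data.split(","):
    # The one-element tuple makes this inner loop run exactly once,
    # unpacking the two parts of the split into v and i.
    for v, i in (s.split("="),):
        d_loop[v.strip()] = int(i)

d_comp = {v.strip(): int(i) for s in data.split(",") for v, i in (s.split("="),)}

print(d_loop == d_comp)  # True
```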

edited Aug 10 '17 at 19:58

answered Aug 10 '17 at 18:40

hiro protagonist


...sorry for the additional answer(s). but i felt like this should be possible in a simpler way than what i presented first. this feels much more natural to me. – hiro protagonist Aug 10 '17 at 18:45
