I have a list of dicts like this:

l = [{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}]

and I would like to obtain an output of this form:

>>> [('foo', 'bar'), ([1,2,3,4], [5,6,7,8])]

But short of for-looping and appending I don't see a solution. Is there a smarter way than doing this?

names = []
values = []
for d in l:
    names.append(d['name'])
    values.append(d['values'])
ChaosPredictor
oarfish
  • Keep in mind that your solution is probably the best there is performance-wise. All single-line comprehensions will probably need 2 full iterations over the list. Your code does it with one. Shorter code is not always the most Pythonic. – DeepSpace Oct 29 '18 at 14:00
  • I agree, I'm just curious whether this _can_ be written in one line. – oarfish Oct 29 '18 at 14:06
  • Your solution is better than all the other ones down there. – Aran-Fey Oct 29 '18 at 17:57
  • @DeepSpace The list comprehension method actually seems to be the fastest (about 3 times faster than OP.) Here's a script for benchmarking three of the solutions presented below: https://repl.it/@cchudzicki/Looping-Efficiency For loops, it matters not just "how many loops" but also how many operations you do during each iteration. BTW: As always, don't optimize for efficiency prematurely. But I wouldn't go out of my way to avoid a generator expression in the name of efficiency when it is ... more efficient. – Chris Chudzicki Oct 29 '18 at 22:27
  • @ChrisChudzicki: While the list comprehension method *is* faster, it's not *that* much faster: You slowed down the original code considerably by adding the result list, and thus two extra lookups in the loop. (The original could be made faster by caching the `list.append` lookups, but the comprehensions would still be 20-30% faster) As for the `map(dict.values, l)` version, it shouldn't even be considered, since it relies on ≥Python3.7 for guaranteed dictionary ordering *and* all the input dictionaries having been created in the same order! Guaranteed dictionary order by default was a mistake. – Aleksi Torhamo Oct 30 '18 at 02:02
  • This is the easiest to read and understand (for me, anyway). – DrMcCleod Oct 30 '18 at 09:35
  • @AleksiTorhamo Good point...I should have just left it at "noticeably faster". And thanks for the more detailed explanation below eyllanesc's answer. Anyway, the point still remains not to discount the list comprehension based on speed. – Chris Chudzicki Oct 30 '18 at 10:42
  • @ChrisChudzicki: Yeah, definitely; the double list comprehensions are clearly the winner here on all counts. – Aleksi Torhamo Oct 31 '18 at 05:43
  • Did any of the answers solve your problem? Please accept the most appropriate one if so. – jpmc26 Nov 19 '18 at 17:22
  • The reason I haven't accepted one is that I don't know which one is the best. – oarfish Nov 20 '18 at 06:40
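Several comments above disagree about whether one explicit loop can beat two comprehensions. Below is a minimal sketch for checking this on your own machine with the standard timeit module. The list size and repeat counts are arbitrary assumptions, and the printed numbers will vary with hardware and Python version.

```python
import timeit

# Test data shaped like the question's list; 10**4 copies is an arbitrary size.
l = [{'name': 'foo', 'values': [1, 2, 3, 4]},
     {'name': 'bar', 'values': [5, 6, 7, 8]}] * 10**4


def loop_append(l):
    """The question's approach: a single pass with two append calls per item."""
    names = []
    values = []
    for d in l:
        names.append(d['name'])
        values.append(d['values'])
    return [names, values]


def two_comprehensions(l):
    """Two passes over the list, one list comprehension per field."""
    return [[d['name'] for d in l], [d['values'] for d in l]]


# Both return [names, values] as lists, so the timings compare like with like.
assert loop_append(l) == two_comprehensions(l)

for func in (loop_append, two_comprehensions):
    # Best of 5 runs of 10 calls each, which reduces noise from other processes.
    best = min(timeit.repeat(lambda: func(l), number=10, repeat=5))
    print(f"{func.__name__}: {best:.4f} s per 10 calls")
```

If you need the exact [(...), (...)] shape from the question, wrap each result in tuple() in both functions so the comparison stays fair.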

9 Answers

Use generator expressions:

l = [{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}]
v = [tuple(k["name"] for k in l), tuple(k["values"] for k in l)]
print(v)

Output:

[('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]
eyllanesc
  • It may be worth noting that this solution (and quite possibly every one-liner) requires 2 iterations over the entire list (it may not be obvious at a quick glance) – DeepSpace Oct 29 '18 at 14:03
  • @DeepSpace I think it's visible to the naked eye: there are 2 for loops. :-) – eyllanesc Oct 29 '18 at 14:04
  • @Aran-Fey `k["values"] for k in l`?? – eyllanesc Oct 29 '18 at 17:59
  • That's a generator expression. – Aran-Fey Oct 29 '18 at 18:01
  • @Aran-Fey mmm, interesting, I think that list comprehension has the following form: `[ expression for item in list if conditional ]`, – eyllanesc Oct 29 '18 at 18:03
  • Yes, if it's enclosed in square brackets then it's a list comprehension. List comprehension -> returns a list. Dict comprehension -> returns a dict. Set comprehension -> returns a set. Generator expression -> returns a generator. – Aran-Fey Oct 29 '18 at 18:08
  • In this case, the generator produced by each generator expression is consumed by `tuple`; the resulting *tuples* are the values used by the list *literal* to produce `v`. – chepner Oct 29 '18 at 18:09
  • @Aran-Fey and chepner: Interesting, I had only noticed the technique and not the nomenclature, thank you, something new is learned every day. – eyllanesc Oct 29 '18 at 18:11
  • @DeepSpace On a list with `5e6` entries, this method appears to be about twice as fast as a simple for loop. – Chris Chudzicki Oct 29 '18 at 21:05
  • @ChrisChudzicki Are you saying that 2 loops are faster than one? Allow me to doubt that. – DeepSpace Oct 29 '18 at 21:17
  • @DeepSpace: It's true (at least mostly - in my tests it's less than 2x faster); a normal loop has to look up the name of the list, then the append method, and then call the function - in a list comprehension, the append is a single opcode. (If you cache `names_append = names.append`, the loop will still have to do a single name lookup and the function call - in that case, in my tests, the optimized loop will slightly outperform this double `tuple()` answer, but will still be 10-30% slower than two list comprehensions) – Aleksi Torhamo Oct 30 '18 at 02:28
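The comment above describes caching the bound append methods so that the loop skips the attribute lookup on each iteration. Here is a sketch of that variant. The name split_cached_append is hypothetical. Whether it beats the comprehensions on your interpreter is worth measuring rather than assuming.

```python
def split_cached_append(l):
    """One pass over l; the append methods are looked up once, before the loop."""
    names = []
    values = []
    # Bind the methods to local names so each iteration skips the attribute lookup.
    names_append = names.append
    values_append = values.append
    for d in l:
        names_append(d['name'])
        values_append(d['values'])
    return [tuple(names), tuple(values)]


l = [{'name': 'foo', 'values': [1, 2, 3, 4]}, {'name': 'bar', 'values': [5, 6, 7, 8]}]
print(split_cached_append(l))
# [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]
```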

I would use a list comprehension (much like eyllanesc's) if I were writing this code for public consumption. But just for fun, here's a one-liner that doesn't use any for loops.

>>> l = [{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}]
>>> list(zip(*map(dict.values, l)))
[('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]

(Note that this only reliably works if dictionaries preserve insertion order, which is not the case in all versions of Python. CPython 3.6 does it as an implementation detail, but it is only guaranteed behavior as of 3.7.)

Quick breakdown of the process:

  • dict.values returns a dict_values object, which is an iterable containing all the values of the dict.
  • map takes each dictionary in l and calls dict.values on it, returning an iterable of dict_values objects.
  • zip(*thing) is a classic "transposition" recipe, which takes an iterable-of-iterables and effectively flips it diagonally. E.g. [[a,b],[c,d]] becomes [[a,c], [b,d]]. This puts all the names into one tuple, and all the values into another.
  • list converts the zip object into a list.
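The zip(*thing) transposition from the breakdown above is easiest to see on plain nested lists first. Note that zip yields tuples, not lists, and it stops at the shortest input, so ragged rows lose data silently:

```python
rows = [['a', 'b'], ['c', 'd']]

# zip(*rows) is zip(['a', 'b'], ['c', 'd']): it pairs the first items, then the second items.
columns = list(zip(*rows))
print(columns)  # [('a', 'c'), ('b', 'd')]

# The same recipe on the question's data, with each dict's values acting as a "row".
l = [{'name': 'foo', 'values': [1, 2, 3, 4]}, {'name': 'bar', 'values': [5, 6, 7, 8]}]
print(list(zip(*map(dict.values, l))))
# [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]

# zip stops at the shortest input, so the 3 in the longer row is dropped.
print(list(zip(*[[1, 2, 3], [4, 5]])))  # [(1, 4), (2, 5)]
```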
Kevin
  • Nice! A one-liner with a single iteration over the input list + a constant-length iteration. – DeepSpace Oct 29 '18 at 14:06
  • This was about what I had in mind, but I wouldn't want to rely on the order of dict items. – oarfish Oct 29 '18 at 14:07
  • @oarfish From Python 3.7 dictionaries are ordered, so you shouldn't really have a problem. – see Oct 29 '18 at 14:42
  • `map` tends to be [considered](https://www.python.org/dev/peps/pep-0279/) [un-pythonic](https://stackoverflow.com/a/10973817), which is what was asked for in the title. – Matt Copperwaite Oct 29 '18 at 16:28
  • In addition to requiring at least Python 3.7, this requires one to make sure that all the dictionaries in the input list have been created in the correct order; in other words, this is extremely fragile code. (Guaranteed ordering of dictionaries by default was a mistake) In addition, this is slower than both the original code and dual list comprehensions. – Aleksi Torhamo Oct 30 '18 at 02:10

You can use operator.itemgetter to guarantee ordering of values:

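from operator import itemgetter

L = [{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}]

# itemgetter(*fields)(d) returns (d['name'], d['values']), in that order
fields = ('name', 'values')
res = list(zip(*map(itemgetter(*fields), L)))

print(res)

[('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]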

If, assuming Python 3.6+, you cannot guarantee appropriate insertion-ordering of dictionaries within your input list, you will need to explicitly define an order as above.
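The following small sketch shows why the explicit field order matters. Purely for illustration, it assumes the second dictionary was built with its keys in the opposite order. map(dict.values, ...) then mixes names and values, while itemgetter still returns them in the order given by fields.

```python
from operator import itemgetter

# Same data as the question, but the second dict's keys were inserted in reverse order.
L = [{'name': 'foo', 'values': [1, 2, 3, 4]},
     {'values': [5, 6, 7, 8], 'name': 'bar'}]

# Relies on each dict's insertion order, so the columns get mixed up.
by_values = list(zip(*map(dict.values, L)))
print(by_values)  # [('foo', [5, 6, 7, 8]), ([1, 2, 3, 4], 'bar')]

# itemgetter with several keys returns a tuple in the order the keys were given.
fields = ('name', 'values')
by_itemgetter = list(zip(*map(itemgetter(*fields), L)))
print(by_itemgetter)  # [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]
```

One caveat: with a single field, itemgetter('name') returns the bare value rather than a 1-tuple, so the zip(*...) recipe would transpose the characters of each name instead.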

Performance

While a list of "tuple comprehensions" works, it becomes unreadable and inefficient when querying more than a couple of fields:

from operator import itemgetter

n = 10**6
L = [{'name': 'foo', 'values': [1,2,3,4], 'name2': 'zoo', 'name3': 'xyz',
      'name4': 'def'}, {'name': 'bar', 'values': [5,6,7,8], 'name2': 'bart',
      'name3': 'abc', 'name4': 'ghi'}] * n

%timeit [tuple(k["name"] for k in L), tuple(k["values"] for k in L),\
         tuple(k["name2"] for k in L), tuple(k["name3"] for k in L),
         tuple(k["name4"] for k in L)]

%timeit fields = ('name', 'values', 'name2', 'name3' ,'name4');\
        list(zip(*map(itemgetter(*fields), L)))

1 loop, best of 3: 1.25 s per loop
1 loop, best of 3: 1.04 s per loop
jpp
  • I had no idea `itemgetter` could fetch multiple values at once. Excellent answer! Assuming that the list of fields is not needed later, wouldn't `map(itemgetter('name', 'values'), L)` work just as well? – jpmc26 Oct 29 '18 at 21:31
  • @jpmc26, yep, of course `itemgetter('name', 'values')` works just as well; the `*` operator just unpacks. I just felt separating it into 2 lines aids readability here. – jpp Oct 29 '18 at 22:37

This may not be exactly what you had in mind, but for tabular data like this I find that pandas is usually the best solution in the long run:

>>> import pandas as pd
>>> l = [{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}]
>>> df = pd.DataFrame(l)
>>> df
  name        values
0  foo  [1, 2, 3, 4]
1  bar  [5, 6, 7, 8]

Usually you use the data frame directly for anything you would need to do, but you can also convert it to a list-based data structure:

>>> df['name'].tolist(), df['values'].tolist()
(['foo', 'bar'], [[1, 2, 3, 4], [5, 6, 7, 8]]) 
Kale Kundert

Not sure about performance, but here's another take using zip() and unpacking:

list(zip(*[tuple(i.values()) for i in l]))

# [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]

Edit: As @DeepSpace pointed out, it can be further reduced down to:

list(zip(*(i.values() for i in l)))

Here's a longer, but more explicit answer if you want to define the orders yourself:

list(zip(*(tuple(map(lambda k: i.get(k), ('name', 'values'))) for i in l)))

# [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]
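In the explicit-order version above, lambda k: i.get(k) does the same thing as the bound method i.get, so the lambda can be dropped. The inner tuple(...) can go too, because zip accepts iterators. Here is a sketch; fields is just an illustrative name. There is one behavioral difference worth knowing: dict.get returns None for a missing key, whereas operator.itemgetter raises KeyError.

```python
l = [{'name': 'foo', 'values': [1, 2, 3, 4]}, {'name': 'bar', 'values': [5, 6, 7, 8]}]
fields = ('name', 'values')

# map(i.get, fields) looks up each field of dict i, in the order given by fields.
result = list(zip(*(map(i.get, fields) for i in l)))
print(result)  # [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]

# A dict that lacks a key contributes None instead of raising KeyError.
partial = [{'name': 'foo', 'values': [1]}, {'name': 'bar'}]
print(list(zip(*(map(i.get, fields) for i in partial))))  # [('foo', 'bar'), ([1], None)]
```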
r.ook
  • Nice. This can be a little bit more memory friendly: `list(zip(*(i.values() for i in l)))` – DeepSpace Oct 29 '18 at 14:09
  • It seems that this does basically the same as @Kevin's solution with a listcomp instead of a map, correct? – oarfish Oct 29 '18 at 14:09
  • @DeepSpace Great point, I had the comprehension wrapped as a `list` before and it returned the `dict_view` so I forced it into a tuple. Nice catch! @oarfish, I just saw the other answer after I posted :( testing takes time. I do concede a `zip/map` combo is more compact. – r.ook Oct 29 '18 at 14:11
  • @oarfish In response to your comment for Kevin's answer, I've updated my answer to include a variant where you can define an explicit order without relying on the dictionary's order (to make up for the fact that our answers are so similar). – r.ook Oct 29 '18 at 14:33

Use map for this:

names = tuple(map(lambda d: d['name'], l))
values = tuple(map(lambda d: d['values'], l))
result = [names, values]
user3142459

First: your code is fine, readable, and efficient, which sounds Pythonic to me. Note that you probably don't want a list of tuples, though. Tuples are immutable, so you wouldn't be able to append another name to names.
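Suppose you use one of the zip-based one-liners from the other answers but need mutable results. A minimal sketch of one option is to convert each tuple that zip yields into a list:

```python
l = [{'name': 'foo', 'values': [1, 2, 3, 4]}, {'name': 'bar', 'values': [5, 6, 7, 8]}]

# map(list, ...) turns each column tuple produced by zip into a list.
names, values = map(list, zip(*((d['name'], d['values']) for d in l)))

names.append('baz')  # works, unlike appending to a tuple
print(names)   # ['foo', 'bar', 'baz']
print(values)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

One caveat: with an empty l, zip yields nothing and the two-name unpacking raises ValueError. The explicit loop simply returns two empty lists.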

With a single dict

If names are unique, you could convert your list of dicts to a large dict:

>>> l = [{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}]
>>> data = {d['name']:d['values'] for d in l}
>>> data
{'foo': [1, 2, 3, 4], 'bar': [5, 6, 7, 8]}

You can get the desired information directly:

>>> data.keys()
dict_keys(['foo', 'bar'])
>>> data.values()
dict_values([[1, 2, 3, 4], [5, 6, 7, 8]])

If you really want a list of lists:

>>> [list(data.keys()), list(data.values())]
[['foo', 'bar'], [[1, 2, 3, 4], [5, 6, 7, 8]]]
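The "if names are unique" condition matters, because a dict comprehension keeps only the last value written for a repeated key. Here is a minimal illustration with a made-up duplicate entry:

```python
l = [{'name': 'foo', 'values': [1, 2, 3, 4]},
     {'name': 'bar', 'values': [5, 6, 7, 8]},
     {'name': 'foo', 'values': [9, 9]}]  # hypothetical duplicate name

data = {d['name']: d['values'] for d in l}
print(data)  # {'foo': [9, 9], 'bar': [5, 6, 7, 8]}

# The first 'foo' entry was overwritten, so data has fewer entries than l.
print(len(l), len(data))  # 3 2
```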

With pandas

If you're working with a large list of dicts, you might want to consider pandas.

You could initialize a DataFrame directly:

>>> import pandas as pd
>>> df = pd.DataFrame([{'name': 'foo', 'values': [1,2,3,4]}, {'name': 'bar', 'values': [5,6,7,8]}])
>>> df
  name        values
0  foo  [1, 2, 3, 4]
1  bar  [5, 6, 7, 8]

If you need the names as an iterable, you can get the corresponding column:

>>> df['name']
0    foo
1    bar
Name: name, dtype: object

If you really need a list of names:

>>> list(df['name'])
['foo', 'bar']

To get the names and values together:

>>> df.values.T
array([['foo', 'bar'],
       [list([1, 2, 3, 4]), list([5, 6, 7, 8])]], dtype=object)
Eric Duminil
  • The single dict answer works only if dictionaries have a deterministic order of iteration, doesn't it? – oarfish Oct 30 '18 at 06:24
  • @oarfish: Hello! Thanks for the comment. The dict order will be the same as the original data in [Python 3.6 or higher](https://stackoverflow.com/a/39980744/6419007). In older Python version, the returned objects might not have the same order as the original dict but `keys` and `values` will have the corresponding elements in the same order. – Eric Duminil Oct 30 '18 at 11:29

Here's a recursive way of doing it:

def trans(l):
  if l:
    res = trans(l[1:])
    res[0], res[1] = (l[0]['name'],) + res[0], (l[0]['values'],) + res[1]
    return res
  return [(),()]
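Each call to trans above recurses once per element and copies the rest of the list with l[1:]. With CPython's default recursion limit (normally 1000), it therefore raises RecursionError on inputs of roughly a thousand elements or more. Below is a sketch of the same accumulation as a left fold with functools.reduce. This is a swapped-in technique, not the answerer's code, and trans_fold is a hypothetical name. It avoids the recursion, though it still builds a new tuple at every step.

```python
from functools import reduce


def trans_fold(l):
    """Fold over l, appending each dict's fields to an accumulator of two tuples."""
    def step(acc, d):
        names, values = acc
        # Tuple concatenation creates new tuples, so acc is never mutated.
        return (names + (d['name'],), values + (d['values'],))

    names, values = reduce(step, l, ((), ()))
    return [names, values]


l = [{'name': 'foo', 'values': [1, 2, 3, 4]}, {'name': 'bar', 'values': [5, 6, 7, 8]}]
print(trans_fold(l))  # [('foo', 'bar'), ([1, 2, 3, 4], [5, 6, 7, 8])]

# 4000 items: no recursion involved, so the recursion limit does not apply.
print(len(trans_fold(l * 2000)[0]))  # 4000
```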
greenBox

Just like this:

(lambda f:
    lambda l, r=[(), ()]: f(f, l, r)
)(lambda g, l, r:
    r if len(l) == 0  else g(g, l[1:], [r[0]+(l[0]['name'],), r[1]+(l[0]['values'],)])
)([
    {'name': 'foo', 'values': [1, 2, 3, 4]},
    {'name': 'bar', 'values': [5, 6, 7, 8]},
    {'name': 'baz', 'values': [9, 9, 9, 9]}
])

Result:

[('foo', 'bar', 'baz'), ([1, 2, 3, 4], [5, 6, 7, 8], [9, 9, 9, 9])]
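The expression above passes the inner lambda to itself (f(f, l, r)) so that an anonymous function can recurse. Giving that function a name shows the same logic more plainly, as in the sketch below; collect is a hypothetical name. CPython does not eliminate tail calls, so this version has the same recursion-depth limit as the recursive answer above.

```python
def collect(l, r=((), ())):
    """Recursion with an accumulator: r holds (names, values) as l is consumed."""
    if len(l) == 0:
        return list(r)
    names, values = r
    return collect(l[1:], (names + (l[0]['name'],), values + (l[0]['values'],)))


print(collect([
    {'name': 'foo', 'values': [1, 2, 3, 4]},
    {'name': 'bar', 'values': [5, 6, 7, 8]},
    {'name': 'baz', 'values': [9, 9, 9, 9]},
]))
# [('foo', 'bar', 'baz'), ([1, 2, 3, 4], [5, 6, 7, 8], [9, 9, 9, 9])]
```

Using an immutable tuple as the default accumulator also sidesteps the usual concern about mutable default arguments.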
gooooof