
I find myself repeating the following code (or something similar) often:

users = {}
for d in data:
    if d['user'] in users.keys():
        users[d['user']].append(d)
    else:
        users[d['user']] = [d]

Here, data is a list of dicts, and I want to split the list into smaller lists mapped to their d["user"] value as a key in a dictionary.

I would like a way of doing this in a single line, because these multiple lines annoy me.

The only way I can think of doing this, however, involves changing my O(N) algorithm (above) into an O(N^2) algorithm, like:

users = {e["user"]: [d for d in data if d["user"] == e["user"]] for e in data}

Obviously, this inefficiency is unacceptable...

Konrad Rudolph
MrRedstone
  • You could use a defaultdict instead of a dictionary, will save you the if. Besides that there is no performant (and readable) way to do it. Another way would be to sort and groupby, better than O(n^2) but not O(n) – Dani Mesejo Dec 14 '20 at 15:41
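The defaultdict suggestion in the comment above could look roughly like this. It is only a sketch: the sample records are made up for illustration and are not from the question.

```python
from collections import defaultdict

# Hypothetical sample records (not from the question).
data = [
    {"user": "alice", "value": 1},
    {"user": "bob", "value": 2},
    {"user": "alice", "value": 3},
]

# A missing key is created as an empty list on first access,
# so the explicit if/else from the question is no longer needed.
users = defaultdict(list)
for d in data:
    users[d["user"]].append(d)

# Optional: convert back to a plain dict so later lookups of
# unknown users raise KeyError instead of silently adding keys.
users = dict(users)
```

This is still a single O(N) pass over data, just without the membership test.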

3 Answers

1

You can use a conditional expression inside a comprehension, like this:

[3*n+1 if n%2==1 else n//2 for n in range(100)]

which fits the kind of need you have, especially when dealing with comprehensions. For your purpose, this should do:

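users = {}  # must exist before the comprehension runs
# an assignment is not allowed inside a comprehension, so use update() in the else branch
[users[d['user']].append(d) if d['user'] in users else users.update({d['user']: [d]}) for d in data]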
smed
  • Can you check `if d['user'] in users` in a one-line dictionary constructor like that? I didn't think you could do that. – MrRedstone Dec 16 '20 at 17:57
0

This is more or less the same as what you posted in your original comment but made slightly cleaner:

# set up sample data
from random import randint, choice
names = ["Alice", "Bob", "Charlie"]
data = [{"user": choice(names), "value": randint(1, 10)} for _ in range(10)]

# group the records into a dict of lists keyed by user
users = {}
for d in data:
    users.setdefault(d["user"], []).append(d)

If your data is already sorted by user (or you sort it first), you could use itertools.groupby, something like the following:

from operator import itemgetter
from itertools import groupby

# groupby only merges adjacent items, so sort by the same key first
data = sorted(data, key=itemgetter("user"))

users = {k: list(g) for k, g in groupby(data, key=itemgetter("user"))}
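To show why the sort matters, here is a small sketch with made-up records (not from the question). groupby starts a new group every time the key changes, so on unsorted input a user can appear in several groups, and the dict comprehension keeps only the last one.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records, deliberately not sorted by user.
data = [
    {"user": "alice", "value": 1},
    {"user": "bob", "value": 2},
    {"user": "alice", "value": 3},
]

by_user = itemgetter("user")

# Without sorting: "alice" forms two separate groups, and the
# second one overwrites the first in the resulting dict.
unsorted_groups = {k: list(g) for k, g in groupby(data, key=by_user)}

# With sorting: every user's records are adjacent, so each user
# ends up in exactly one group.
sorted_groups = {
    k: list(g) for k, g in groupby(sorted(data, key=by_user), key=by_user)
}

print(len(unsorted_groups["alice"]))  # 1
print(len(sorted_groups["alice"]))    # 2
```

The sort makes this approach O(N log N) rather than O(N), so the setdefault loop is probably the better fit unless the data is known to arrive sorted.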
mrosales
0

You could make it a monster one-liner, like this:

users = { u:v[u] for v in [dict()] for d in data for u in [d['user']] if not v.setdefault(u,[]).append(d) }

Or reduce it to two lines, like this:

users = dict()
for d in data: users.setdefault(d['user'],[]).append(d)

Both will run in O(N) time (but I personally prefer the second one).

The other thing you could do is create a function and use that instead:

def dataToDict(data,key):
    result = dict()
    for d in data: result.setdefault(d[key],[]).append(d)
    return result

users = dataToDict(data,"user")
Alain T.