I'm gathering data together in one part of my application and sending it off for work in another part. I've got a few thousand records, each containing an email address and a few ordered integers which represent some preferences.

My first thought was to organize my data in a dictionary like this:

{
    "user1@domain.com": [23, 1, 5],
    "user2@domain.com": [1, 4, 8]
}

But then I was thinking about tuples. I often overlook them, but tuples are a good option in Python, so I could do this:

[
    ("user1@domain.com", [23, 1, 5]),
    ("user2@domain.com", [1, 4, 8])
]

These examples show two records each, but I'll actually have somewhere in the low tens of thousands of records.

Is one of these more pythonic than the other? Is there another way I should consider?

I'm leaning towards the dictionary because when I build the structure I'm picking out ordered records that look like this:

(("user1@domain.com", 23), ("user1@domain.com", 1), ("user1@domain.com", 5), ("user2@domain.com", 1), ("user2@domain.com", 4), ("user2@domain.com", 8))

and combining them into one of the above forms. With the dictionary, it's easy to reference the same user's list over and over. With the list of tuples I guess I'd need to keep a reference to the last element, or keep calling len() on the list.
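
For comparison, here is a minimal sketch of the list-of-tuples build, assuming the incoming records are already grouped by email as in the sample above. The names `entries` and `records` are only illustrative. Indexing with `records[-1]` reaches the most recent record, so neither a separate reference nor a `len()` call is needed:

```python
# Illustrative input: (email, preference) pairs, already grouped by email.
entries = (
    ("user1@domain.com", 23), ("user1@domain.com", 1), ("user1@domain.com", 5),
    ("user2@domain.com", 1), ("user2@domain.com", 4), ("user2@domain.com", 8),
)

records = []  # will hold (email, [preferences]) tuples
for email, number in entries:
    # Start a new record whenever the email differs from the last one seen.
    if not records or records[-1][0] != email:
        records.append((email, []))
    # records[-1] is the most recently added record.
    records[-1][1].append(number)
```

This only works if all records for one email are adjacent; if they can arrive interleaved, the dictionary form avoids searching the list for an earlier entry.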

WhiteHotLoveTiger
  • Depends on your use case - for some cases dict is better, for others tuples/lists. – Andrej Kesely Jun 21 '19 at 16:58
  • If you don't have duplicate email addresses (and the list stored against the keys suggests that the data is aggregated) then the dictionary is the obvious choice. This is _not_ about being pythonic, it's about the time complexity of data retrieval. "I often overlook them, but tuples are a good option in Python" is something of a nothing-statement (I don't mean that offensively) but what does that amount to, really? – roganjosh Jun 21 '19 at 17:05

2 Answers

This is an opinion question, and my opinion is that neither is more pythonic. The way you should structure the data depends on the way you plan to use it.

You mentioned in a comment that when you're using the data you'll just be looping through it, so either format will do. However, for building the structure out of tuples like ("user1@domain.com", 23), a dict (or a collections.defaultdict) will be more convenient:

entries = (("user1@domain.com", 23), ("user1@domain.com", 1), ("user1@domain.com", 5), ("user2@domain.com", 1), ("user2@domain.com", 4), ("user2@domain.com", 8))
result = {}
for email, number in entries:
    result.setdefault(email, []).append(number)

Or, using itertools.groupby from the standard library:

import itertools
import operator
entries = (("user1@domain.com", 23), ("user1@domain.com", 1), ("user1@domain.com", 5), ("user2@domain.com", 1), ("user2@domain.com", 4), ("user2@domain.com", 8))
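# Sort on the email only: sorted() is stable, so each user's integers keep
# their original order (a plain sorted(entries) would reorder them).
by_email = operator.itemgetter(0)
result = {email: [number for _, number in group]
          for email, group in itertools.groupby(sorted(entries, key=by_email), key=by_email)}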
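
The defaultdict variant mentioned above could look like the following sketch. It reuses the same sample `entries`. Converting back to a plain dict at the end is optional and is only done here so that later lookups of a missing email raise KeyError instead of silently creating an empty list:

```python
from collections import defaultdict

entries = (
    ("user1@domain.com", 23), ("user1@domain.com", 1), ("user1@domain.com", 5),
    ("user2@domain.com", 1), ("user2@domain.com", 4), ("user2@domain.com", 8),
)

# defaultdict(list) creates an empty list the first time an email is seen.
result = defaultdict(list)
for email, number in entries:
    result[email].append(number)

# Optional: freeze into a regular dict once building is finished.
result = dict(result)
```
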
Aaron Bentley

If you want to search through or retrieve from the data based on email address, then having the emails as keys in a dictionary will be faster. Given the data records you described, adding data to a dictionary is also simpler than adding it to a list of tuples, and faster whenever you would have to search the list for an existing email.

This is because in Python a dict is stored as a hash table, which makes looking up a key an O(1) operation on average. To find a tuple by its first element, you have to scan the list of tuples, which is an O(n) operation.
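
As a small illustration of that difference, using the two sample records from the question (the variable names here are just for the example):

```python
by_email = {
    "user1@domain.com": [23, 1, 5],
    "user2@domain.com": [1, 4, 8],
}
# The same data as a list of (email, preferences) tuples.
as_pairs = list(by_email.items())

# Dictionary: a single hash lookup, independent of the number of records.
prefs_from_dict = by_email["user2@domain.com"]

# List of tuples: scan the records until the email matches.
prefs_from_list = next(
    prefs for email, prefs in as_pairs if email == "user2@domain.com"
)
```

With only a few tens of thousands of records, and if you only ever loop over all of them, this difference is unlikely to matter in practice.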

Abhineet Gupta
  • Thanks for mentioning this. I should have clarified that when I'm using the data, I won't care about the order. I won't be looking things up by email address, just looping through, and making use of each record one-by-one. – WhiteHotLoveTiger Jun 21 '19 at 17:13
  • @WhiteHotLoveTiger the answer doesn't actually mention order, and dictionaries preserve insertion order anyway (guaranteed as of Python 3.7, and an implementation detail of CPython 3.6). The question is becoming increasingly unclear. – roganjosh Jun 21 '19 at 17:21