172

I am new to Python and I have a list of years and a value for each year. What I want to do is check whether the year already exists in a dictionary and, if it does, append the value to the list of values for that key.

So for instance, I have a list of years and have one value for each year:

2010  
2  
2009  
4  
1989  
8  
2009  
7  

What I want to do is populate a dictionary with the years as keys and those single digit numbers as values. However, if I have 2009 listed twice, I want to append that second value to my list of values in that dictionary, so I want:

2010: 2  
2009: 4, 7  
1989: 8  

Right now I have the following:

d = dict()  
years = []  

(get 2 column list of years and values)

for line in list:    
    year = line[0]   
    value = line[1]  

for line in list:  
    if year in d.keys():  
        d[value].append(value)  
    else:  
        d[value] = value  
        d[year] = year  
martineau
anon
  • 1
    Another similar question: http://stackoverflow.com/questions/5378231/python-list-to-dictionary-multiple-values-per-key – River Sep 17 '15 at 00:17

7 Answers

240

If I can rephrase your question, what you want is a dictionary with the years as keys and, for each year, a list of the values associated with that year, right? Here's how I'd do it:

years_dict = dict()

for line in list:
    if line[0] in years_dict:
        # append the new number to the existing array at this slot
        years_dict[line[0]].append(line[1])
    else:
        # create a new array in this slot
        years_dict[line[0]] = [line[1]]

What you should end up with in years_dict is a dictionary that looks like the following:

{
    "2010": [2],
    "2009": [4,7],
    "1989": [8]
}

In general, it's poor programming practice to create "parallel arrays", where items are implicitly associated with each other by having the same index rather than being proper children of a container that encompasses them both.
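To make the parallel-array point concrete, here is a minimal sketch. The names `years`, `values`, and `pairs` and the sample data are illustrative assumptions, not part of the question. It zips two parallel lists into pairs first, then groups them with the same if/else pattern as above.

```python
# Hypothetical sample data standing in for the question's two columns.
years = [2010, 2009, 1989, 2009]
values = [2, 4, 8, 7]

# Pair each year with its value instead of relying on matching indexes.
pairs = list(zip(years, values))

years_dict = {}
for year, value in pairs:
    if year in years_dict:
        # year already seen: extend its existing list
        years_dict[year].append(value)
    else:
        # first time we see this year: start a new list
        years_dict[year] = [value]

print(years_dict)  # {2010: [2], 2009: [4, 7], 1989: [8]}
```

Once the data is held as pairs, each year travels with its value, so reordering or filtering cannot silently misalign them.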

Faisal
  • 20
    This is definitely the right way to do it, although not necessarily the most concise given the availability of cool tricks like `dict.setdefault()` and `collections.defaultdict` as part of the default toolset available to modern Python installations. – jathanism Jul 07 '10 at 22:10
  • 2
    If you do use defaultdict set it up as a list: dd = defaultdict(list) – sparrow Aug 03 '16 at 20:14
  • 1
    this approach is very underperformant compared to other methods described in the other answers. – Jean-François Fabre Mar 06 '18 at 08:04
124

You would be best off using collections.defaultdict (added in Python 2.5). This allows you to specify the default object type of a missing key (such as a list).

So instead of creating a key if it doesn't exist first and then appending to the value of the key, you cut out the middle-man and just directly append to non-existing keys to get the desired result.

A quick example using your data:

>>> from collections import defaultdict
>>> data = [(2010, 2), (2009, 4), (1989, 8), (2009, 7)]
>>> d = defaultdict(list)
>>> d
defaultdict(<type 'list'>, {})
>>> for year, value in data:
...     d[year].append(value)
... 
>>> d
defaultdict(<type 'list'>, {2009: [4, 7], 2010: [2], 1989: [8]})

This way you don't have to worry about whether you've already seen a value for a given year. You just append and forget: a missing key is created with an empty list the first time it is accessed, and an existing key's list is simply appended to.
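One consequence worth seeing in code: because a defaultdict creates missing keys on access, even a plain lookup adds an entry. The sketch below (the variable names `grouped` and `plain` are illustrative) shows that side effect and one way to avoid it, which is converting to a regular dict once the grouping is done.

```python
from collections import defaultdict

data = [(2010, 2), (2009, 4), (1989, 8), (2009, 7)]

grouped = defaultdict(list)
for year, value in data:
    grouped[year].append(value)

# Snapshot as a regular dict: missing keys raise KeyError again.
plain = dict(grouped)

# Caveat: merely reading a missing key on the defaultdict inserts it.
_ = grouped[1999]
print(1999 in grouped)  # True
print(1999 in plain)    # False
```

Note that `dict(grouped)` makes a shallow copy, so the lists themselves are still shared between `grouped` and `plain`.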

jathanism
57

You can use setdefault.

for year, value in list:
    d.setdefault(year, []).append(value)

This works because setdefault returns the list as well as setting it on the dictionary, and because a list is mutable, appending to the version returned by setdefault is the same as appending it to the version inside the dictionary itself. If that makes any sense.
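If it helps, the identity claim can be checked directly. This is a small sketch with made-up values; `first` and `second` are illustrative names.

```python
d = {}

# First call: 2009 is missing, so setdefault stores the new list and returns it.
first = d.setdefault(2009, [])
first.append(4)

# Second call: 2009 exists, so the default is ignored and the stored list is returned.
second = d.setdefault(2009, [])
second.append(7)

print(first is second is d[2009])  # True
print(d)  # {2009: [4, 7]}
```

Both calls hand back the very same list object that lives in the dictionary, which is why appending to the return value is enough.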

Daniel Roseman
27
d = {} 

# import list of year,value pairs

for year,value in mylist:
    try:
        d[year].append(value)
    except KeyError:
        d[year] = [value]

The Python way: it is easier to ask forgiveness than permission!

Hugh Bothwell
19

Here is an alternative way of doing this using the not in operator:

# define an empty dict
years_dict = dict()

for line in list:
    # here define what key is, for example,
    key = line[0]
    # check if key is already present in dict
    if key not in years_dict:
        years_dict[key] = []
    # append the value for this key (assumed here to be the second column)
    years_dict[key].append(line[1])
USER_1
  • Very sweet not in usage. you have to love python. i love this technique, as in my use case this offers me more granular key value management with appending vs list zipping – Lenn Dolling Sep 10 '22 at 18:06
7

It's easier if you get these values into a list of tuples. To do this, you can use list slicing and the zip function.

data_in = [2010,2,2009,4,1989,8,2009,7]
data_pairs = zip(data_in[::2],data_in[1::2])

zip takes an arbitrary number of iterables, in this case the even-indexed and odd-indexed entries of data_in, and pairs their items up into tuples.

Now we can use the setdefault method.

data_dict = {}
for x in data_pairs:
    data_dict.setdefault(x[0],[]).append(x[1])

setdefault takes a key and a default value. If the key is already present, it returns the associated value; otherwise it stores the default under that key and returns it. Either way we get a list, empty or already populated, to which we append the current value.
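Put together, the two steps might look like the sketch below. One assumption to flag: in Python 3, zip returns a single-pass iterator rather than a list, so the pairs are wrapped in `list()` here so they can be inspected or looped over more than once.

```python
data_in = [2010, 2, 2009, 4, 1989, 8, 2009, 7]

# Even positions hold the years, odd positions hold the values.
data_pairs = list(zip(data_in[::2], data_in[1::2]))

data_dict = {}
for year, value in data_pairs:
    data_dict.setdefault(year, []).append(value)

print(data_pairs)  # [(2010, 2), (2009, 4), (1989, 8), (2009, 7)]
print(data_dict)   # {2010: [2], 2009: [4, 7], 1989: [8]}
```

Unpacking each pair as `year, value` is equivalent to indexing with `x[0]` and `x[1]` and tends to read more clearly.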

erik
3

If you want an (almost) one-liner:

from collections import deque

d = {}
deque((d.setdefault(year, []).append(value) for year, value in source_of_data), maxlen=0)

Using dict.setdefault, you can encapsulate the idea of "check if the key already exists and make a new list if not" into a single call. This allows you to write a generator expression which is consumed by deque as efficiently as possible since the queue length is set to zero. The deque will be discarded immediately and the result will be in d.

This is something I just did for fun. I don't recommend using it. There is a time and a place to consume arbitrary iterables through a deque, and this is definitely not it.
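For comparison, here is a sketch of the plain loop that the one-liner stands in for, run next to the deque version. `source_of_data` is filled with sample pairs as an assumption, and `d_loop` and `d_deque` are illustrative names.

```python
from collections import deque

# Hypothetical sample input: an iterable of (year, value) pairs.
source_of_data = [(2010, 2), (2009, 4), (1989, 8), (2009, 7)]

# The straightforward loop.
d_loop = {}
for year, value in source_of_data:
    d_loop.setdefault(year, []).append(value)

# The deque trick: the generator is drained for its side effects only.
d_deque = {}
deque(
    (d_deque.setdefault(year, []).append(value) for year, value in source_of_data),
    maxlen=0,
)

print(d_loop == d_deque)  # True
```

Both build the same dictionary; the loop simply says so more directly.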

Mad Physicist
  • If I use `data = [(2010, 2), (2009, 4), (1989, 8), (2009, 7)]`, it returns `deque([])`. – Cleb Nov 25 '17 at 13:48
  • @Cleb. The result is in `d`. The deque should be discarded. Its only function is to process the generator as quickly as possible. – Mad Physicist Nov 25 '17 at 14:22
  • Ooops, stupid me; then it works actually quite nicely... – Cleb Nov 25 '17 at 14:28
  • 1
    @Cleb. I added a clarifying sentence. It's not that intuitive to create an object just to throw it away. I wonder if you could use the `__init__` method directly. Something like `deque.__init__(None, iterable, maxlen=0)`. – Mad Physicist Nov 25 '17 at 14:40
  • @Cleb. Turns out you can't forego the deque object: `TypeError: descriptor '__init__' requires a 'collections.deque' object but received a 'NoneType'` – Mad Physicist Nov 25 '17 at 16:11