
I'm having trouble understanding why the following two code snippets behave so differently; I expected them to be equivalent.

I start with a pandas DataFrame like this:

import pandas as pd
products = pd.DataFrame(columns=['category','product_id'])
products['category'] = pd.Series(['a','a','a','b','b'])
products['product_id'] = pd.Series(['1','2','3','4','5'])

Method 1

The following code displays the behaviour I want:

# only keys assigned at declaration, no values:
categories = dict.fromkeys(products['category'].unique())

for categ in categories.keys():

    # each value is itself a dictionary
    categories[categ] = {}

    #  portion of products df
    categ_products = products.loc[products['category'] == categ].copy()

    # assign the 2nd level dict
    for key in categ_products['product_id']:
        categories[categ][key] = categ + str(key)

That is, each categories[categ] is a dictionary with its own keys, namely the products specific to that category:

categories
>>> {'a': {'1': 'a1', '2': 'a2', '3': 'a3'}, 'b': {'4': 'b4', '5': 'b5'}}

Method 2

If I instead give the top-level dictionary empty dictionaries as values at declaration, outside the loop, i.e.:

# already assigned as a nested dictionary at declaration:
categories = dict.fromkeys(products['category'].unique(), {})

for categ in categories.keys():

    # portion of products df
    categ_products = products.loc[products['category'] == categ].copy()
    # assign the 2nd level dicts
    for key in categ_products['product_id']:
        categories[categ][key] = categ + str(key)

I end up with each categories[categ] containing all the keys, from all category-specific lists!

categories
{'a': {'1': 'a1', '2': 'a2', '3': 'a3', '4': 'b4', '5': 'b5'},
 'b': {'1': 'a1', '2': 'a2', '3': 'a3', '4': 'b4', '5': 'b5'}}

Basically, it's as if the line starting with categories[categ][key] = ... assigned values for every value of categ, instead of only the current one in the for loop.

I suppose this is a feature and not a bug, but I just don't understand it. Why do the two snippets behave differently?
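
One way to narrow this down is to check whether the values of the Method 2 dictionary are distinct objects at all. The sketch below is only a diagnostic under stated assumptions: it uses a plain list `keys` as a stand-in for `products['category'].unique()` so that it runs without pandas, and the names `shared` and `separate` are made up for illustration.

```python
# Stand-in for products['category'].unique(); a plain list, so no pandas needed.
keys = ['a', 'b']

# Method 2 style: the {} is evaluated once and that one object is used for every key.
shared = dict.fromkeys(keys, {})
print(shared['a'] is shared['b'])  # True: both keys point at the same dict

shared['a']['1'] = 'a1'
print(shared)  # {'a': {'1': 'a1'}, 'b': {'1': 'a1'}}

# Method 1 style: values start as None and each key gets a fresh {} in the loop.
separate = dict.fromkeys(keys)
for k in separate:
    separate[k] = {}
print(separate['a'] is separate['b'])  # False: two independent dicts
```

If the first `is` check prints `True`, a write through one key will show up under every key, which matches the Method 2 output above.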

kilgoretrout
  • What is `products`? Please mock up a simple but representative example to make this a [mcve]. – John Coleman Nov 27 '19 at 14:07
  • The problem is that you are giving multiple references to a *single* dictionary in the line `dict.fromkeys(products['category'].unique(), {})`. See [this question](https://stackoverflow.com/q/1132941/4996248) for a discussion. In fact, your question is in some ways a duplicate. – John Coleman Nov 27 '19 at 14:14
  • Sorry for the lazy posting, I made the code reproducible. Mind elaborating further? I don't fully understand what you mean by "giving multiple references to a single dictionary" – kilgoretrout Nov 27 '19 at 14:54
  • I'm sure this is a duplicate of some other question, but I could hardly guess which one, since I simply don't understand what behaviour I am looking at here. – kilgoretrout Nov 27 '19 at 15:00
  • With method 1, there are two dictionaries which appear as values. With method 2, there is a single shared dictionary, which is the one dictionary `{}` which appears in `dict.fromkeys(products['category'].unique(), {})`. There can be multiple references (which function as aliases) to a single mutable object. That is exactly what you are seeing here. – John Coleman Nov 27 '19 at 15:07
  • aaaaaah dauymn! I thought it was assigning as many empty dictionaries as there were keys! Well, if you wanna post this explanation as an answer I'll be happy to accept it :) – kilgoretrout Nov 27 '19 at 15:18
  • It is almost certainly a duplicate so I won't post it as an answer. If you want a workaround, use a dictionary comprehension: `{k:{} for k in products['category'].unique()}` does what you want. – John Coleman Nov 27 '19 at 15:22
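
The dictionary-comprehension workaround from the last comment can be sketched as follows. This is a minimal illustration, not a definitive fix: the lists `category` and `product_id` are hypothetical stand-ins for the two DataFrame columns so the snippet runs without pandas, and `sorted(set(...))` stands in for `.unique()`.

```python
# Hypothetical stand-ins for products['category'] and products['product_id'].
category = ['a', 'a', 'a', 'b', 'b']
product_id = ['1', '2', '3', '4', '5']

# The comprehension evaluates {} once per key, so every value is its own dict.
categories = {k: {} for k in sorted(set(category))}

# Fill the second-level dicts; each write now touches only one category.
for categ, pid in zip(category, product_id):
    categories[categ][pid] = categ + pid

print(categories)
# {'a': {'1': 'a1', '2': 'a2', '3': 'a3'}, 'b': {'4': 'b4', '5': 'b5'}}
```

With the real DataFrame, the comprehension from the comment, `{k: {} for k in products['category'].unique()}`, would take the place of the `sorted(set(...))` line; the rest of Method 2 can stay as it is.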
