I'm having trouble understanding why the following two code snippets behave so differently; I didn't expect that at all.
To start, I have a pandas DataFrame like this:
    import pandas as pd
    products = pd.DataFrame(columns=['category', 'product_id'])
    products['category'] = pd.Series(['a', 'a', 'a', 'b', 'b'])
    products['product_id'] = pd.Series(['1', '2', '3', '4', '5'])
Method 1
The following code displays the behaviour I want:
    # only keys assigned at declaration, no values:
    categories = dict.fromkeys(products['category'].unique())
    for categ in categories.keys():
        # each value is itself a dictionary
        categories[categ] = {}
        # portion of products df
        categ_products = products.loc[products['category'] == categ].copy()
        # assign the 2nd level dict
        for i, key in enumerate(categ_products['product_id']):
            categories[categ][key] = categ + str(key)
That is, each categories[categ] is a separate dictionary whose keys are only the products in that category:
    >>> categories
    {'a': {'1': 'a1', '2': 'a2', '3': 'a3'}, 'b': {'4': 'b4', '5': 'b5'}}
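A stripped-down version of the Method 1 pattern, without pandas and with the keys 'a' and 'b' hard-coded (an assumption for brevity), makes it easy to check whether the inner dictionaries are separate objects. The key point is that the {} literal sits inside the loop, so it is evaluated once per key:

```python
# Same pattern as Method 1, minus pandas: keys first, values default to None
categories = dict.fromkeys(['a', 'b'])
for categ in categories:
    # this {} literal is evaluated on every iteration, creating a new dict each time
    categories[categ] = {}

print(categories['a'] is categories['b'])  # False: two distinct dict objects
categories['a']['1'] = 'a1'
print(categories)  # {'a': {'1': 'a1'}, 'b': {}}
```

Writing into categories['a'] leaves categories['b'] untouched, which matches the output above.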
Method 2
If I instead declare the top-level dictionary with nested (empty) dictionaries already in place, outside the loop:
    # already assigned as a nested dictionary at declaration:
    categories = dict.fromkeys(products['category'].unique(), {})
    for categ in categories.keys():
        # portion of products df
        categ_products = products.loc[products['category'] == categ].copy()
        # assign the 2nd level dicts
        for i, key in enumerate(categ_products['product_id']):
            categories[categ][key] = categ + str(key)
I end up with every categories[categ] containing all the keys, from every category's product list!
    >>> categories
    {'a': {'1': 'a1', '2': 'a2', '3': 'a3', '4': 'b4', '5': 'b5'},
     'b': {'1': 'a1', '2': 'a2', '3': 'a3', '4': 'b4', '5': 'b5'}}
Basically, it's as if the line categories[categ][key] = ... assigned the value under every categ at once, instead of only under the current one in the for loop.
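An identity check on the Method 2 pattern shows what is going on. This is a minimal stdlib-only sketch, with plain lists standing in for the DataFrame columns (an assumption, not the original data structure). The {} passed to dict.fromkeys is evaluated a single time, and every key is given a reference to that same object:

```python
# Plain lists standing in for the 'category' and 'product_id' columns (assumption)
category = ['a', 'a', 'a', 'b', 'b']
product_id = ['1', '2', '3', '4', '5']

# Same pattern as Method 2: the single {} argument is evaluated once
categories = dict.fromkeys(['a', 'b'], {})
print(categories['a'] is categories['b'])  # True: both keys refer to one dict object

for categ in categories:
    for c, pid in zip(category, product_id):
        if c == categ:
            # writes into the one shared dict, whichever key is used
            categories[categ][pid] = categ + pid

print(categories['a'] is categories['b'])  # still True
print(len(categories['a']))  # 5
```

So the assignment really does happen only for the current categ; it just lands in an inner dict that every key shares.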
I suppose this is a feature and not a bug, but I just don't understand it. Why do the two snippets behave differently?
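For comparison, here is a sketch of two standard-library ways to build one independent inner dict per key in a single step, again with plain lists assumed in place of the DataFrame columns. A dict comprehension evaluates {} once per key, and collections.defaultdict(dict) creates a fresh inner dict the first time each key is accessed:

```python
from collections import defaultdict

# Plain lists standing in for the 'category' and 'product_id' columns (assumption)
category = ['a', 'a', 'a', 'b', 'b']
product_id = ['1', '2', '3', '4', '5']

# Option 1: dict comprehension; the {} expression runs once per unique key
categories = {c: {} for c in dict.fromkeys(category)}
for c, pid in zip(category, product_id):
    categories[c][pid] = c + pid

# Option 2: defaultdict builds a new empty dict on first access of each key
by_default = defaultdict(dict)
for c, pid in zip(category, product_id):
    by_default[c][pid] = c + pid

print(categories)  # {'a': {'1': 'a1', '2': 'a2', '3': 'a3'}, 'b': {'4': 'b4', '5': 'b5'}}
print(dict(by_default) == categories)  # True
```

Both give the Method 1 result; the comprehension keeps the keys in first-seen order via dict.fromkeys(category).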