python - Update default values in dictionary from dataframe rows

Question

From a dataframe, I build a dictionary that has as keys each distinct value from a given column. The value of each key is a nested dictionary, being the key the distinct values from another column. The Values in the nested dictionary will be updated by iterating a dataframe (third column).

Example:

import pandas as pd
data = [['computer',1, 10]
       ,['computer',2,20]
       ,['computer',4, 40]
       ,['laptop',1, 100]
       ,['laptop',3, 30]
       ,['printer',2, 200]
       ]
df = pd.DataFrame(data,columns=['Product','id', 'qtt'])
print (df)

Product	id	qtt
computer	1	10
computer	2	20
computer	4	40
laptop	1	100
laptop	3	30
printer	2	200

kdf_key_dic = {key: None for key in df['id'].unique().tolist()}

product_key_dic = {key: kdf_key_dic for key in df['Product'].unique().tolist()}

print ("product_key_dic: ", product_key_dic)

  product_key_dic:  {
   'computer': {1: None, 2: None, 4: None, 3: None}, 
   'laptop': {1: None, 2: None, 4: None, 3: None}, 
   'printer': {1: None, 2: None, 4: None, 3: None}
}

Now I'd like to update the product_key_dic dictionary, but I can't get it right, it always uses the same key-dict for each key in the main dictionary!

for index, row in df.iterrows():
  product_key_dic[row['Product']].update({row['id']:row['qtt']})

print("\n product_key_dic:\n", product_key_dic)

I get:

 product_key_dic:
 { 'computer': {1: 100, 2: 200, 4: 40, 3: 30}, 
   'laptop': {1: 100, 2: 200, 4: 40, 3: 30}, 
   'printer': {1: 100, 2: 200, 4: 40, 3: 30}
 }

I expect:

 { 'computer': {1: 10, 2: 20, 4: 40, 3: None}, 
   'laptop':  {1: 100, 2: None, 4: None, 3: 30}, 
   'printer': {1: None, 2: 200, 4: None, 3: None}
 }

I can't understand the problem, somehow it's like each key has the nested dictoinary..?

Abdul Niyas P M · Answer 1 · 2021-09-04T01:35:53.023

This is because you are reusing same dict object. Let's take these two statements.

kdf_key_dic = {key: None for key in df['id'].unique().tolist()}
product_key_dic = {key: kdf_key_dic for key in df['Product'].unique().tolist()}

You are passing kdf_key_dic as value(in the second statement) which is same object in each iteration.

So instead of this you can pass a copy of kdf_key_dic while constructing product_key_dic

product_key_dic = {key: kdf_key_dic.copy() for key in df['Product'].unique().tolist()}

Henry Ecker · Accepted Answer · 2021-09-04T01:42:53.607

We can try a different approach creating a MultiIndex.from_product based on the unique values from Product and Id then reshaping so we can call DataFrame.to_dict directly:

cols = ['Product', 'id']

product_key_dic = (
    df.set_index(cols).reindex(
        pd.MultiIndex.from_product(
            [df[col].unique() for col in cols],
            names=cols
        )
    )  # Reindex to ensure all pairs are present in the DF
        .replace({np.nan: None})  # Replace nan with None
        .unstack('Product')  # Create Columns from Product
        .droplevel(0, axis=1)  # Remove qtt from column MultiIndex
        .to_dict()
)

product_key_dic:

{
    'computer': {1: 10.0, 2: 20.0, 3: None, 4: 40.0},
    'laptop': {1: 100.0, 2: None, 3: 30.0, 4: None},
    'printer': {1: None, 2: 200.0, 3: None, 4: None}
}

Methods Used:

Setup and imports:

import numpy as np
import pandas as pd

data = [['computer', 1, 10], ['computer', 2, 20], ['computer', 4, 40],
        ['laptop', 1, 100], ['laptop', 3, 30], ['printer', 2, 200]]
df = pd.DataFrame(data, columns=['Product', 'id', 'qtt'])

The initial solution could be modified by adding a copy call to the dictionary in the comprehension to make them separate dictionaries rather than multiple references to the same one (How to copy a dictionary and only edit the copy). However, iterating over DataFrames is discouraged (Does pandas iterrows have performance issues?):

kdf_key_dic = {key: None for key in df['id'].unique().tolist()}

product_key_dic = {key: kdf_key_dic.copy()
                   for key in df['Product'].unique().tolist()}

for index, row in df.iterrows():
    product_key_dic[row['Product']].update({row['id']: row['qtt']})

product_key_dic:
{
    'computer': {1: 10.0, 2: 20.0, 3: None, 4: 40.0},
    'laptop': {1: 100.0, 2: None, 3: 30.0, 4: None},
    'printer': {1: None, 2: 200.0, 3: None, 4: None}
}

python - Update default values in dictionary from dataframe rows

2 Answers2