1

From a dataframe, I build a dictionary that has as keys each distinct value from a given column. The value of each key is a nested dictionary, being the key the distinct values from another column. The Values in the nested dictionary will be updated by iterating a dataframe (third column).

Example:

import pandas as pd
data = [['computer',1, 10]
       ,['computer',2,20]
       ,['computer',4, 40]
       ,['laptop',1, 100]
       ,['laptop',3, 30]
       ,['printer',2, 200]
       ]
df = pd.DataFrame(data,columns=['Product','id', 'qtt'])
print (df)
Product id qtt
computer 1 10
computer 2 20
computer 4 40
laptop 1 100
laptop 3 30
printer 2 200
kdf_key_dic = {key: None for key in df['id'].unique().tolist()}

product_key_dic = {key: kdf_key_dic for key in df['Product'].unique().tolist()}

print ("product_key_dic: ", product_key_dic)
  product_key_dic:  {
   'computer': {1: None, 2: None, 4: None, 3: None}, 
   'laptop': {1: None, 2: None, 4: None, 3: None}, 
   'printer': {1: None, 2: None, 4: None, 3: None}
}

Now I'd like to update the product_key_dic dictionary, but I can't get it right, it always uses the same key-dict for each key in the main dictionary!

for index, row in df.iterrows():
  product_key_dic[row['Product']].update({row['id']:row['qtt']})

print("\n product_key_dic:\n", product_key_dic)

I get:

 product_key_dic:
 { 'computer': {1: 100, 2: 200, 4: 40, 3: 30}, 
   'laptop': {1: 100, 2: 200, 4: 40, 3: 30}, 
   'printer': {1: 100, 2: 200, 4: 40, 3: 30}
 }

I expect:

 { 'computer': {1: 10, 2: 20, 4: 40, 3: None}, 
   'laptop':  {1: 100, 2: None, 4: None, 3: 30}, 
   'printer': {1: None, 2: 200, 4: None, 3: None}
 }

I can't understand the problem, somehow it's like each key has the nested dictoinary..?

Alg_D
  • 2,242
  • 6
  • 31
  • 63

2 Answers2

1

This is because you are reusing same dict object. Let's take these two statements.

kdf_key_dic = {key: None for key in df['id'].unique().tolist()}
product_key_dic = {key: kdf_key_dic for key in df['Product'].unique().tolist()}

You are passing kdf_key_dic as value(in the second statement) which is same object in each iteration.

So instead of this you can pass a copy of kdf_key_dic while constructing product_key_dic

product_key_dic = {key: kdf_key_dic.copy() for key in df['Product'].unique().tolist()}
Abdul Niyas P M
  • 18,035
  • 2
  • 25
  • 46
1

We can try a different approach creating a MultiIndex.from_product based on the unique values from Product and Id then reshaping so we can call DataFrame.to_dict directly:

cols = ['Product', 'id']

product_key_dic = (
    df.set_index(cols).reindex(
        pd.MultiIndex.from_product(
            [df[col].unique() for col in cols],
            names=cols
        )
    )  # Reindex to ensure all pairs are present in the DF
        .replace({np.nan: None})  # Replace nan with None
        .unstack('Product')  # Create Columns from Product
        .droplevel(0, axis=1)  # Remove qtt from column MultiIndex
        .to_dict()
)

product_key_dic:

{
    'computer': {1: 10.0, 2: 20.0, 3: None, 4: 40.0},
    'laptop': {1: 100.0, 2: None, 3: 30.0, 4: None},
    'printer': {1: None, 2: 200.0, 3: None, 4: None}
}

Methods Used:

  1. DataFrame.set_index
  2. DataFrame.reindex
  3. MultiIndex.from_product
  4. Series.unique
  5. DataFrame.replace
  6. DataFrame.unstack
  7. DataFrame.droplevel
  8. DataFrame.to_dict

Setup and imports:

import numpy as np
import pandas as pd

data = [['computer', 1, 10], ['computer', 2, 20], ['computer', 4, 40],
        ['laptop', 1, 100], ['laptop', 3, 30], ['printer', 2, 200]]
df = pd.DataFrame(data, columns=['Product', 'id', 'qtt'])

The initial solution could be modified by adding a copy call to the dictionary in the comprehension to make them separate dictionaries rather than multiple references to the same one (How to copy a dictionary and only edit the copy). However, iterating over DataFrames is discouraged (Does pandas iterrows have performance issues?):

kdf_key_dic = {key: None for key in df['id'].unique().tolist()}

product_key_dic = {key: kdf_key_dic.copy()
                   for key in df['Product'].unique().tolist()}

for index, row in df.iterrows():
    product_key_dic[row['Product']].update({row['id']: row['qtt']})
product_key_dic:
{
    'computer': {1: 10.0, 2: 20.0, 3: None, 4: 40.0},
    'laptop': {1: 100.0, 2: None, 3: 30.0, 4: None},
    'printer': {1: None, 2: 200.0, 3: None, 4: None}
}
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57