I'm trying to create a nested dictionary whose values are pulled from a for-loop, to measure growth and revenue amounts for various customer-product pairings. However, when I loop through a dataframe to set elements of the dictionary, every dictionary element ends up with the same values. What's going on here?

I have already tried changing various elements of how the lists are built, but to no avail.

'''
TP_Name = customer name

Service_Level_1 = service name

100.2014 is just a marker to show that someone has started consuming the service

tpdict is already created with necessary nesting below with empty values at each endpoint
'''

for col in pivotdf.columns:
  growthlist = []
  amountlist = []
  first = True
  TP_Name, Service_Level_1 = col.split('___')
  for row in pivotdf[col]:
    if first == True:
      past = row+.00001
      first = False
    if row == 0 and past <.0001 :
      growth = 0
    elif row != 0 and past == .00001:
      growth = 100.2014
    else:
      current = row
      growth = (current-past)/past
    growth = round(growth,4)
    growthlist.append(growth)
    past = row +.00001
    amountlist.append(row)
  tpdict[TP_Name][Service_Level_1]['growth'] = growthlist
  tpdict[TP_Name][Service_Level_1]['amount'] = amountlist

'''
problem: Each value ends up being the same thing
'''

Expected results:

{'CUSTOMER NAME': {'PRODUCT1': {'growth': [unique_growthlist],   'amount': [unique_amountlist]},  'PRODUCT2': {'growth': [unique_growthlist],'amount': [unique_amountlist]}}}
enixon4
  • in your expected result you have two keys with the same value, keys need to be unique – depperm Apr 02 '19 at 14:16
  • A dictionary is a key value pair (as I am sure you know). If you ever try to write to a dictionary with a key that already exists it will overwrite the value. – Error - Syntactical Remorse Apr 02 '19 at 14:16
  • @Error - Syntactical Remorse, Those keys have unique values, those values are changed in the 'for-loop' with the line `TP_Name, Service_Level_1 = col.split('___')`. @depperm - the expected results, each name and product have unique names, this is just a sample. Will amend to make that more clear. – enixon4 Apr 02 '19 at 14:19
  • Try to copy the list: `tpdict[TP_Name][Service_Level_1]['growth'] = list(growthlist)` otherwise the `dict` points to the same object that keeps on being modified – Jacques Gaudin Apr 02 '19 at 14:25
  • @enixon4 is `pivotdf` an actual `pandas.DataFrame` cos it looks like you could do what you're doing with some aggregate operations and then `to_dict()` it... – Jon Clements Apr 02 '19 at 14:26
  • @JonClements: Yes, pivotdf is a pandas.DataFrame. The columns of the Dataframe are **CustomerName___Product1**, **CustomerName___Product2** etc for 16 products and 500+ customers. The values are dollars, 21 rows. What I need is a way to produce growth rates and the dollar values for each customer-product combination. Any ideas on how to do so more efficiently would be greatly appreciated! – enixon4 Apr 02 '19 at 14:33
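
Following up on the request for a tidier way to produce the growth rates: below is a minimal sketch in plain Python, not the pandas aggregation hinted at above. It factors the per-column rules from the question into a helper and builds a brand-new leaf dictionary and brand-new lists for each customer-product pair, so no two entries can share an object. The names `growth_rates`, `EPSILON`, `NEW_CONSUMER_MARKER`, and the `columns` stand-in for `pivotdf` are assumptions made for illustration; they do not come from the question.

```python
from collections import defaultdict

EPSILON = 0.00001               # same offset the question adds to avoid dividing by zero
NEW_CONSUMER_MARKER = 100.2014  # marker from the question: consumption has just started


def growth_rates(amounts):
    """Return period-over-period growth for one column, using the question's rules."""
    rates = []
    past = None
    for amount in amounts:
        if past is None:  # first row: compare the value against itself
            past = amount + EPSILON
        if amount == 0 and past < 0.0001:
            growth = 0
        elif amount != 0 and past == EPSILON:
            growth = NEW_CONSUMER_MARKER
        else:
            growth = (amount - past) / past
        rates.append(round(growth, 4))
        past = amount + EPSILON
    return rates


# Hypothetical stand-in for pivotdf: column name -> list of dollar amounts.
columns = {
    "ACME___PRODUCT1": [0, 0, 50, 100],
    "ACME___PRODUCT2": [10, 20],
}

tpdict = defaultdict(dict)
for col, amounts in columns.items():
    tp_name, service_level_1 = col.split("___")
    # A new leaf dict and new lists for every customer-product pair.
    tpdict[tp_name][service_level_1] = {
        "growth": growth_rates(amounts),
        "amount": list(amounts),
    }

print(tpdict["ACME"]["PRODUCT1"]["growth"])  # [0, 0, 100.2014, 1.0]
```

With a real DataFrame, the same loop could presumably run over `pivotdf.columns` and pass `pivotdf[col]` to the helper, since iterating over a column yields its values, as the question's own loop does.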

1 Answer

A dictionary is a collection of key-value pairs (as I am sure you know). If you write to a dictionary using a key that already exists, the dictionary overwrites the value stored under that key.

Example:

d = dict()
d[1] = 'a' # d = {1: 'a'}
d[1] = 'b' # d = {1: 'b'}

Your project seems like it may be a good use of a namedtuple in Python. A namedtuple is basically a lightweight class/object. My example code may be wrong because I don't know how your for loop is working (commenting helps everyone). That being said, here is an example.

I only make this recommendation as dictionaries consume ~33% more memory than the objects they hold (though they are much faster).

from collections import namedtuple

Customer = namedtuple('Customer', 'name products')
Product = namedtuple('Product', 'growth amount')

products_by_customer = {}  # customer name -> list of Product tuples
for col in pivotdf.columns:
    growthlist = []  # new lists for every column
    amountlist = []
    first = True
    TP_Name, Service_Level_1 = col.split('___')
    for row in pivotdf[col]:
        if first:
            past = row + .00001  # small offset avoids dividing by zero
            first = False
        if row == 0 and past < .0001:
            growth = 0
        elif row != 0 and past == .00001:
            growth = 100.2014  # marker: consumption of the service has started
        else:
            current = row
            growth = (current - past) / past
        growth = round(growth, 4)
        growthlist.append(growth)
        past = row + .00001
        amountlist.append(row)

    cur_product = Product(growth=growthlist, amount=amountlist) # Create a new product
    # Add that product to its customer, creating the customer's list on first use
    products_by_customer.setdefault(TP_Name, []).append(cur_product)

# Create one customer per name, each holding all of that customer's products
customers = [Customer(name=name, products=products)
             for name, products in products_by_customer.items()]

Here, `customers` is a list of Customer namedtuples that we can use as objects. For example, this is how we can print them out.

for customer in customers:
    print(customer.name, customer.products) # Print each name and their products
    for growth, amount in customer.products:
        print(growth, amount) # Print growth and amount for each product.
  • Named tuples would hit the same problem. You should read [this article](https://inventwithpython.com/blog/2018/02/05/python-tuples-are-immutable-except-when-theyre-mutable/) to understand mutable/immutable objects. – Jacques Gaudin Apr 02 '19 at 14:54
  • @JacquesGaudin Unless I am missing something, a named tuple solution creates new objects on each `for` loop iteration and appends them to a list. It shares no similarities with the current problem. On top of that, it is immutable, unlike a dictionary. Based on the question, I do not see a reason it needs to be mutable. – Error - Syntactical Remorse Apr 02 '19 at 15:01
  • Sorry I have to disagree. Namedtuples are mutable if they **point to** a mutable value. See https://trinket.io/python/e6921d8575. The objects you are creating at each iteration are pointing to a list object that changes during the execution of the loop, and when the loop is finished they all point to the same object and are therefore all equal. – Jacques Gaudin Apr 02 '19 at 15:09
  • @JacquesGaudin There are many definitions of immutable: https://stackoverflow.com/a/9756087/8150685. In your case you consider it not deep immutable. An immutable object in python can contain mutable objects. https://stackoverflow.com/a/9758886/8150685 – Error - Syntactical Remorse Apr 02 '19 at 15:17
  • @JacquesGaudin, I think you hit the nail on the head. I have mutable values stored in each dictionary node. I'm not sure how to solve this without instantiating a unique list name for each entry... – enixon4 Apr 02 '19 at 15:19
  • @enixon4 As in more parameters for a Product beyond `growth` and `amount`? Yes, but you will need to change the definition of `Product` at the top of the code. If you mean a list of 21 products, then yes as well. – Error - Syntactical Remorse Apr 02 '19 at 15:19
  • Yes, that's the point: named tuples are immutable in the sense that the **reference** is immutable, just like `tuple`. – Jacques Gaudin Apr 02 '19 at 15:19
  • @Error-SyntacticalRemorse, each product has 21 values for growth and 21 values for amount. This is why those values come in the form of lists, not just a single value. I'm afraid we would run into the same issue if we input a list as the value in your code. – enixon4 Apr 02 '19 at 15:20
  • @enixon4 That's fine, it can take a list (but you may need to change where you do the `append`s). – Error - Syntactical Remorse Apr 02 '19 at 15:23
  • @Error-SyntacticalRemorse, answer worked well, thank you! – enixon4 Apr 03 '19 at 16:21
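
The shared-reference point debated in these comments can be sketched in a few lines. The example below assumes, without confirmation from the question, that the pre-built `tpdict` reused one leaf object for every endpoint; `dict.fromkeys` with a mutable default is one common way that happens. It then shows the same effect with named tuples that point to one list, and how copying with `list(...)` breaks the link. All customer and product names here are hypothetical.

```python
from collections import namedtuple

Product = namedtuple("Product", "growth amount")

# 1) Dict leaves: dict.fromkeys reuses ONE leaf object for every key.
leaf = {"growth": [], "amount": []}
shared = {c: dict.fromkeys(["PRODUCT1", "PRODUCT2"], leaf) for c in ["ACME", "GLOBEX"]}
shared["ACME"]["PRODUCT1"]["growth"] = [0.1, 0.2]
print(shared["GLOBEX"]["PRODUCT2"]["growth"])  # [0.1, 0.2]: every endpoint changed

# 2) Named tuples: fields cannot be rebound, but a list they point to can still change.
growth = []
first = Product(growth=growth, amount=[1.0])
second = Product(growth=growth, amount=[2.0])
growth.append(0.5)
print(first.growth, second.growth)  # [0.5] [0.5]: both see the same list

# Copying the list gives the tuple its own independent object.
third = Product(growth=list(growth), amount=[3.0])
growth.append(0.7)
print(third.growth)  # [0.5]: unaffected by the later append

# Rebinding a field is what immutability actually forbids.
try:
    first.growth = []
except AttributeError:
    print("fields cannot be reassigned")
```

Building a new list inside each loop iteration, as both the question's loop and the answer's loop do, avoids the problem without an explicit copy; the copy matters only when a single list object is reused across entries.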