0

perhaps some more experienced can explain me what’s wrong in my code. After several hours of investigating I’m giving up! I’m getting from DB a list of objects (Investments). Some of the investments have the same “investment id”. I want to build a dict with “investment id's” as a keys and investments with the same “investment id” wrapped in list as a values Here is a dummy code that emulate the investment class and build the list for test

class TestInvestment:
def __init__(self, id: str):
    self.id = id
    self.data = "bla...bla...bla"

def __repr__(self):
    return f"Investment with code {self.id}"

test_investment1 = TestInvestment('RU000A100D89')
test_investment2 = TestInvestment('RU000A100YP2')
dummy_investments = [test_investment1, test_investment2]

and here is the function that actually build the dictionary. Getting all unique id's as a set, than creating dict based on this set with empty list as values. On the end appending accordingly the investments to the dict values.

def combine_investments(investments: List[TestInvestment]):
    unique_investments_codes = set([investment.id for investment in investments])
    unique_investments_dict = dict.fromkeys(unique_investments_codes, [])
    [unique_investments_dict[investment.id].append(investment) for investment in investments]

    return unique_investments_dict

but on the end I’m getting strange result. All investments added to every key.

combined_investments = combine_investments(dummy_investments)
[print(item) for item in combined_investments.items()]

('RU000A100D89', [Investment with code RU000A100D89, Investment with code RU000A100YP2])
('RU000A100YP2', [Investment with code RU000A100D89, Investment with code RU000A100YP2])

I assume that the problem is somewhere in using the object attributes as a keys in dict. But not sure… When I’m simply hardcoding the dict with exactly the same content, everything works perfectly!

def combine_investments(investments: List[TestInvestment]):
    # unique_investments_codes = set([investment.id for investment in investments])
    # unique_investments_dict = dict.fromkeys(unique_investments_codes, [])
    unique_investments_dict = {'RU000A100D89': [], 'RU000A100YP2': []}
    [unique_investments_dict[investment.id].append(investment) for investment in investments]

    return unique_investments_dict

as a result I’m getting what I want!

('RU000A100D89', [Investment with code RU000A100D89])
('RU000A100YP2', [Investment with code RU000A100YP2])

Can some one guide me what is the reason for this behavior?

martineau
  • 119,623
  • 25
  • 170
  • 301

2 Answers2

1

You're creating the dictionary with dict.fromkeys(unique_investments_codes, []) - this creates a dictionary with those keys, where all the values are that same single list.

So, as you add to any of the values, you're adding to all of them, as the value of each dictionary key is still just the same single list.

Grismar
  • 27,561
  • 4
  • 31
  • 54
  • Got it! Thanks for quick response. I forgot that `.fromkeys()` set the same instance of default value for all keys. Using of following construction fix the problem `unique_investments_dict = {key: [] for key in unique_investments_codes}` – Dmitry Zakharov Dec 13 '21 at 11:31
  • @DmitryZakharov: Might I suggest not pre-initializing the `dict`'s keys in the first place, and just using a `collections.defaultdict(list)` so your loop can just be `for investment in investments: unique_investments_dict[investment.id].append(investment)`? Without using `collections.defaultdict(list)`, you could use `unique_investments_dict.setdefault(investment.id, []).append(investment)` with equivalent effect (at the minor cost of creating/destroying some empty `dict`s when the key already exists). – ShadowRanger Dec 13 '21 at 11:48
1

I refactored your combine_investment function

You can iter on your investments list. First create an empty dict. if investment.id is not in this dict' keys, create this key with value as a 1-list containing the investment. Else, append it

def combine_investments(investments: List[TestInvestment]):
    unique_investments_dict = {}
    for investment in investments:
        if investment.id not in unique_investments_dict:
            unique_investments_dict[investment.id] = [investment]
        else:
            unique_investments_dict[investment.id].append(investment)
    return unique_investments_dict


thomask
  • 743
  • 7
  • 10
  • Also many thanks for proposal, but I would rather keep the implementation with list comprehension. In case of >100k "Investments" the comprehension should work faster than for loop with if statement inside. – Dmitry Zakharov Dec 13 '21 at 11:43
  • @DmitryZakharov: That's not how things work. List comprehensions are faster *at building a `list`*. Using them for side-effects makes them no faster than normal `for` loops, and wastes memory building a big `list` of `None`s only to discard them (and violates the functional design of listcomps; functional constructsshould not have side-effects). The *only* magically fast thing about listcomps is the dedicated bytecode they use to append to the new `list` as they go, and you're not benefiting from that at all. – ShadowRanger Dec 13 '21 at 11:52
  • By far the fastest solution for the "huge input, at least some repeated keys" case is using `unique_investments_dict = collections.defaultdict(list)` then simplifying the loop to the `else:` case body (when the key doesn't exist, `defaultdict` creates it for you under the hood). – ShadowRanger Dec 13 '21 at 11:53