
I have a dictionary that I want to use as a template to generate multiple dictionaries, each with updated items. The resulting list should serve as a dataset for unit tests in pytest.

I am using the following construct in my code (checks are excluded):

from pprint import pprint

def _f(template,**kwargs):
    result = [template]
    for key, value in kwargs.items():
        result = [dict(template_item,**dict([(key,v)])) for v in value for template_item in result]
    return result

template = {'a': '', 'b': '', 'x': 'asdf'}

r = _f(template, a=[1,2],b=[11,22])

pprint(r)

[{'a': 1, 'b': 11, 'x': 'asdf'},
 {'a': 2, 'b': 11, 'x': 'asdf'},
 {'a': 1, 'b': 22, 'x': 'asdf'},
 {'a': 2, 'b': 22, 'x': 'asdf'}]

I would like to ask whether this construct is good enough, or whether it could be written more efficiently.

Is this the correct way to prepare testing data?

EDIT: I am especially unsure about

[dict(template_item,**dict([(key,v)])) for v in value for template_item in result]

and

dict(template_item,**dict([(key,v)])) 

At first I considered dict.update(), but it is not suitable for a comprehension because it does not return the dictionary (it returns None).

Then I thought about a simple syntax like

d = {'aa': 11, 'bb': 22}
dict(d,x=33,y=44)
    {'aa': 11, 'bb': 22, 'x': 33, 'y': 44}

but I was unable to pass the key name through a variable. And creating a dict just to unpack it seems counterproductive to me.

Jan Sakalos

1 Answer


I am especially unsure about...

Updating Python dicts inside comprehensions is a bit awkward because dicts are mutable: dict.update() modifies the dict in place and returns None. In "Why doesn't a python dict.update() return the object?" the top answer suggests your current solution. Personally, I'd probably go with a regular for-loop here to keep the code legible.
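To illustrate what I mean by a regular for-loop, here is a minimal sketch. The name expand_template is my own (hypothetical) and it assumes the same inputs as your _f. Copying each dict and then assigning the key also sidesteps the problem of passing the key name through a variable:

```python
def expand_template(template, **kwargs):
    """Return one dict per combination of the values given in kwargs."""
    result = [template]
    for key, values in kwargs.items():
        expanded = []
        for value in values:
            for item in result:
                new_item = dict(item)   # shallow copy, the template stays untouched
                new_item[key] = value   # the key name comes from a variable
                expanded.append(new_item)
        result = expanded
    return result


template = {'a': '', 'b': '', 'x': 'asdf'}
r = expand_template(template, a=[1, 2], b=[11, 22])
```

If you prefer to stay with a single expression, {**item, key: value} (Python 3.5+) builds the same updated copy without the intermediate dict([(key, v)]).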

Is this correct way to prepare testing data?

  1. Usually in unit tests you will test both edge cases and regular cases (you don't wanna repeat yourself, though). You usually want to split the tests so that each has its own name explaining why it's there, and possibly some other data that could help an outsider understand why it's important that this scenario works correctly. Putting all scenarios in one list and then running the test for each of them without giving the reader additional context (at least in the form of a test case name) makes it harder to distinguish between the cases and to judge whether they are all really needed.
  2. Putting each of the scenarios in a separate test case may seem a bit tedious at times, but if any of the tests fails, you can immediately tell which part of the software is failing. If you feel like you write way too many unit tests, then perhaps some of them cover the same kinds of scenarios.
  3. When dealing with unit tests, performance is rarely the top priority. What usually counts more is keeping the number of tests minimal, yet sufficient to ensure the software works correctly. The other priority is making the tests easy to understand. See below for another take on this (not necessarily more performant, yet hopefully more legible).
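One way to give each scenario a name, sketched under the assumption that you use pytest.mark.parametrize with its ids argument. The scenario names, the SCENARIOS dict and the test body are made up for illustration, they are not taken from your code:

```python
import pytest

# Hypothetical scenarios: the key is the test id, so it says why the case exists.
SCENARIOS = {
    "regular-values": {'a': 1, 'b': 11, 'x': 'asdf'},
    "empty-a-edge-case": {'a': '', 'b': 11, 'x': 'asdf'},
}


@pytest.mark.parametrize(
    "data", list(SCENARIOS.values()), ids=list(SCENARIOS.keys())
)
def test_template_field_is_preserved(data):
    # Placeholder assertion; a real test would exercise the code under test.
    assert data['x'] == 'asdf'
```

When one of these fails, the id shows up in the pytest report next to the test name, so you can tell which scenario broke without digging through the data.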

Alternative solution

You could use itertools.product in order to simplify your code. The template parameter can be removed (since you can pass the template variable names and their possible values in **kwargs):

from pprint import pprint
import itertools

def _f(**kwargs):
    keys, values = zip(*kwargs.items())  # 1. split kwargs into keys and value lists
    subsets = list(itertools.product(*values))  # 2. Cartesian product of the value lists
    return [
        {key: value for key, value in zip(keys, subset)} for subset in subsets
    ]  # 3. pair the keys with each combination

r = _f(a=[1, 2], b=[11, 22], x=['asdf'])
pprint(r)

Now what's happening in each of these steps:

Step 1. You split the keyword dict into keys and values. This matters because it fixes the order in which you iterate over these arguments. The keys and values look like this at this point:

keys = ('a', 'b', 'x') 
values = ([1, 2], [11, 22], ['asdf'])

Step 2. You compute the Cartesian product of the values, which means you get all the possible combinations of taking one value from each of the value lists. The result of this operation is as follows:

subsets = [(1, 11, 'asdf'), (1, 22, 'asdf'), (2, 11, 'asdf'), (2, 22, 'asdf')]

Step 3. Now you need to map each of the keys to its corresponding value in each of the subsets, hence the list and dict comprehensions. The result contains the same dictionaries as your previous method, only in a different order (here the last key varies fastest, whereas in your version the first one does):

[{'a': 1, 'b': 11, 'x': 'asdf'},
 {'a': 1, 'b': 22, 'x': 'asdf'},
 {'a': 2, 'b': 11, 'x': 'asdf'},
 {'a': 2, 'b': 22, 'x': 'asdf'}]
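If you would rather keep the template parameter, the same idea still works. This is only a sketch: expand_with_template is a hypothetical name, and it assumes every key in **kwargs should override the matching key in the template:

```python
import itertools


def expand_with_template(template, **kwargs):
    """Overlay every combination of the kwargs values on a copy of template."""
    keys = list(kwargs)
    combos = itertools.product(*kwargs.values())
    # {**template, **overrides} builds a new dict, so template is never mutated.
    return [{**template, **dict(zip(keys, combo))} for combo in combos]


template = {'a': '', 'b': '', 'x': 'asdf'}
r = expand_with_template(template, a=[1, 2], b=[11, 22])
```

With no keyword arguments, itertools.product() yields a single empty tuple, so the function returns a list holding one copy of the template.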
mrapacz