Apologies for an unclear title, but I'm not sure how else to describe the operation I'm trying to do.

django-auditlog records changes to tracked fields in Django models as "diffs" of the form {'field_name': [old_value, new_value]}. A list of these diffs for a particular row in my database, sorted with the most recent diff first, might look like the following:

# 1
[
  {
    'price': [490, 530]
  },
  {
    'status': [7, 1],
  },
  {
    'status': [1, 7],
  },
  {
    'status': [10, 1],
    'price': [0, 490],
    'location': [None, 'Calgary']
  }
]

I would like to "squash" this history like I would in Git: taking the very first value of a field and the most recent value of a field, and dropping all the intermediate values. So in the above example, I'd like the following output:

# 2
{
  'price': [0, 530],
  'status': [10, 1],
  'location': [None, 'Calgary']
}

Note that the multiple 'status' and 'price' changes have been squashed down to a single old/new pair.

I believe I could accomplish this by first creating an intermediate dictionary in which all the changes are concatenated:

# 3
{
  'price': [[0, 490], [490, 530]],
  'status': [[10, 1], [1, 7], [7, 1]],
  'location': [[None, 'Calgary']]
}

and then, for each field, taking the first element of the first pair (the original value) and the last element of the last pair (the current value).

What is a clean and Pythonic way to get #1 to look like #3?
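
For reference, a minimal sketch of the two-step idea described above, assuming the diffs are plain dicts ordered most recent first as in #1. The names concatenate and squash are made up for illustration and are not part of django-auditlog:

```python
from collections import defaultdict

# Sample history from #1, most recent diff first.
diffs = [
    {'price': [490, 530]},
    {'status': [7, 1]},
    {'status': [1, 7]},
    {'status': [10, 1], 'price': [0, 490], 'location': [None, 'Calgary']},
]


def concatenate(diffs):
    """Group every [old, new] pair by field, oldest first (the shape of #3)."""
    grouped = defaultdict(list)
    for diff in reversed(diffs):  # walk the history oldest to newest
        for field, pair in diff.items():
            grouped[field].append(list(pair))  # copy so the input is untouched
    return dict(grouped)


def squash(grouped):
    """Keep the first old value and the last new value (the shape of #2)."""
    return {field: [pairs[0][0], pairs[-1][1]] for field, pairs in grouped.items()}


intermediate = concatenate(diffs)
squashed = squash(intermediate)
print(intermediate)
print(squashed)
```

The intermediate dictionary is only needed if the full per-field history is wanted; otherwise the squashed result can be built directly.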

shadowtalker

4 Answers

In the example data shown, changes are listed in reverse chronological order. Simply step through the list, building up a dictionary of merged fields: the first time a field is seen (its most recent change) supplies the 'new' value, and each later occurrence overwrites the 'old' value.

changes = [
    {
        'price': [490, 530]
    },
    {
        'status': [7, 1],
    },
    {
        'status': [1, 7],
    },
    {
        'status': [10, 1],
        'price': [0, 490],
        'location': [None, 'Calgary']
    }
]

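squashed = {}

for delta in changes:
    for field, values in delta.items():
        if field in squashed:
            # Older change to a known field: push the 'old' value further back.
            squashed[field][0] = values[0]
        else:
            # Most recent change to this field; copy so `changes` is not mutated.
            squashed[field] = list(values)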

yields the following:

In [7]: print(squashed)
{'status': [10, 1], 'location': [None, 'Calgary'], 'price': [0, 530]}
kdopen

dict.setdefault() might be useful for you:

from pprint import pprint
one = [
  {
    'price': [490, 530]
  },
  {
    'status': [7, 1],
  },
  {
    'status': [1, 7],
  },
  {
    'status': [10, 1],
    'price': [0, 490],
    'location': [None, 'Calgary']
  }
]

two = {}
for d in one:
    for k, v in d.items():
        # Copy v on first insert so the lists inside `one` are not mutated.
        two.setdefault(k, list(v))[0] = v[0]

pprint(two)

Result:

{'location': [None, 'Calgary'], 'price': [0, 530], 'status': [10, 1]}
Robᵩ

With l as the list of updates:

from functools import reduce  # reduce is not a builtin in Python 3

crunch = lambda d, u: {**d, **{k: [u[k][0], d.get(k, u[k])[1]] for k in u}}
reduce(crunch, l)

That gives you:

{'location': [None, 'Calgary'], 'price': [0, 530], 'status': [10, 1]}

The first argument to reduce is a function of two arguments. reduce applies it cumulatively to the items of the list, feeding each result back in as the first argument of the next call:

l = [0, 1, 2, 3]
reduce(f, l) == f(f(f(0, 1), 2), 3)

This way the lambda function receives an incrementally built dictionary as the first parameter (d) and builds a new updated one by iterating over the updates in u.
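
To make the folding order visible, here is a small sketch in which the folded function returns a string describing each call instead of computing anything. The helper name show is made up for this illustration:

```python
from functools import reduce


def show(acc, item):
    # Return a description of the call rather than a computed value,
    # so the final result spells out how reduce nested the calls.
    return "f({}, {})".format(acc, item)


nested = reduce(show, [0, 1, 2, 3])
print(nested)  # f(f(f(0, 1), 2), 3)
```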

The lambda is more complicated than it needs to be because dict.update returns None rather than the updated dictionary, so the lambda has to build a new dictionary just to have something to return.
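
A quick check of that behaviour, using made-up values in the same shape as the diffs:

```python
d = {'price': [490, 530]}
result = d.update({'status': [7, 1]})  # update mutates d in place

print(result)  # None
print(d)       # {'price': [490, 530], 'status': [7, 1]}
```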

As a clearer alternative, you can replace the lambda with a regular function, which can update the dictionary in place and then return it:

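def crunch(dic, updates):
    # For each updated field, take the older 'old' value from updates and
    # keep the 'new' value already stored in dic (if there is one).
    dic.update(
        {k: [updates[k][0], dic.get(k, updates[k])[1]] for k in updates}
    )
    return dic  # becomes the input of the next iteration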

and then do:

reduce(crunch, l, {})  # start from an empty dict so l[0] is not mutated

The dictionary's get method returns the item value if k exists or the second parameter as a default value if it doesn't, so it doesn't need a defaultdict or setdefault.
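
As a small illustration of how get behaves for a field that is already present versus one that is not (the values are sample data in the same shape as the diffs above):

```python
dic = {'price': [490, 530]}
updates = {'price': [0, 490], 'location': [None, 'Calgary']}

# 'price' is already in dic, so the stored pair is returned.
known = dic.get('price', updates['price'])

# 'location' is not in dic, so the default (the update itself) is returned.
missing = dic.get('location', updates['location'])

print(known)    # [490, 530]
print(missing)  # [None, 'Calgary']
```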

olivecoder

You can go straight to #2. Iterate over #1 in chronological order (oldest first), create a new entry the first time a key is seen, and otherwise just update its ending state:

l = [
  {
    'price': [490, 530]
  },
  {
    'status': [7, 1],
  },
  {
    'status': [1, 7],
  },
  {
    'status': [10, 1],
    'price': [0, 490],
    'location': [None, 'Calgary']
  }
]

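squashed = {}
for x in reversed(l):  # oldest change first, without mutating l
    for k, v in x.items():
        # The first sighting fixes the 'old' value; later ones only move 'new'.
        squashed.setdefault(k, [v[0], v[1]])
        squashed[k][1] = v[1]

print(squashed)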
Fabricator