How can I "squash" a list of dictionaries?

Question

Apologies for an unclear title, but I'm not sure how else to describe the operation I'm trying to do.

django-auditlog produces "diffs" of tracked fields in Django models, in the format {'field_name': [old_value, new_value]}, recording each change to those fields in the database. So a list of these diffs for a particular row in my database, sorted with the most recent diff first, might look like this:

# 1
[
  {
    'price': [490, 530]
  },
  {
    'status': [7, 1],
  },
  {
    'status': [1, 7],
  },
  {
    'status': [10, 1],
    'price': [0, 490],
    'location': [None, 'Calgary']
  }
]

I would like to "squash" this history like I would in Git: taking the very first value of a field and the most recent value of a field, and dropping all the intermediate values. So in the above example, I'd like the following output:

# 2
{
  'price': [0, 530],
  'status': [10, 1],
  'location': [None, 'Calgary']
}

Note that the multiple 'status' and 'price' changes have been squashed down to a single old/new pair.

I believe I could accomplish this by first creating an intermediate dictionary in which all the changes are concatenated:

# 3
{
  'price': [[0, 490], [490, 530]],
  'status': [[10, 1], [1, 7], [7, 1]],
  'location': [[None, 'Calgary']]
}

and then, for each field, taking the first element of its first pair (the original old value) and the last element of its last pair (the most recent new value).

What is a clean and Pythonic way to get #1 to look like #3?

I still think your output doesn't match up: the first value for price would be 490, and the last of the last would also be 490, if your data is ordered as you say it is – Padraic Cunningham, Jun 14 '16 at 23:43
@TadhgMcDonald-Jensen Not a strict duplicate. That one is about keys, this one is about values. – kdopen, Jun 14 '16 at 23:48
@PadraicCunningham It's reverse-chronological. 'price' went from 0 to 490, then 490 to 530, giving a final result of [0,530] — kdopen, Jun 14 '16 at 23:49
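The two-step idea from the question (collect every pair per field as in #3, then keep only the outer values to get #2) can be sketched with collections.defaultdict. This is a minimal sketch assuming, as the comments confirm, that #1 is ordered newest-first; `history` and `squashed` are illustrative names.

```python
from collections import defaultdict

# The example history from #1, newest change first.
changes = [
    {'price': [490, 530]},
    {'status': [7, 1]},
    {'status': [1, 7]},
    {'status': [10, 1], 'price': [0, 490], 'location': [None, 'Calgary']},
]

# Step 1: build #3, collecting every [old, new] pair per field in
# chronological order (oldest first), hence reversed().
history = defaultdict(list)
for delta in reversed(changes):
    for field, pair in delta.items():
        history[field].append(pair)

# Step 2: build #2 from the old value of each field's first pair
# and the new value of its last pair.
squashed = {field: [pairs[0][0], pairs[-1][1]] for field, pairs in history.items()}

print(dict(history))
print(squashed)
```

The intermediate step is not strictly necessary (the answers below go straight to #2), but it makes the "first of the first, last of the last" rule explicit.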

score 2 · Answer 1 · answered Jun 14 '16 at 23:43

In the example data shown, changes are listed in reverse chronological order. Simply step through the list, building up a dict of merged fields: the first occurrence of a field supplies its 'new' value, and each later (older) occurrence overwrites its 'old' value.

changes = [ 
          {
            'price': [490, 530]
          },
          {
            'status': [7, 1],
          },
          {
            'status': [1, 7],
          },
          {
            'status': [10, 1],
            'price': [0, 490],
            'location': [None, 'Calgary']
          }
    ]

squashed = {}

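for delta in changes:  # newest change first
    for field, values in delta.items():
        if field in squashed:
            # An older change to a field already seen: only its 'old' value matters.
            squashed[field][0] = values[0]
        else:
            # First (newest) change to this field: copy it so the input lists aren't modified.
            squashed[field] = list(values)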

yields the following:

In [7]: print(squashed)
{'status': [10, 1], 'location': [None, 'Calgary'], 'price': [0, 530]}

I like how you reuse the latest dict to avoid reversing the list of changes – shadowtalker, Jun 15 '16 at 00:00
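One subtlety of reusing the latest entry: if the first list seen for a field is stored as-is rather than copied, the later `squashed[field][0] = ...` writes straight into the input data. The sketch below demonstrates this; the `squash` function and its `copy_lists` switch are hypothetical names added only for the comparison.

```python
import copy

changes = [
    {'price': [490, 530]},
    {'status': [7, 1]},
    {'status': [1, 7]},
    {'status': [10, 1], 'price': [0, 490], 'location': [None, 'Calgary']},
]

def squash(changes, copy_lists=True):
    """Squash a newest-first list of {field: [old, new]} diffs.

    copy_lists is a hypothetical switch used only for this demonstration.
    """
    squashed = {}
    for delta in changes:
        for field, values in delta.items():
            if field in squashed:
                squashed[field][0] = values[0]  # older change: overwrite 'old'
            else:
                # Newest change to this field: keep its 'new' value.
                squashed[field] = list(values) if copy_lists else values
    return squashed

original = copy.deepcopy(changes)

safe = squash(changes, copy_lists=True)
print(changes == original)   # True: the input is untouched

unsafe = squash(changes, copy_lists=False)
print(changes == original)   # False: changes[0]['price'] is now [0, 530]
print(safe == unsafe)        # True: the squashed result is the same either way
```

Both versions return the same squashed dict; only the copying one leaves `changes` intact for later use.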

score 2 · Accepted Answer · Robᵩ · 2016-06-15T00:39:02.640

dict.setdefault() might be useful for you:

from pprint import pprint
one = [
  {
    'price': [490, 530]
  },
  {
    'status': [7, 1],
  },
  {
    'status': [1, 7],
  },
  {
    'status': [10, 1],
    'price': [0, 490],
    'location': [None, 'Calgary']
  }
]

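two = {}
for d in one:  # newest change first
    for k, v in d.items():
        # New key: store a copy of v; existing key: overwrite its 'old' value.
        two.setdefault(k, list(v))[0] = v[0]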

pprint(two)

Result:

{'location': [None, 'Calgary'], 'price': [0, 530], 'status': [10, 1]}

edited Jun 15 '16 at 00:39

answered Jun 14 '16 at 23:43

Robᵩ

I had no idea you could _assign_ to `setdefault` – shadowtalker Jun 15 '16 at 05:23
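Strictly speaking, the assignment isn't to setdefault itself: dict.setdefault(key, default) returns the value stored under key (inserting default first if key is missing), and the assignment targets an element of that returned list. A small sketch using the 'price' values from the example:

```python
# dict.setdefault(key, default) returns d[key] if key is present;
# otherwise it stores default under key and returns default.
two = {}

first = two.setdefault('price', [490, 530])   # key missing: stores and returns this list
second = two.setdefault('price', [0, 490])    # key present: returns the stored list, ignores the default

print(first is second)    # True: both names refer to the same stored list

# Assigning through the returned list therefore updates the dict entry.
two.setdefault('price', [0, 490])[0] = 0
print(two)                # {'price': [0, 530]}
```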

score 2 · Answer 3 · olivecoder · 2016-06-15T02:02:13.497

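Given l, the list of updates (newest first):

from functools import reduce  # a builtin in Python 2; lives in functools in Python 3

crunch = lambda d, u: dict(list(d.items()) + [(k, [u[k][0], d.get(k, u[k])[1]]) for k in u])
reduce(crunch, l)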

That gives you:

{'location': [None, 'Calgary'], 'price': [0, 530], 'status': [10, 1]}

The first argument to reduce is a function of two arguments, which reduce applies cumulatively to the items of the list, from left to right:

l = [0, 1, 2, 3]
reduce(f, l) == f(f(f(0, 1), 2), 3)

This way the lambda receives the incrementally built dictionary as its first argument (d) and builds a new, updated one from the older changes in u: each field in u keeps u's old value and takes its new value from d if d already has that field.
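To see the left-to-right nesting concretely, the sketch below uses a hypothetical combining function `f` that just records how it was called, so reduce's result spells out the call structure:

```python
from functools import reduce

# A combining function that only records how it was called,
# so the nesting produced by reduce() becomes visible.
def f(acc, item):
    return 'f({}, {})'.format(acc, item)

l = [0, 1, 2, 3]
print(reduce(f, l))          # f(f(f(0, 1), 2), 3)

# With an initializer, the initializer becomes the first accumulator.
print(reduce(f, l, 'init'))  # f(f(f(f(init, 0), 1), 2), 3)
```

In crunch, the accumulator is the squashed dictionary so far and each item is the next (older) change.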

The lambda is more complicated than it needs to be because dict.update() returns None rather than the dictionary, so the lambda builds a new dictionary instead, just to have something to return.

As a clearer alternative, you can replace the lambda with a regular function, which can update the dictionary and then return it:

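def crunch(dic, updates):
    # Merge the older changes in `updates` into the accumulated dict: keep each
    # field's old value from `updates` and its new value from `dic`, if present.
    dic.update(
        {k: [updates[k][0], dic.get(k, updates[k])[1]] for k in updates}
    )
    return dic  # becomes the input to the next step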

and then do:

reduce(crunch, l, {})  # start from an empty dict so l[0] is not modified in place

The dictionary's get method returns the item value if k exists or the second parameter as a default value if it doesn't, so it doesn't need a defaultdict or setdefault.
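On Python 3.5 and later, dict unpacking (`{**a, **b}`) is an expression that evaluates to the merged dictionary, which sidesteps the update()-returns-None problem without a separate function. This is a sketch of an alternative, not part of the original answer; the list `l` below is the example data from the question.

```python
from functools import reduce

l = [
    {'price': [490, 530]},
    {'status': [7, 1]},
    {'status': [1, 7]},
    {'status': [10, 1], 'price': [0, 490], 'location': [None, 'Calgary']},
]

# Requires Python 3.5+ for {**a, **b}. Later keys win, so the freshly merged
# pairs for the fields in u replace the entries already in d. Every step builds
# a new dict and new lists, so l itself is never modified.
crunch = lambda d, u: {**d, **{k: [u[k][0], d.get(k, u[k])[1]] for k in u}}

result = reduce(crunch, l)
print(result)
```

On Python 3.9+, `d | {...}` would do the same merge.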

edited Jun 15 '16 at 02:02

answered Jun 14 '16 at 23:58

olivecoder

Thanks for an interesting alternative but some explanation would be nice – shadowtalker Jun 14 '16 at 23:58
I posted the minimum so I could improve it in gradual steps. Don't you think it's a bit tough to downvote people who are trying to help you with a correct answer? – olivecoder Jun 14 '16 at 23:59
Dense one-liners are just not helpful unless you unpack them. With > 1k rep you should know that – shadowtalker Jun 15 '16 at 00:01
Yes, I know; that's why I've added more information, as you can see. With more than 2K rep, I think you should be less arrogant and slower to downvote. – olivecoder Jun 15 '16 at 00:06
Thank you for clarifying – shadowtalker Jun 15 '16 at 00:09

score 0 · Answer 4 · answered Jun 14 '16 at 23:43

You can go straight to #2. Iterate over #1 in chronological order (oldest change first), creating an entry when a key first appears and then updating only its ending state:

l = [
  {
    'price': [490, 530]
  },
  {
    'status': [7, 1],
  },
  {
    'status': [1, 7],
  },
  {
    'status': [10, 1],
    'price': [0, 490],
    'location': [None, 'Calgary']
  }
]

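squashed = {}
for x in reversed(l):  # oldest change first, without reversing l in place
    for k, v in x.items():
        squashed.setdefault(k, [v[0], v[1]])  # first sighting: record the original old value
        squashed[k][1] = v[1]                 # every sighting: move the new value forward

print(squashed)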
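Another way to go straight to #2, offered here as an alternative sketch rather than part of this answer, is collections.ChainMap, which looks a key up in each mapping in order and returns the first hit. Chaining the diffs newest-first finds each field's latest change; chaining them oldest-first finds its earliest one.

```python
from collections import ChainMap

l = [
    {'price': [490, 530]},
    {'status': [7, 1]},
    {'status': [1, 7]},
    {'status': [10, 1], 'price': [0, 490], 'location': [None, 'Calgary']},
]

newest = ChainMap(*l)            # newest-first: each field's latest change
oldest = ChainMap(*reversed(l))  # oldest-first: each field's earliest change

# Old value from the earliest change, new value from the latest one.
squashed = {k: [oldest[k][0], newest[k][1]] for k in newest}
print(squashed)
```

Neither ChainMap copies or modifies the underlying dicts, and the comprehension builds new lists, so `l` is left untouched.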
