Extracting fields of a list of dictionaries into a new dictionary using glom

Question

I have the following highly simplified structure:

elements = [{"id": "1", "counts": [1, 2, 3]},
            {"id": "2", "counts": [4, 5, 6]}]

I'd like to be able to construct, using glom, a new dictionary of the form {<id>: <counts[pos]>}, e.g. for pos = 2:

{"1": 3, "2": 6}

or, alternatively, a list (or tuple) of tuples:

[("1",3), ("2", 6)]

Using a dict comprehension is easy, but the real data structure is more complicated and I'd like to specify dynamically what to extract. The example above is the simplest case of what I'd like to achieve.
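For reference, the plain-Python baseline mentioned here might look like the following sketch (no glom involved), with `pos` as a parameter as in the `pos = 2` example:

```python
elements = [{"id": "1", "counts": [1, 2, 3]},
            {"id": "2", "counts": [4, 5, 6]}]

pos = 2

# id-keyed dict, e.g. {"1": 3, "2": 6}
as_dict = {e["id"]: e["counts"][pos] for e in elements}

# list of (id, value) tuples, e.g. [("1", 3), ("2", 6)]
as_pairs = [(e["id"], e["counts"][pos]) for e in elements]

print(as_dict)
print(as_pairs)
```

The catch is that the accessor `e["counts"][pos]` is hard-coded in the comprehension, which is what the rest of the question tries to make configurable.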

After a while I managed to solve it as follows:

from glom import glom, T

elements = [{"id": "1", "counts": [1,2,3]},{"id": "2", "counts": [4,5,6]}]

def extract(elements, pos):
    extracted = glom(elements, ({"elements": [lambda v: (v["id"], v["counts"][pos])]}, T))
    return dict(extracted["elements"])

But this requires a call to `dict`. A slight variation that skips the intermediate dictionary would be:

def extract(elements, pos):
    extracted = glom(elements, (([lambda v: {v["id"]: v["counts"][pos]}]), T))
    return {k: v for d in extracted for k, v in d.items()}

Now, I could apply a `merge` function to the list returned by the `glom` call:

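# assumption: the question doesn't say where merge comes from; toolz.merge accepts a list of dicts
from toolz import merge

def extract(elements, pos):
    return merge(glom(elements, (([lambda v: {v["id"]: v["counts"][pos]}]), T)))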

I'm rather satisfied with this, but is there a better approach? By better I mean building a single, cleaner spec. Ultimately, I'd like to be able to define the values of the dictionary (i.e., `v["counts"][pos]`) at runtime in a user-friendly way.

A step toward this would be to pass in a callable that computes the value of each inner dictionary:

def counts_position(element, **kwargs):
    return element["counts"][kwargs["pos"]]

def extract(elements, func, **kwargs):
    return merge(glom(elements, (([lambda v: {v["id"]: func(v, **kwargs)}]), T)))

extract(elements, counts_position, pos=2)

With this, what's being extracted from each element can be controlled externally.

FWIW, I didn't get why you need glom and how you evaluate cleanliness ("building a single cleaner spec callable")... — Nickolay, Aug 14 '19 at 12:01
For this example it's silly, but this is a toy example. Not that I need glom; I'm exploring how I could use it to build a flexible way to extract data from an iterable of nested dictionaries without having to create different functions upfront. Regarding "cleanliness", I'm not sure how to evaluate it, but my solutions feel cumbersome. — Ignacio Vergara Kausel, Aug 14 '19 at 12:22
Thing is I didn't see where you need that flexibility. I can guess you meant something like "I want to write a function `extract(values, accessor, pos)`, where `values` is always a list of dicts having `id` and varying other keys, `accessor` is provided by the user (e.g. `v["counts"][pos]`), and my code would call this with the user-provided `accessor` and different values of `pos`"? — Nickolay, Aug 14 '19 at 12:27
Yes, that's basically the gist of it, I'd say. It could be that one key points to another dictionary (let's say `metadata`), and then I want some data from that nested dictionary and also one or more elements of the top-level `counts` key. — Ignacio Vergara Kausel, Aug 14 '19 at 13:07
Let me double-check: you confirm that the input is always a list of dicts with `id` in each one and you always want an `id`-keyed dict as the output? — Nickolay, Aug 14 '19 at 13:11
For this case you're correct: a list of dicts with at least an `id` key, and I want the output to be keyed by that `id` value. This could be thought of as filtering/preprocessing a JSON response from an API that I'd like to put into a pandas dataframe. — Ignacio Vergara Kausel, Aug 14 '19 at 13:26
If this much is fixed, pushing glom invocation "inside" the dict comprehension would arguably be clearer: `{t["id"]: glom.glom(t, "counts.2") for t in elements}` (otherwise this would be `glom.glom(elements, glom.Merge([{T['id']: 'counts.2'}]))`). As for using the `pos` param, what about interpolating it into a string like `'counts.%s' % pos`? Otherwise, I think a more realistic example would be helpful to provide a good answer. — Nickolay, Aug 14 '19 at 13:54
@Nickolay that looks great! As far as I'm concerned at this point I'd consider it as a valid answer. — Ignacio Vergara Kausel, Aug 14 '19 at 13:57

score 3 · Accepted Answer · answered Aug 14 '19 at 14:01

To convert a list of dicts with an `id` in each one to an `id`-keyed dict, you could use a simple dict comprehension:

import glom

{t["id"]: glom.glom(t, "counts.2") for t in elements}

Or, if you want to use glom for that, use glom.Merge along with glom.T:

glom.glom(elements, glom.Merge([{glom.T['id']: 'counts.2'}]))

To avoid lambdas, you could interpolate the pos param into the spec string, e.g. 'counts.%s' % pos.
