Glom makes accessing complex nested data structures easier. https://github.com/mahmoud/glom

Given the following toy data structure:

target = [
            {
                'user_id': 198,
                'id': 504508,
                'first_name': 'John',
                'last_name': 'Doe',
                'active': True,
                'email_address': 'jd@test.com',
                'new_orders': False,
                'addresses': [
                    {
                        'location': 'home',
                        'address': 300,
                        'street': 'Fulton Rd.'
                    }
                ]
            },
            {
                'user_id': 209,
                'id': 504508,
                'first_name': 'Jane',
                'last_name': 'Doe',
                'active': True,
                'email_address': 'jd@test.com',
                'new_orders': True,
                'addresses': [
                    {
                        'location': 'home',
                        'address': 251,
                        'street': 'Maverick Dr.'
                    },
                    {
                        'location': 'work',
                        'address': 4532,
                        'street':  'Fulton Cir.'
                    },
                ]
            },
        ]

I am attempting to extract all address fields in the data structure into a flattened list of dictionaries.

from glom import glom
import pprint

# Purpose: test the use of glom

# Create Glomspec
spec = [{'address': ('addresses', 'address') }]

# Glom the data
result = glom(target, spec)

# Display
pprint.pprint(result)

The above spec provides:

[
    {'address': [300]},
    {'address': [251]}
]

The desired result is:

[
    {'address':300},
    {'address':251},
    {'address':4532}
]

What Glomspec will generate the desired result?

Jeff Hammerbacher
Liquidgenius
  • I don't know about `glom`, but it looks like what you want is only a single list comprehension away: `[{'address': x['address']} for X in target for x in X['addresses']]` – cs95 Nov 01 '18 at 20:10
  • @coldspeed I am familiar with list comprehensions and yes, that would work. However, this toy structure is significantly simplified in order to illustrate an issue I am having with the Glom module. Glom appears to have benefits over list comprehensions when dealing with very complex structures. I'm looking for any insight specifically around Glom. Thank you though! – Liquidgenius Nov 01 '18 at 20:15

1 Answer

As of glom 19.1.0 you can use the Flatten() spec to succinctly get the results you want:

from glom import glom, Flatten

glom(target, (['addresses'], Flatten(), [{'address': 'address'}]))
# [{'address': 300}, {'address': 251}, {'address': 4532}]

And that's all there is to it!

You may also want to check out the convenient flatten() function, as well as the powerful Fold() spec, for all your flattening needs :)


Prior to 19.1.0, glom did not have first-class flattening or reduction (as in map-reduce) capabilities. But one workaround would have been to use Python's built-in sum() function to flatten the addresses:

>>> from glom import glom, T, Call  # pre-19.1.0 solution
>>> glom(target, ([('addresses', [T])], Call(sum, args=(T, [])), [{'address': 'address'}]))
[{'address': 300}, {'address': 251}, {'address': 4532}]

Three steps:

  1. Traverse the lists, as you had done.
  2. Call sum on the resulting list, flattening/reducing it.
  3. Filter down the items in the resulting list to only contain the 'address' key.

Note the usage of T, which represents the current target, sort of like a cursor.
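
For comparison, the three steps map onto plain Python roughly as follows. This is only an illustrative sketch without glom; `users` is a trimmed stand-in for the question's `target`, and the variable names are made up for the example.

```python
# Trimmed stand-in for the question's target (only the fields needed here).
users = [
    {'user_id': 198, 'addresses': [
        {'location': 'home', 'address': 300, 'street': 'Fulton Rd.'},
    ]},
    {'user_id': 209, 'addresses': [
        {'location': 'home', 'address': 251, 'street': 'Maverick Dr.'},
        {'location': 'work', 'address': 4532, 'street': 'Fulton Cir.'},
    ]},
]

# Step 1: traverse, collecting each user's address list (a list of lists).
nested = [user['addresses'] for user in users]

# Step 2: sum() with an empty-list start value concatenates the inner lists.
flat = sum(nested, [])

# Step 3: keep only the 'address' key of each item.
result = [{'address': item['address']} for item in flat]

print(result)
```

Note that `sum(nested, [])` builds a new list at every step, so it is quadratic in the total number of items; for large inputs `itertools.chain.from_iterable` is the usual linear-time alternative.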

Anyways, no need to do that anymore, in part due to this answer. So, thanks for the great question!

Mahmoud Hashemi
  • Thank you, Mahmoud! Hold off on approving answer to see what comes of Sum() and Reduce()! – Liquidgenius Jan 11 '19 at 15:45
  • @Liquidgenius updated to use the most recent version of glom! :) – Mahmoud Hashemi Jan 20 '19 at 23:34
  • Awesome add to the module! Thanks for all the work on this! – Liquidgenius Jan 20 '19 at 23:37
  • Could you add an example where you capture a list of dictionaries that also include their respective user_ids? I.e.: `[{"user_id": 198, "address": 300}, ...]` – Liquidgenius Jan 20 '19 at 23:42
  • @Liquidgenius so relative lookups like that are an area we're designing on still, but you can actually do that by assigning to glom's scope, using `S` and `Assign()`: `glom(target, ([(Assign(S['user_id'], Spec(T['user_id'])), 'addresses', [{'address': 'address', 'user_id': S['user_id']}]) ], Flatten()))` And voila, we get: `[{'address': 300, 'user_id': 198}, {'address': 251, 'user_id': 209}, {'address': 4532, 'user_id': 209}]` We're losing readability, but it'll work if you simply must have a pure-glom solution :) – Mahmoud Hashemi Jan 21 '19 at 03:17
  • That provides a good use case for normalizing nested data structures. Thanks again! – Liquidgenius Jan 22 '19 at 23:46