How can this question be generalized to the case where keys may be tuples?

As a benefit, even when all keys are strings, accumulating them into a tuple removes the need for ad-hoc separators (though JSON export is another matter).

One approach is to base it on this answer. I tried 2 versions:

def flatten_keys(d,handler,prefix=[]):
    return {handler(prefix,k) if prefix else k : v
        for kk, vv in d.items()
        for k, v in flatten_keys(vv, handler, kk).items()
        } if isinstance(d, dict) else { prefix : d }

where the tuple handlers are:

def tuple_handler_1(prefix,k):
    return tuple([prefix]+[k])

def tuple_handler_2(prefix,k):
    return tuple(flatten_container((prefix,k)))

Using the utility generator:

def flatten_container(container):
    for i in container:
        if isinstance(i, (list,tuple)):
            for j in flatten_container(i):
                yield j
        else:
            yield i
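
For reference, a quick check of what this generator does to a nested tuple. This is only a sketch: the function is restated with `yield from` so the snippet runs on its own, and `nested_path` and `flat_path` are names made up for the illustration.

```python
def flatten_container(container):
    # Recursively yield the leaves of nested lists/tuples.
    for item in container:
        if isinstance(item, (list, tuple)):
            yield from flatten_container(item)
        else:
            yield item

# A key path whose first element is itself a tuple key.
nested_path = (('hgf', 1), 'gh')
flat_path = tuple(flatten_container(nested_path))
print(flat_path)  # ('hgf', 1, 'gh')
```

The inner tuple is dissolved along with the nesting, so a handler built on this generator cannot tell a tuple key apart from two levels of keys.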

Consider one of the test dicts, but using a tuple key ('hgf',1):

data =  {'abc':123, ('hgf',1):{'gh':432, 'yu':433}, 'gfd':902, 'xzxzxz':{"432":{'0b0b0b':231}, "43234":1321}}

Neither works as intended:

flatten_keys(data,tuple_handler_1)

{'abc': 123, (('hgf', 1), 'gh'): 432, (('hgf', 1), 'yu'): 433, 'gfd': 902, ('xzxzxz', ('432', '0b0b0b')): 231, ('xzxzxz', '43234'): 1321}

With the 1st handler, ('xzxzxz', ('432', '0b0b0b')) is not flattened.

And the 2nd handler flattens the input tuple key ('hgf', 1) itself:

flatten_keys(data,tuple_handler_2)

{'abc': 123, ('hgf', 1, 'gh'): 432, ('hgf', 1, 'yu'): 433, 'gfd': 902, ('xzxzxz', '432', '0b0b0b'): 231, ('xzxzxz', '43234'): 1321}

Is there an obvious modification of the flatten method that will correctly join strings and other hashables?

EDIT

As per the comments below, the key-clash problem with this method is inherent even in the base case of string keys, e.g. {'a_b':{'c':1}, 'a':{'b_c':2}}.

Thus each key path should be a tuple, even for key paths of length 1, to avoid key clashes. For example, {(1,2): 3, 1: {2: 4}} should flatten to {((1,2),): 3, (1,2): 4}.
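
To make the string-key clash concrete, here is a minimal sketch. The helper `flatten_joined` and its `sep` parameter are hypothetical, written only for this illustration, and are not taken from the linked answer.

```python
def flatten_joined(d, sep="_", prefix=""):
    """Flatten nested dicts by joining string keys with `sep` (hypothetical helper)."""
    out = {}
    for k, v in d.items():
        path = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten_joined(v, sep, path))
        else:
            out[path] = v
    return out

clash = {'a_b': {'c': 1}, 'a': {'b_c': 2}}
joined = flatten_joined(clash)
# Both paths collapse to 'a_b_c', so one value is silently overwritten.
print(joined)  # {'a_b_c': 2}

# The same two paths kept as tuples remain distinct keys.
as_tuples = {('a_b', 'c'): 1, ('a', 'b_c'): 2}
print(len(as_tuples))  # 2
```

Any separator that can also occur inside a key has this problem, which is why tuple paths look like the safer representation.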

alancalvitti
  • What is your desired output? – Ajax1234 Jul 03 '19 at 20:55
  • what should the expected output look like for the given raw data? – l33tHax0r Jul 03 '19 at 21:00
  • deleted my answer. It looks like you're listing what not to do, but not giving us any indication of what you do want. (That last line is what I think would be correct.) – Kenny Ostrom Jul 03 '19 at 21:41
  • Do you want string keys or tuple keys? – Jab Jul 03 '19 at 21:55
  • @Ajax1234, flatten across levels, but preserve the type of input keys, in this case: `{'abc': 123, (('hgf', 1), 'gh'): 432, (('hgf', 1), 'yu'): 433, 'gfd': 902, ('xzxzxz', '432', '0b0b0b'): 231, ('xzxzxz', '43234'): 1321}` – alancalvitti Jul 03 '19 at 22:02
  • @Jab, any combination of (hashable) key types. – alancalvitti Jul 03 '19 at 22:03
  • @alancalvitti one problem (the consequence can be seen in Daniel's nice answer below) is that your desired output is "weird and unsafe". It is inconsistent in not taking the top-level and converting it to a tuple, which can lead to all sorts of other corner cases. – Cireo Jul 03 '19 at 22:39
  • @Cireo, can you give an example? I use this flatten operation routinely in another language with no problems for years (again modulo the JSON export issue hinted in my Q). The typical application is to generate flat tables/dataframes from hierarchical data. – alancalvitti Jul 04 '19 at 01:05
  • Sure: how about `{(1,2): 3, 1: {2: 4}}`? What output is desired? – Cireo Jul 04 '19 at 06:06
  • @Cireo, the key-clash issue is inherent in the base case using only strings, as pointed out by ninjagecko: `{'a_b':{'c':1}, 'a':{'b_c':2}}`. I think the obvious solution is for each key path to be a tuple, even for length 1. In your example the output should be `{((1,2),): 3, (1,2): 4}`. Will edit question. – alancalvitti Jul 05 '19 at 15:37

1 Answer

Assuming you want the following input/output:

# input
{'abc': 123,
 ('hgf', 1): {'gh': 432, 'yu': 433},
 'gfd': 902,
 'xzxzxz': {'432': {'0b0b0b': 231}, '43234': 1321}}

# output
{('abc',): 123,
 (('hgf', 1), 'gh'): 432,
 (('hgf', 1), 'yu'): 433,
 ('gfd',): 902,
 ('xzxzxz', '432', '0b0b0b'): 231,
 ('xzxzxz', '43234'): 1321}

One approach is to recurse on your dictionary until you find a non-dictionary value and pass down the current key as a tuple during the recursion.

def flatten_dict(deep_dict): 
    def do_flatten(deep_dict, current_key): 
        for key, value in deep_dict.items():
            # the key will be a flattened tuple
            # but the type of `key` is not touched
            new_key = current_key + (key,)
            # if we have a dict, we recurse
            if isinstance(value, dict): 
                yield from do_flatten(value, new_key) 
            else:
                yield (new_key, value) 
    return dict(do_flatten(deep_dict, ()))
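
As a quick sanity check, the sketch below repeats the function in compact form so that it runs on its own, then applies it to the example input and to the clash case from the comments. The variable names `data`, `flat`, `clash` and `empty` are only for illustration.

```python
def flatten_dict(deep_dict):
    def do_flatten(deep_dict, current_key):
        for key, value in deep_dict.items():
            # Extend the path by one element; `key` itself is left untouched.
            new_key = current_key + (key,)
            if isinstance(value, dict):
                yield from do_flatten(value, new_key)
            else:
                yield (new_key, value)
    return dict(do_flatten(deep_dict, ()))

data = {'abc': 123,
        ('hgf', 1): {'gh': 432, 'yu': 433},
        'gfd': 902,
        'xzxzxz': {'432': {'0b0b0b': 231}, '43234': 1321}}
flat = flatten_dict(data)
print(flat[(('hgf', 1), 'gh')])  # 432

# A tuple key at the top level vs. a nested path with the same elements.
clash = flatten_dict({(1, 2): 3, 1: {2: 4}})
print(clash)  # {((1, 2),): 3, (1, 2): 4}

# Caveat: an empty nested dict yields nothing, so its key disappears.
empty = flatten_dict({'a': {}})
print(empty)  # {}
```

Because every path is a tuple, including paths of length 1, the tuple key `(1, 2)` and the nested path `1 -> 2` end up as different keys. Whether dropping empty sub-dicts is acceptable depends on the use case.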
Daniel Perez
  • I like this approach using a generator, but can you modify it to handle the updated Q, i.e., in the case of `{(1,2): 3, 1: {2: 4}}`, it should output `{((1,2),): 3, (1,2): 4}` as per the comment thread with Cireo. Your method outputs `{(1, 2): 4}`. In other words, key paths of length 1 should be wrapped in a tuple to disambiguate. – alancalvitti Jul 08 '19 at 13:43
  • I think the mod is as simple as: `entry_key = new_key if current_key else (key,)` – alancalvitti Jul 08 '19 at 14:30
  • Thanks, - just saw your update - that's even nicer. – alancalvitti Jul 08 '19 at 14:34