
Given a dictionary loaded from the following YAML:

song1:
  chunk1:
    attr1: value
    attr2: value
    attr3: value
  chunk2:
    attr1: value
    attr2: value
    attr3: value
  chunk3:
    attr1: value
    attr2: value
    attr3: value
song2:
  chunk1:
    attr1: value

... and so on

How can one go about reordering the keys to get something like this:

attr1:
  song1:
    chunk1: value
    chunk2: value
    chunk3: value
  song2:
    chunk1: value
    chunk2: value
    chunk3: value
  song3:
    chunk1: value
    chunk2: value
    chunk3: value
attr2:

... and so on

I think I know how to do this, but I wanted to look for any optimized methods, preferably in Python using YAML, though general solutions are also welcome. Additionally, what is the name of this type of problem?

Edit: here is a rudimentary solution with Python dictionaries, but I was wondering if there's any way other than this brute-force approach:

reordered = {}
for song_name in output.keys():
    for chunk_name in output[song_name].keys():
        for attr_name in output[song_name][chunk_name].keys():
            if attr_name not in reordered:
                reordered[attr_name] = {}
            if song_name not in reordered[attr_name]:
                reordered[attr_name][song_name] = {}
            reordered[attr_name][song_name][chunk_name] = output[song_name][chunk_name][attr_name]

Anthon
rithvik
  • in principle, it makes little sense, as the order has no meaning in a yaml object. In practice, https://stackoverflow.com/questions/8651095/controlling-yaml-serialization-order-in-python and https://stackoverflow.com/questions/40226610/ruamel-yaml-equivalent-of-sort-keys – njzk2 Mar 10 '23 at 21:16
  • Does this answer your question? [Controlling Yaml Serialization Order in Python](https://stackoverflow.com/questions/8651095/controlling-yaml-serialization-order-in-python) – njzk2 Mar 10 '23 at 21:16
  • I checked these threads, but they don't seem pertinent. I'm not looking for a specific ordering of keys within the same level; I'm looking for how to rearrange keys across dictionary levels, if that makes more sense. So if the original dictionary schema is `song : { chunk : { attr : values} }` I want to change this to `attr : { song : { chunk : values } }` – rithvik Mar 10 '23 at 21:23
  • oh, my bad. In that case, maybe something with de-grouping and re-grouping the values. I assume pandas can do that – njzk2 Mar 10 '23 at 21:28
  • Could you share some links about that, or post an answer? I'd appreciate it. I've edited my original post to show the brute-force solution and asked whether there's a better way of doing it – rithvik Mar 10 '23 at 21:34
  • check out `setdefault()` to clean this up a bit. – JonSG Mar 10 '23 at 22:29
  • setdefault() doesn't work in this situation because it either returns a value given a key or adds a key-value pair, none of which I want to do. – rithvik Mar 11 '23 at 01:47
  • As you only load your data structure from YAML and then work on that data structure, applying the tag [tag:yaml] is inappropriate (as would be adding [tag:ascii], even though it looks like you have ASCII input). – Anthon Mar 11 '23 at 07:03
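njzk2's "de-grouping and re-grouping" idea can be tried without pandas. The sketch below is only an illustration under stated assumptions: exactly three nesting levels (song, chunk, attr), and the helper names `flatten_records` and `regroup_by_attr` are hypothetical. It first de-groups the nested dicts into flat `(song, chunk, attr, value)` rows, then re-groups them with `itertools.groupby` in the new level order.

```python
from itertools import groupby
from operator import itemgetter


def flatten_records(nested):
    """De-group {song: {chunk: {attr: value}}} into (song, chunk, attr, value) rows."""
    return [
        (song, chunk, attr, value)
        for song, chunks in nested.items()
        for chunk, attrs in chunks.items()
        for attr, value in attrs.items()
    ]


def regroup_by_attr(rows):
    """Re-group the rows as {attr: {song: {chunk: value}}}."""
    result = {}
    # groupby only merges *adjacent* rows, so sort by the new key order first
    rows = sorted(rows, key=itemgetter(2, 0, 1))
    for attr, attr_rows in groupby(rows, key=itemgetter(2)):
        result[attr] = {}
        for song, song_rows in groupby(attr_rows, key=itemgetter(0)):
            result[attr][song] = {chunk: value for _, chunk, _, value in song_rows}
    return result


data = {'song1': {'chunk1': {'attr1': 'v1', 'attr2': 'v2'},
                  'chunk2': {'attr1': 'v3'}},
        'song2': {'chunk1': {'attr1': 'v4'}}}
print(regroup_by_attr(flatten_records(data)))
```

Because of the sort, keys come out in sorted order rather than in the original insertion order.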

2 Answers


setdefault() doesn't work in this situation because it either returns a value given a key or adds a key-value pair, none of which I want to do.

Actually, setdefault adding a key-value pair is handy in this situation, because you can use it instead of `if key not in d: d[key] = {}`, like this:

reordered = {}
for song in output:
    for chunk in output[song]:
        for attr, val in output[song][chunk].items():
            reordered.setdefault(attr, {})
            reordered[attr].setdefault(song, {})
            reordered[attr][song][chunk] = val
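`dict.setdefault` does both things at once: it inserts the default if the key is missing and then returns whatever value is stored under that key, whether it already existed or was just inserted. That means the two `setdefault` lines can be chained into a single assignment. The sketch below wraps this in a function; the name `reorder` is hypothetical.

```python
def reorder(output):
    """Turn {song: {chunk: {attr: val}}} into {attr: {song: {chunk: val}}}."""
    reordered = {}
    for song, chunks in output.items():
        for chunk, attrs in chunks.items():
            for attr, val in attrs.items():
                # each setdefault returns the existing or newly inserted inner dict,
                # so the calls can be chained down to the level being assigned
                reordered.setdefault(attr, {}).setdefault(song, {})[chunk] = val
    return reordered


print(reorder({'song1': {'chunk1': {'attr1': 'a', 'attr2': 'b'}}}))
```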

You can also try flattening output first with something like

def get_flatDict(obj, parentKeys=tuple(), asDict=True):
    if not isinstance(obj, dict): 
        return {parentKeys: obj} if asDict else [(parentKeys, obj)]
    
    kvPairs = []
    for k, v in obj.items(): 
        kvPairs += get_flatDict(v, parentKeys=(*parentKeys,k), asDict=False)
    return dict(kvPairs) if asDict else kvPairs
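To see what the flattening step produces, here is the same `get_flatDict` function, repeated unchanged so the snippet runs on its own, applied to a small example. Each leaf value ends up keyed by the tuple of keys on the path leading to it.

```python
def get_flatDict(obj, parentKeys=tuple(), asDict=True):
    if not isinstance(obj, dict):
        return {parentKeys: obj} if asDict else [(parentKeys, obj)]

    kvPairs = []
    for k, v in obj.items():
        kvPairs += get_flatDict(v, parentKeys=(*parentKeys, k), asDict=False)
    return dict(kvPairs) if asDict else kvPairs


nested = {'song1': {'chunk1': {'attr1': 'value1', 'attr2': 'value2'}},
          'song2': {'chunk1': {'attr1': 'value0'}}}
flat = get_flatDict(nested)
print(flat)
# {('song1', 'chunk1', 'attr1'): 'value1', ('song1', 'chunk1', 'attr2'): 'value2', ('song2', 'chunk1', 'attr1'): 'value0'}
```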

and then building reordered like

reordered = {}
for (song, chunk, attr), val in get_flatDict(output).items():
    reordered.setdefault(attr, {})
    reordered[attr].setdefault(song, {})
    reordered[attr][song][chunk] = val
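Because the flattened keys are tuples, the hard-coded `(song, chunk, attr)` unpacking can be replaced by a permutation of key positions, so any level order can be requested. This is a sketch, not part of the original answer. The function `regroup` and its `order` parameter are hypothetical, and it assumes every key tuple has the same length.

```python
def regroup(flat, order):
    """Rebuild a nested dict from a {key_tuple: value} mapping.

    `order` lists which position of each key tuple goes at each level,
    e.g. (2, 0, 1) turns (song, chunk, attr) keys into attr -> song -> chunk.
    """
    result = {}
    for keys, val in flat.items():
        new_keys = [keys[i] for i in order]
        node = result
        for k in new_keys[:-1]:
            node = node.setdefault(k, {})  # walk/create the intermediate dicts
        node[new_keys[-1]] = val
    return result


flat = {('song1', 'chunk1', 'attr1'): 'value1',
        ('song1', 'chunk2', 'attr1'): 'value4',
        ('song2', 'chunk1', 'attr1'): 'value0'}
print(regroup(flat, (2, 0, 1)))
# {'attr1': {'song1': {'chunk1': 'value1', 'chunk2': 'value4'}, 'song2': {'chunk1': 'value0'}}}
```

Passing `(0, 1, 2)` rebuilds the original nesting, which is a quick sanity check on the flattening.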

Either way, if you had

output = { 'song1': {
    'chunk1': {'attr1': 'value1', 'attr2': 'value2', 'attr3': 'value3'},
    'chunk2': {'attr1': 'value4', 'attr2': 'value5', 'attr3': 'value6'},
    'chunk3': {'attr1': 'value7', 'attr2': 'value8', 'attr3': 'value9'}},
  'song2': {'chunk1': {'attr1': 'value0'}} }

then reordered would look like

{ 
  'attr1': {
    'song1': {'chunk1': 'value1', 'chunk2': 'value4', 'chunk3': 'value7'},
    'song2': {'chunk1': 'value0'}
  },
  'attr2': {
    'song1': {'chunk1': 'value2', 'chunk2': 'value5', 'chunk3': 'value8'}
  },
  'attr3': {
    'song1': {'chunk1': 'value3', 'chunk2': 'value6', 'chunk3': 'value9'}
  }
}
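Another way to avoid the "create the inner dict if missing" step is `collections.defaultdict`, where missing keys are filled in automatically on first access. This sketch is an alternative to `setdefault`, not what the answer above uses; the function name `reorder_defaultdict` is hypothetical. The result is converted back to plain dicts at the end, on the assumption that a YAML dumper may not know how to represent `defaultdict`.

```python
from collections import defaultdict


def reorder_defaultdict(output):
    # a missing attr key gets a defaultdict(dict); a missing song key under it gets a plain dict
    reordered = defaultdict(lambda: defaultdict(dict))
    for song, chunks in output.items():
        for chunk, attrs in chunks.items():
            for attr, val in attrs.items():
                reordered[attr][song][chunk] = val
    # convert back to ordinary dicts (the innermost level is already a plain dict)
    return {attr: dict(songs) for attr, songs in reordered.items()}


output = {'song1': {
    'chunk1': {'attr1': 'value1', 'attr2': 'value2', 'attr3': 'value3'},
    'chunk2': {'attr1': 'value4', 'attr2': 'value5', 'attr3': 'value6'},
    'chunk3': {'attr1': 'value7', 'attr2': 'value8', 'attr3': 'value9'}},
  'song2': {'chunk1': {'attr1': 'value0'}}}
print(reorder_defaultdict(output))
```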
Driftr95
  • `get_flatDict`? It is not often I have seen the combination of both camelCase and snake_case in one (function) name. – Anthon Mar 11 '23 at 07:10
  • @Anthon I'll spend hours settling on a name if I let myself, so I don't let myself ...but I do combine them a lot, and often in this `actionType_dataType` format – Driftr95 Mar 11 '23 at 14:41
  • The big question then is if this should be named camelSnakeCase, snakeCamelCase, cameled_snakeCase or snaked_camelCase >:-) . Not to mention the fact that a Python that has just eaten can have quite a hump... – Anthon Mar 11 '23 at 15:16
  • @Anthon lmao my personal vote would go to snaked_camelCase or perhaps the lochNess_case (since some versions of Nessie have humps...) – Driftr95 Mar 11 '23 at 21:05

Your example data structure consists of nested dicts. One can interpret what you are doing in different ways, but one way is to say you bump the most deeply nested key to the root dict. This can be solved in general for any depth of nesting using a recursive function and setdefault:

import sys
from pathlib import Path
import ruamel.yaml

file_in = Path('input.yaml')

yaml = ruamel.yaml.YAML()

data = yaml.load(file_in)

def reverse_keys(d, path=None, result=None):
    if path is None:
        path = []
    if result is None:
        result = {}
    if isinstance(d, dict):
        for k, v in d.items():
            reverse_keys(v, path + [k], result)  # path + [k] builds a new list, leaving path untouched
    else:
        # ran out of dicts to traverse
        tmp = result
        tmp = tmp.setdefault(path[-1], {})  # the last (innermost) key becomes the new top-level key
        for k in path[:-2]:  # use the first few keys to create dicts
            tmp = tmp.setdefault(k, {})
        tmp[path[-2]] = d  # use the second to last key to assign the non-dict value
    return result

yaml.dump(reverse_keys(data), sys.stdout)

which gives:

attr1:
  song1:
    chunk1: value
    chunk2: value
    chunk3: value
  song2:
    chunk1: value
    chunk2: value
    chunk3: value
  song3:
    chunk1: value
    chunk2: value
    chunk3: value
attr2:
  song1:
    chunk1: value
    chunk2: value
    chunk3: value
  song2:
    chunk1: value
    chunk2: value
    chunk3: value
  song3:
    chunk1: value
    chunk2: value
    chunk3: value
attr3:
  song1:
    chunk1: value
    chunk2: value
    chunk3: value
  song2:
    chunk1: value
    chunk2: value
    chunk3: value
  song3:
    chunk1: value
    chunk2: value
    chunk3: value
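The claim that this works for any depth of nesting (of at least two levels) can be checked without ruamel.yaml. Below is a stdlib-only copy of `reverse_keys` with the same logic, applied to a hypothetical four-level dict with an extra `album` level on top. The innermost key moves to the root, the second-to-last key stays innermost, and the keys in front keep their relative order.

```python
def reverse_keys(d, path=None, result=None):
    # same logic as the answer, minus the YAML loading and dumping
    if path is None:
        path = []
    if result is None:
        result = {}
    if isinstance(d, dict):
        for k, v in d.items():
            reverse_keys(v, path + [k], result)
    else:
        tmp = result.setdefault(path[-1], {})  # innermost key becomes the root key
        for k in path[:-2]:  # outer keys keep their relative order
            tmp = tmp.setdefault(k, {})
        tmp[path[-2]] = d  # second-to-last key holds the leaf value
    return result


data = {'album1': {'song1': {'chunk1': {'attr1': 'a', 'attr2': 'b'}}}}
print(reverse_keys(data))
# {'attr1': {'album1': {'song1': {'chunk1': 'a'}}}, 'attr2': {'album1': {'song1': {'chunk1': 'b'}}}}
```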
Anthon