Walking over nested dictionaries begs for recursion and by handing in the "prefix" to "path" this prevents you from having to do any manipulation on the segments of your path (as @Prune) suggests.
There are a few things to keep in mind that makes this problem interesting:
- because you are using multiple files can result in the same path in multiple files, which you need to handle (at least throwing an error, as otherwise you might just lose data). In my example I generate a list of values.
- dealing with special keys (non-string (convert?), empty string, keys containing a
.
). My example reports these and exits.
Example code using ruamel.yaml ¹:
import sys
import glob
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from ruamel.yaml.compat import string_types, ordereddict
class Flatten:
def __init__(self, base):
self._result = ordereddict() # key to list of tuples of (value, comment)
self._base = base
def add(self, file_name):
data = ruamel.yaml.round_trip_load(open(file_name))
self.walk_tree(data, self._base)
def walk_tree(self, data, prefix=None):
"""
this is based on ruamel.yaml.scalarstring.walk_tree
"""
if prefix is None:
prefix = ""
if isinstance(data, dict):
for key in data:
full_key = self.full_key(key, prefix)
value = data[key]
if isinstance(value, (dict, list)):
self.walk_tree(value, full_key)
continue
# value is a scalar
comment_token = data.ca.items.get(key)
comment = comment_token[2].value if comment_token else None
self._result.setdefault(full_key, []).append((value, comment))
elif isinstance(base, list):
print("don't know how to handle lists", prefix)
sys.exit(1)
def full_key(self, key, prefix):
"""
check here for valid keys
"""
if not isinstance(key, string_types):
print('key has to be string', repr(key), prefix)
sys.exit(1)
if '.' in key:
print('dot in key not allowed', repr(key), prefix)
sys.exit(1)
if key == '':
print('empty key not allowed', repr(key), prefix)
sys.exit(1)
return prefix + '.' + key
def dump(self, out):
res = CommentedMap()
for path in self._result:
values = self._result[path]
if len(values) == 1: # single value for path
res[path] = values[0][0]
if values[0][1]:
res.yaml_add_eol_comment(values[0][1], key=path)
continue
res[path] = seq = CommentedSeq()
for index, value in enumerate(values):
seq.append(value[0])
if values[0][1]:
res.yaml_add_eol_comment(values[0][1], key=index)
ruamel.yaml.round_trip_dump(res, out)
flatten = Flatten('myproject.section.more_information')
for file_name in glob.glob('*.yaml'):
flatten.add(file_name)
flatten.dump(sys.stdout)
If you have an additional input file:
default:
learn_more:
commented: value # this value has a comment
description: another description
then the result is:
myproject.section.more_information.default.heading: Here’s A Title
myproject.section.more_information.default.learn_more.title: Title of Thing
myproject.section.more_information.default.learn_more.url: www.url.com
myproject.section.more_information.default.learn_more.description:
- description
- another description
myproject.section.more_information.default.learn_more.opens_new_window: true
myproject.section.more_information.default.learn_more.commented: value # this value has a comment
Of course if your input doesn't have double paths, your output won't have any lists.
By using string_types
and ordereddict
from ruamel.yaml
makes this Python2 and Python3 compatible (you don't indicate which version you are using).
The ordereddict preserves the original key ordering, but this is of course dependent on the processing order of the files. If you want the paths sorted, just change dump()
to use:
for path in sorted(self._result):
Also note that the comment on the 'commented' dictionary entry is preserved.
¹ ruamel.yaml is a YAML 1.2 parser that preserves comments and other data on round-tripping (PyYAML does most parts of YAML 1.1). Disclaimer: I am the author of ruamel.yaml