I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
`map_at(func, path_pattern, data)`:

1. if `path_pattern` is not empty
    - if `data` is terminal, it's a failure: we did not match the full `path_pattern`, so there is no reason to apply the function. Just return `data`.
    - else, we have to explore every path in `data`. We consume the head of `path_pattern` if possible. That is, return a dict `data key -> map_at(func, new_path, data value)` where `new_path` is the `tail` of the `path_pattern` if the key matches the `head`, else the `path_pattern` itself.
2. else, it's a success, because all the `path_pattern` was consumed:
    - if `data` is terminal, return `func(data)`
    - else, find the leaves and apply `func`: return a dict `data key -> map_at(func, [], data value)`
Notes:
- I assume that the pattern `*-b-d` matches the path `0-a-b-c-d-e`;
- it's an eager algorithm: the head of the pattern is always consumed when possible;
- once the pattern is fully consumed, every terminal below that point is mapped;
- it's a simple DFS, so I guess it's possible to write an iterative version with a stack.
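To back up the last note, here is a minimal sketch of such an iterative version (`map_at_iterative` is a hypothetical name). It assumes that only `dict`s are containers, so everything else is a leaf, and it treats a string pattern as a single key rather than as a collection of characters. Each stack entry holds the input dict, the remaining pattern and the output dict being filled, so the output is built top-down without recursion.

```python
def map_at_iterative(func, path_pattern, data):
    """Stack-based sketch of map_at for nested dicts (hypothetical name).

    Assumption: only dicts are containers; any other value is a leaf.
    """
    def matches(pattern, value):
        try:
            # '*' matches any key; a str pattern is one key; other containers
            # (list, set, range...) match any key they contain
            return pattern == '*' or value == pattern or (
                not isinstance(pattern, str) and value in pattern)
        except TypeError:  # pattern is not a container, e.g. an int
            return False

    def next_pattern(pattern, key):
        # eager rule: consume the head of the pattern whenever the key matches it
        if pattern and matches(pattern[0], key):
            return pattern[1:]
        return pattern

    def map_leaf(value, pattern):
        # a leaf is mapped only if the whole pattern has been consumed
        return value if pattern else func(value)

    if not isinstance(data, dict):
        return map_leaf(data, list(path_pattern))

    result = {}
    # each stack entry: (input dict, remaining pattern, output dict to fill)
    stack = [(data, list(path_pattern), result)]
    while stack:
        node, pattern, target = stack.pop()
        for key, value in node.items():
            child_pattern = next_pattern(pattern, key)
            if isinstance(value, dict):
                target[key] = {}  # placeholder, filled when this entry is popped
                stack.append((value, child_pattern, target[key]))
            else:
                target[key] = map_leaf(value, child_pattern)
    return result


data = {'a': {'b': {'c': {'d': 1}}, 'x': 2}, 'b': {'d': 3}}
print(map_at_iterative(lambda n: n * 10, ['b', 'd'], data))
# {'a': {'b': {'c': {'d': 10}}, 'x': 2}, 'b': {'d': 30}}
```

Even though the stack is LIFO, every output dict is filled by a single loop over its input dict, so keys keep their original order.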
Here's the code:
```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            # '*' matches any key; a str is a single key; other containers (list, range...) are sets of keys
            return pattern == '*' or value == pattern or (not isinstance(pattern, str) and value in pattern)
        except TypeError: # EDIT: avoid a crash in the dict comprehension if pattern is not a container.
            return False

    if path_pattern:
        head, *tail = path_pattern
        try: # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in data.items()}
        except AttributeError: # fail: terminal data but path_pattern was not consumed
            return data
    else: # success: path_pattern is empty.
        try: # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in data.items()}
        except AttributeError: # a leaf: map it
            return func(data)
```
Note that `tail if matches(head, k) else path_pattern` means: consume `head` if possible. To use a range in the pattern, just use `range(...)`.
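The eager rule can also be replayed along a single root-to-leaf path, which is a handy way to check which leaves a pattern will reach, for instance the `*-b-d` example from the notes or a pattern whose head is a `range`. This is a sketch: `consume` and `path_is_mapped` are hypothetical helper names, and `matches` is a module-level copy of the helper above in which a string pattern only matches an equal key.

```python
def matches(pattern, value):
    """Module-level copy of the matches helper used inside map_at."""
    try:
        return pattern == '*' or value == pattern or (
            not isinstance(pattern, str) and value in pattern)
    except TypeError:  # pattern is not a container, e.g. an int
        return False


def consume(path_pattern, key):
    """One step of map_at: drop the head of the pattern if the key matches it."""
    if not path_pattern:
        return path_pattern
    head, *tail = path_pattern
    return tail if matches(head, key) else path_pattern


def path_is_mapped(path_pattern, path):
    """Hypothetical helper: replay the eager rule along one root-to-leaf path.

    The leaf at the end of `path` should be passed to func exactly when
    the whole pattern has been consumed on the way down.
    """
    for key in path:
        path_pattern = consume(path_pattern, key)
    return not path_pattern


print(path_is_mapped(['*', 'b', 'd'], [0, 'a', 'b', 'c', 'd', 'e']))  # True
print(path_is_mapped(['*', 'b', 'd'], [0, 'a', 'd', 'b']))  # False: 'd' must come after 'b'
print(path_is_mapped([range(1, 3), 'd'], [2, 'd']))  # True
print(path_is_mapped([range(1, 3), 'd'], [0, 'd']))  # False
```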
As you can see, you never escape from case 2: once `path_pattern` is empty, you just have to map all leaves, whatever happens. This is clearer in this version:
```python
def map_all_leaves(func, data):
    """Apply func to all leaves"""
    try:
        return {k: map_all_leaves(func, v) for k, v in data.items()}
    except AttributeError: # a leaf: map it
        return func(data)

def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            # '*' matches any key; a str is a single key; other containers (list, range...) are sets of keys
            return pattern == '*' or value == pattern or (not isinstance(pattern, str) and value in pattern)
        except TypeError: # EDIT: avoid a crash in the dict comprehension if pattern is not a container.
            return False

    if path_pattern:
        head, *tail = path_pattern
        try: # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in data.items()}
        except AttributeError: # fail: terminal data but path_pattern is not consumed
            return data
    else: # success: path_pattern is empty, map every leaf
        return map_all_leaves(func, data)
```
EDIT
If you want to handle lists, you can try this:
```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            # '*' matches any key; a str is a single key; other containers (list, range...) are sets of keys
            return pattern == '*' or value == pattern or (not isinstance(pattern, str) and value in pattern)
        except TypeError: # EDIT: avoid a crash in the dict comprehension if pattern is not a container.
            return False

    def get_items(data):
        if isinstance(data, (str, bytes)): # a string is a leaf, not a list of characters
            raise TypeError("terminal value")
        try:
            return data.items() # dict: (key, value) pairs
        except AttributeError:
            return enumerate(data) # list: (index, value) pairs; raises TypeError if not iterable

    if path_pattern:
        head, *tail = path_pattern
        try: # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in get_items(data)}
        except TypeError: # fail: terminal data but path_pattern was not consumed
            return data
    else: # success: path_pattern is empty.
        try: # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in get_items(data)}
        except TypeError: # a leaf: map it
            return func(data)
```
The idea is simple: for a list, `enumerate` is the equivalent of `dict.items`:

```python
>>> list(enumerate(['a', 'b']))
[(0, 'a'), (1, 'b')]
>>> list({0:'a', 1:'b'}.items())
[(0, 'a'), (1, 'b')]
```

Hence, `get_items` is just a wrapper that returns the dict items, the list items `(index, value)`, or raises an error.
The flaw is that lists are converted to dicts in the process:

```python
>>> data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
>>> map_at(type,['*',['b','c'],'d'],data2)
{0: {'a': 1, 'b': 2}, 1: {'a': 10, 'c': 13}, 2: {'a': 20, 'b': {'d': <class 'int'>, 'e': 101}, 'c': 23}, 3: {'a': 30, 'b': 31, 'c': {'d': <class 'int'>}}}
```
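If you need lists to stay lists, one option is to dispatch on the container type instead of relying on exceptions. This is a sketch under the assumption that only `dict` and `list` are containers (tuples, strings and numbers are leaves); `map_at_keep_lists` is a hypothetical name.

```python
def map_at_keep_lists(func, path_pattern, data):
    """Variant of map_at that returns lists as lists (hypothetical name, a sketch).

    Assumption: only dict and list are containers; str, tuple, numbers... are leaves.
    """
    def matches(pattern, value):
        try:
            # '*' matches any key; a str is a single key; other containers are sets of keys
            return pattern == '*' or value == pattern or (
                not isinstance(pattern, str) and value in pattern)
        except TypeError:  # pattern is not a container, e.g. an int
            return False

    def next_pattern(key):
        # eager rule: consume the head of the pattern if the key (or index) matches it
        if path_pattern and matches(path_pattern[0], key):
            return path_pattern[1:]
        return path_pattern

    if isinstance(data, dict):
        return {k: map_at_keep_lists(func, next_pattern(k), v) for k, v in data.items()}
    if isinstance(data, list):
        return [map_at_keep_lists(func, next_pattern(i), v) for i, v in enumerate(data)]
    # leaf: apply func only if the whole pattern was consumed on the way down
    return data if path_pattern else func(data)


data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
print(map_at_keep_lists(type, ['*', ['b', 'c'], 'd'], data2))
# [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': <class 'int'>, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': <class 'int'>}}]
```

Checking the types explicitly has a side benefit: an `AttributeError` or `TypeError` raised by `func` itself can no longer be mistaken for "this value is a leaf".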
EDIT
Since you are looking for something like XPath for JSON, you could try https://pypi.org/project/jsonpath/ or https://pypi.org/project/jsonpath-rw/ (I did not test those libs).