
I want to get the value of a specific key in a nested JSON file without knowing its exact location: look through all the keys (and nested keys) until a match is found, then return a dictionary {match: "value"}.

Nested json_data:

{
  "$id": "1",
  "DataChangedEntry": {
    "$id": "2",
    "PathProperty": "/",
    "Metadata": null,
    "PreviousValue": null,
    "CurrentValue": {
      "CosewicWsRefId": {
        "Value": "QkNlrjq2HL9bhTQqU8-qH"
      },
      "Date": {
        "Value": "2022-05-20T00:00:00Z"
      },
      "YearSentToMinister": {
        "Value": "0001-01-01T00:00:00"
      },
      "DateSentToMinister": {
        "Value": "0001-01-01T00:00:00"
      },
      "Order": null,
      "Type": {
        "Value": "REGULAR"
      },
      "ReportType": {
        "Value": "NEW"
      },
      "Stage": {
        "Value": "ASSESSED"
      },
      "State": {
        "Value": "PUBLISHED"
      },
      "StatusAndCriteria": {
        "Status": {
          "Value": "EXTINCT"
        },
        "StatusComment": {
          "EnglishText": null,
          "FrenchText": null
        },
        "StatusChange": {
          "Value": "NOT_INITIALIZED"
        },
        "StatusCriteria": {
          "EnglishText": null,
          "FrenchText": null
        },
        "ApplicabilityOfCriteria": {
          "ApplicabilityCriteriaList": []
        }
      },
      "Designation": null,
      "Note": null,
      "DomainEvents": [],
      "Version": {
        "Value": 1651756761385.1248
      },
      "Id": {
        "Value": "3z3XlCkaXY9xinAbK5PrU"
      },
      "CreatedAt": {
        "Value": 1651756761384
      },
      "ModifiedAt": {
        "Value": 1651756785274
      },
      "CreatedBy": {
        "Value": "G@a"
      },
      "ModifiedBy": {
        "Value": "G@a"
      }
    }
  },
  "EventAction": "Create",
  "EventDataChange": {
    "$ref": "2"
  },
  "CorrelationId": "3z3XlCkaXY9xinAbK5PrU",
  "EventId": "WGxlewsUAHayLHZ2LHvFk",
  "EventTimeUtc": "2022-05-06T13:15:31.7463355Z",
  "EventDataVersion": "1.0.0",
  "EventType": "AssessmentCreatedInfrastructure"
}

Desired return is the value from json_data["DataChangedEntry"]["CurrentValue"]["Date"]["Value"]:

"2022-05-20T00:00:00Z"

So far I've tried a recursive function, but it keeps returning None:

match_dict = {}
def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            return recursive_json(v,attr,m_dict)


print('RETURN: ',recursive_json(json_data, "Date", match_dict))

Output:

RETURN:  None

I tried removing the second return statement, and it now prints the value I want in the function, but still returns None:

match_dict = {}
def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            recursive_json(v,attr,m_dict)


print('RETURN: ',recursive_json(json_data, "Date", match_dict))

Output:

IF:  {'Date', '2022-05-20T00:00:00Z'}
RETURN:  None

I don't get why it keeps returning None. Is there a better way to return the value I want?

Mitch
  • Are you saying you want the `['Value']` for the first match of some tag in the tree, searching depth first? So, something like `find_in_json(data, 'Date')` would return the value of the `"Value"` attribute of the first `"Date"` element in `data`, searching depth first? It's unclear why you pass `dict` and then proceed to shadow `dict` with the parameter of your function - it seems to serve no purpose considering the rest of your code. – Grismar Jun 01 '22 at 01:24
  • I cannot reproduce this. Either version of the code results in a `TypeError` for me because the built-in `dict` name has been overwritten and is also replaced locally - trying to do `isinstance(v, dict)` will not work when `dict` is an empty dictionary rather than the type itself. I assume you mean for the parameter to be named `m_dict` instead. – Karl Knechtel Jun 01 '22 at 01:31
  • "I don't get why it keeps returning None" Because functions only return what they are told to return, and return `None` if they reach the end otherwise. Making a recursive call doesn't change that; it is **just like** if you had called **any other function**. – Karl Knechtel Jun 01 '22 at 01:32
  • @Grismar yes sorry made a mistake while copying it over, fixed it now, its m_dict in the def function – Mitch Jun 01 '22 at 01:37
  • @KarlKnechtel yes sorry I fixed that error, made a typo when copying it over. So if I just have return right after it finds the matching pair I thought it would exit with that value? – Mitch Jun 01 '22 at 01:39
  • "if I just have return right after it finds the matching pair I thought it would exit with that value?" Only if it hasn't *already* returned, for example, due to a previous recursive call *not* finding a matching pair. – Karl Knechtel Jun 01 '22 at 01:50

2 Answers


The underlying question is: how can we make multiple recursive calls in a loop, return the recursive result if any of them returns something useful, and fail otherwise?

If we blindly return inside the loop, then only one recursive call can be made. Whatever it returns, gets returned at this level. If it didn't find the useful result, we don't get a useful result.

If we blindly don't return inside the loop, then the values that were returned don't matter. Nothing in the current call makes use of them, so we will finish looping, make all the recursive calls, reach the end of the function... and thus implicitly return None.

The way around this, of course, is to check whether the recursive call returned something useful. If it did, we can return that; otherwise, we keep going. If we reach the end, then we signal that we couldn't find anything useful - that way, if we are being recursively called, the caller can do the right thing.

Assuming that None cannot be a "useful" value, we can naturally use that as the signal. We don't even have to return it explicitly at the end.

After fixing some other typos (we should not overwrite the global built-in dict name, and anyway we don't need to name the dict that we pass in at the start, and the parameter should be m_dict so that it's properly defined when we make the recursive call), we get:

def recursive_json(data, attr, m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ', m_dict)
                return m_dict
        elif isinstance(v,dict):
            result = recursive_json(v, attr, m_dict)
            if result:
                return result

# call it:
recursive_json(json_data, "Date", {})

We can see that the debug trace is printed, and the value is also returned.

Let's improve this a bit:

First off, the inner for k2,v2 in v.items(): loop doesn't make any sense. Again, we can only return once per call, so this would skip any values in the dict after the first. We would be better served just returning v directly. Also, the m_dict parameter doesn't actually help implement the logic; we don't modify it between calls. It doesn't make sense to use a set for our return value, since it's fundamentally unordered; we care about the order here. Finally, we don't need the debug trace any more. That gives us:

def recursive_json(data, attr):
    for k, v in data.items():
        if k == attr:
            return attr, v
        elif isinstance(v,dict):
            result = recursive_json(v, attr)
            if result:
                return result
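
As a usage sketch, here is that version run against a trimmed-down stand-in for the question's json_data (the shortened data is an assumption made for brevity). Note that it returns the whole inner dict, so "Value" still has to be pulled out to build the {match: "value"} dictionary the question asks for.

```python
def recursive_json(data, attr):
    for k, v in data.items():
        if k == attr:
            return attr, v
        elif isinstance(v, dict):
            result = recursive_json(v, attr)
            if result:
                return result


# Trimmed-down stand-in for the question's json_data (assumption for brevity).
json_data = {
    "$id": "1",
    "DataChangedEntry": {
        "PathProperty": "/",
        "CurrentValue": {
            "CosewicWsRefId": {"Value": "QkNlrjq2HL9bhTQqU8-qH"},
            "Date": {"Value": "2022-05-20T00:00:00Z"},
        },
    },
    "EventAction": "Create",
}

found = recursive_json(json_data, "Date")
print(found)  # ('Date', {'Value': '2022-05-20T00:00:00Z'})

missing = recursive_json(json_data, "NoSuchKey")
print(missing)  # None

if found is not None:
    key, inner = found
    print({key: inner["Value"]})  # {'Date': '2022-05-20T00:00:00Z'}
```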

To get fancier, we can separate the base case from the recursive case, and use more elegant tools for each. To check if any of the keys matches, we can simply check with the in operator. To recurse and return the first fruitful result, the built-in next is useful. We get:

def recursive_json(data, attr):
    if not isinstance(data, dict):
        # reached a leaf, can't search in here.
        return None
    if attr in data:
        return attr, data[attr]
    candidates = (recursive_json(v, attr) for v in data.values())
    try:
        # the first non-None candidate, if any.
        return next(c for c in candidates if c is not None)
    except StopIteration:
        return None # all candidates were None.
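
If the goal were instead to collect every match rather than stop at the first, a recursive generator is one option. This is a minimal sketch, not part of the code above: iter_matches is a hypothetical name, and descending into lists is an added assumption about how JSON arrays should be treated.

```python
def iter_matches(data, attr):
    """Yield every value stored under attr, at any depth (hypothetical helper)."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == attr:
                yield v
            # keep descending even after a match; nested matches count too
            yield from iter_matches(v, attr)
    elif isinstance(data, list):
        # assumption: JSON arrays should be searched as well
        for item in data:
            yield from iter_matches(item, attr)


sample = {
    "a": {"b": {"value": 1}},
    "b": {"value": 2},
    "c": [{"b": {"value": 3}}],
}
print(list(iter_matches(sample, "b")))
# [{'value': 1}, {'value': 2}, {'value': 3}]
print(next(iter_matches(sample, "zzz"), None))  # None
```

Because the generator is lazy, `next(iter_matches(data, attr), None)` also gives the "first match or None" behaviour without a try/except.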
Karl Knechtel
  • If you wanted to *collect* possibly multiple results while making the recursive calls, and *not* bail out early, that is a much different question, and a little bit harder. – Karl Knechtel Jun 01 '22 at 01:51
  • Thank you for your answer! Your 2nd last script works perfectly, the last one (which I don't really understand tbh) throws an error ```AttributeError: 'str' object has no attribute 'values'```. So if I wanted to get multiple possible results (I don't), would it make sense to use a list and keep appending, and then return that list at the very end? – Mitch Jun 01 '22 at 02:04
  • Oh, sorry, I didn't test that :) let me try to fix it. I think I show some useful techniques there, but overall it does not actually produce better code for this specific problem. – Karl Knechtel Jun 01 '22 at 02:08
  • "if I wanted to get multiple possible results (I don't), would it make sense to use a list and keep appending, and then return that list at the very end?" That's the idea, yes. See also e.g. https://stackoverflow.com/questions/68561762, though I think that is a somewhat different problem. Another way is to use a recursive generator: see https://stackoverflow.com/questions/63636715/ or https://stackoverflow.com/questions/58389370. – Karl Knechtel Jun 01 '22 at 02:11

It seems like you're trying to write something like this:

from json import loads
from typing import Any

test_json = """
{
  "a": {
    "b": {
      "value": 1
    }
  },
  "b": {
    "value": 2
  },
  "c": {
    "b": {
      "value": 3
    },
    "c": {
      "value": 4
    }
  },
  "d": {}
}
"""

json_data = loads(test_json)


def find_value(data: dict, attr: str, depth_first: bool=True) -> (bool, Any):
    # assumes data is a dict, with 'value' attributes for the attr to be found
    # returns [whether value was found]: bool, [actual value]: Any
    for k, v in data.items():
        if k == attr and isinstance(v, dict) and 'value' in v:
            return True, v['value']
        elif depth_first and isinstance(v, dict):
            if (t := find_value(v, attr, depth_first))[0]:
                return t
    if not depth_first:
        for _, v in data.items():
            if isinstance(v, dict) and (t := find_value(v, attr, depth_first))[0]:
                return t
    return False, None


# returns True, 1 - first 'b' with a 'value', depth-first
print(find_value(json_data, 'b'))
# returns True, 2 - first 'b' with a 'value', breadth-first
print(find_value(json_data, 'b', False))
# returns True, 4 - first 'c' with a 'value' - the 'c' at the root level has no 'value'
print(find_value(json_data, 'c'))
# returns False, None - no 'd' with a value
print(find_value(json_data, 'd'))
# returns False, None - no 'e' in data
print(find_value(json_data, 'e'))

Your own function can return None because it doesn't reliably pass on what a successful recursive call returns, and a function that reaches its end without a return statement returns None by default.

However, your code also doesn't account for the case where there is nothing to be found.

(Note: this solution only works in Python 3.8 or later, due to its use of the walrus operator := - of course it's not that hard to write it without, but that's left as an exercise for the reader.)
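
For interpreters older than 3.8, one possible walrus-free sketch of the same logic follows. The name find_value_compat is hypothetical, and the extra isinstance(v, dict) guard is an addition so that a matching key holding null or a number does not raise a TypeError.

```python
from typing import Any, Tuple


def find_value_compat(data: dict, attr: str, depth_first: bool = True) -> Tuple[bool, Any]:
    # returns (whether a value was found, the value itself)
    for k, v in data.items():
        if k == attr and isinstance(v, dict) and 'value' in v:
            return True, v['value']
        elif depth_first and isinstance(v, dict):
            found = find_value_compat(v, attr, depth_first)
            if found[0]:
                return found
    if not depth_first:
        # nothing matched at this level, so only now look into the children
        for v in data.values():
            if isinstance(v, dict):
                found = find_value_compat(v, attr, depth_first)
                if found[0]:
                    return found
    return False, None


sample = {"a": {"b": {"value": 1}}, "b": {"value": 2}, "n": None}
print(find_value_compat(sample, 'b'))         # (True, 1)
print(find_value_compat(sample, 'b', False))  # (True, 2)
print(find_value_compat(sample, 'n'))         # (False, None)
```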

Grismar
  • I get ```(False, None)``` when I try it with ```find_value(json_data, "Date")``` using my json_data. – Mitch Jun 01 '22 at 01:46
  • Ah, yes because you use `Value` and I used `value` in my example - the check could be made case-insensitive, or you could just replace `value` with `Value` in my example. – Grismar Jun 01 '22 at 02:31