
I have the following data and want to validate whether each value is an integer, a float, or a string. My attempt yields:

attribute::: id metric
attribute::: name points
attribute::: cake_name None
attribute::: time None
attribute::: time None
["key 'id' is not a string, got None", "key 'metric' is not a integer, got <class 'str'>", "key 'points' is not a integer, got <class 'NoneType'>"]

Alexander
  • no idea anyone ? – Alexander Sep 06 '22 at 21:01
  • I have a solution, based on code I wrote because I found parsing nested JSON interesting. Let me verify the outputs and I'll add the answer. – Lucas M. Uriarte Sep 06 '22 at 21:09
  • @LucasM.Uriarte Can you post it ? – Alexander Sep 06 '22 at 21:11
  • Done, my solution is posted. I hope it helps; let me know if anything is unclear. – Lucas M. Uriarte Sep 06 '22 at 21:19
  • @LucasM.Uriarte what does `json: Union[dict, list], validator: Callable` do in explore json? – Alexander Sep 06 '22 at 21:29
  • `json` is just the data you are passing, which can be a list or a dictionary. `validator` is a function that is called for every field that is not a list or a dictionary. The idea is to separate the validator from the recursive function, so that a validator can be any function that performs an action whose output is appended to the results. That function can do anything, from the checks you asked about to changing the data. – Lucas M. Uriarte Sep 06 '22 at 21:33
  • "which yields" I can't understand the problem. Why is this output wrong? What should the output be instead? Why? – Karl Knechtel Sep 06 '22 at 21:38
  • @KarlKnechtel if you check `points`, it's an integer, but the output says `"key 'points' is not a integer"`, and the code cannot access the items of `top_properties`. Those were the problems. – Alexander Sep 06 '22 at 21:43
  • I see. "if (string or integer) in my_dict: #there might be problem here;)" yes, there is a problem there, and it is a [very common duplicate](/questions/20002503). Unfortunately I am out of close votes today. – Karl Knechtel Sep 06 '22 at 21:46
  • But also, the `for (string,integer) in itertools.zip_longest...` loop makes **no sense at all**. It is saying that we should check the string attributes and int attributes in pairs, and then for each pair check that the string attribute is a string and the int attribute is an int (except we also have to handle the added `None` values). All we really want to do is check that all the string attributes are strings, and then check that all the int attributes are ints. And also I'm not sure the other issue actually resolves the problem at all. – Karl Knechtel Sep 06 '22 at 21:52
  • Very little about this code makes sense, honestly. Please read https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ and [mre]; this is not a help desk or a debugging service. Please try to understand *what happens specifically* in the code *that is different from your expectation*, and *where* it happens, by carefully tracing the execution of the code. Then try to create a small example that *focuses* on the problem, using code that *directly* demonstrates the problem. – Karl Knechtel Sep 06 '22 at 21:53
  • @LucasM.Uriarte what does `else: out = validator(key, val) if out is not None: result.append(out)` part do ? little confused there – Alexander Sep 06 '22 at 21:56
  • @Alexander, that part of the code is the important one. `validator` checks whether the key is in any of the lists you specified. If it is, it checks whether the value has the expected type: if not, it returns a string describing the error; otherwise it returns None. Then, inside the recursive function `explore_json`, the output is appended to the results only if it is not None. – Lucas M. Uriarte Sep 06 '22 at 22:02
  • @Alexander let me know if you have more questions – Lucas M. Uriarte Sep 06 '22 at 22:03
  • @LucasM.Uriarte ok, I am slowly digesting this :) The recursive part confused me, but I am now printing every step to understand what is being done in each line. This is a really neat solution, by the way! I may reach out to you if another question comes up. Thanks – Alexander Sep 06 '22 at 22:18
  • @LucasM.Uriarte Maybe another validation could be for a data frame. Say I have a data frame in the JSON data; how do I verify it? – Alexander Sep 06 '22 at 22:20
  • @LucasM.Uriarte and is it possible to print which of the `anticipations` failed, as was shown with `f"anticipation -> {i} error: {validation_error}"` in the original question? – Alexander Sep 06 '22 at 22:30
  • @Alexander I don't follow this last question. Do you mean you have a dataframe with nested JSON inside a column? – Lucas M. Uriarte Sep 06 '22 at 22:30
  • @Alexander yes, it is possible; you need to work on the list part of the function `explore_json`. If you have not managed it by tomorrow I can check again. I need to go now! – Lucas M. Uriarte Sep 06 '22 at 22:33
  • @LucasM.Uriarte yes, a dataframe with nested JSON inside a column. I'll work on the OP to reflect that. – Alexander Sep 06 '22 at 22:43
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247853/discussion-between-alexander-and-lucas-m-uriarte). – Alexander Sep 07 '22 at 17:29
  • @LucasM.Uriarte is it possible to index which `"anticipations"` validation failed, as shown in the original question? Like indexing it `'anticipation -> 0 error: ["key \'LA:TB2342\' is not a string, got 0.23"` – Alexander Sep 07 '22 at 17:47
  • @Alexander Added an edit to track the parent key – Lucas M. Uriarte Sep 07 '22 at 19:57

1 Answer


My solution is recursive, so that it can read nested JSON data.

from functools import partial
from typing import Union, Callable
import json

def get_output(key, val, string_keys: list, int_keys: list, float_keys: list):
    out = None
    if key in string_keys:
        if not isinstance(val, str):
            out = f"key '{key}' is not a string, got {type(val)}"
    elif key in int_keys:
        if not isinstance(val, int):
            out = f"key '{key}' is not a integer, got {type(val)}"
    elif key in float_keys:
        if not isinstance(val, float):
            out = f"key '{key}' is not a float, got {type(val)}"
    return out

def explore_json(json: Union[dict, list], validator: Callable):
    # Note: the parameter name `json` shadows the json module inside this function.
    result = []
    if isinstance(json, dict):
        for key, val in json.items():
            if isinstance(val, (dict, list)):
                # Nested container: recurse and collect its errors.
                result.extend(explore_json(val, validator))
            else:
                # Leaf value: the validator returns an error string or None.
                out = validator(key, val)
                if out is not None:
                    result.append(out)
    elif isinstance(json, list):
        for val in json:
            result.extend(explore_json(val, validator))
    return result

# The validator must be defined before explore_json is called with it.
validator = partial(get_output,
                    string_keys=["id", "name", "cake_name", "time"],
                    int_keys=['metric', 'points'],
                    float_keys=["LA:TB2342", "LA:TB2341", "LA:TB2344"])
data = json.loads(json_data)  # json_data is assumed to hold the question's JSON string
explore_json(data, validator)

The output of this is:

["key 'id' is not a string, got <class 'NoneType'>",
 "key 'metric' is not a integer, got <class 'str'>",
 "key 'LA:TB2342' is not a float, got <class 'str'>"]

The advantage of the partial function is that we can have a separate validator for each specific JSON layout.
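As a rough illustration of that point, the sketch below binds one checking function to two different sets of key lists. `check_type` is a condensed stand-in for `get_output`, not the answer's exact code, and the `sku` and `price` keys of the second validator are hypothetical names used only for the example.

```python
from functools import partial


def check_type(key, val, string_keys, int_keys, float_keys):
    """Condensed stand-in for get_output: return an error string or None."""
    expected = None
    if key in string_keys:
        expected = ("string", str)
    elif key in int_keys:
        expected = ("integer", int)
    elif key in float_keys:
        expected = ("float", float)
    if expected is not None and not isinstance(val, expected[1]):
        return f"key '{key}' is not a {expected[0]}, got {type(val)}"
    return None


# One validator per JSON layout; both reuse the same checking function.
cake_validator = partial(check_type,
                         string_keys=["id", "name"],
                         int_keys=["metric", "points"],
                         float_keys=[])

# Hypothetical second layout, for illustration only.
order_validator = partial(check_type,
                          string_keys=["sku"],
                          int_keys=[],
                          float_keys=["price"])

print(cake_validator("metric", "12"))   # an error string: 'metric' should be an int
print(order_validator("metric", "12"))  # None: 'metric' is not listed for this validator
print(order_validator("price", 3))      # an error string: 3 is an int, not a float
```

Each validator keeps the `(key, val)` call signature, so either one can be passed to `explore_json` unchanged.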

Moreover, note that only keys listed in string_keys, int_keys, or float_keys of our specific validator can appear in the output list. Any key that is not in one of these lists is not verified.

Finally, I'm not sure if the lists are the same as yours, but just change them and check the output.
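Since the question's data is not reproduced here, the following is a self-contained sketch that can be run as-is. The `json_data` string is a hypothetical sample made up for illustration (including the unlisted `comment` key), and the traversal parameter is named `node` instead of `json` only to avoid shadowing the module.

```python
import json
from functools import partial


def get_output(key, val, string_keys, int_keys, float_keys):
    # Same checks as in the answer: a message on a type mismatch, else None.
    out = None
    if key in string_keys:
        if not isinstance(val, str):
            out = f"key '{key}' is not a string, got {type(val)}"
    elif key in int_keys:
        if not isinstance(val, int):
            out = f"key '{key}' is not a integer, got {type(val)}"
    elif key in float_keys:
        if not isinstance(val, float):
            out = f"key '{key}' is not a float, got {type(val)}"
    return out


def explore_json(node, validator):
    # Walk dicts and lists recursively; call the validator on every leaf.
    result = []
    if isinstance(node, dict):
        for key, val in node.items():
            if isinstance(val, (dict, list)):
                result.extend(explore_json(val, validator))
            else:
                out = validator(key, val)
                if out is not None:
                    result.append(out)
    elif isinstance(node, list):
        for val in node:
            result.extend(explore_json(val, validator))
    return result


# Hypothetical sample input, not the data from the question.
json_data = """
{
  "id": null,
  "name": "cake",
  "metric": "12",
  "points": 5,
  "comment": 42,
  "anticipations": [
    {"top_properties": {"LA:TB2342": 0.23}},
    {"top_properties": {"LA:TB2342": "0.23"}}
  ]
}
"""

validator = partial(get_output,
                    string_keys=["id", "name", "cake_name", "time"],
                    int_keys=["metric", "points"],
                    float_keys=["LA:TB2342", "LA:TB2341", "LA:TB2344"])

errors = explore_json(json.loads(json_data), validator)
for error in errors:
    print(error)
```

With this sample, `comment` never shows up in `errors` even though its value is an integer, because it is not in any of the three lists.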

EDIT: To track the parent key:

def explore_json(json: Union[dict, list], validator: Callable, parent_key=" parent_key:"):
    result = []
    if isinstance(json, dict):
        for key, val in json.items():
            if isinstance(val, (dict, list)):
                # Extend the path with the current key before recursing.
                result.extend(explore_json(val, validator, f"{parent_key}.{key}"))
            else:
                out = validator(key, val)
                if out is not None:
                    # Append the path only for nested keys, not for top-level ones.
                    if parent_key != " parent_key:":
                        out += parent_key
                    result.append(out)
    elif isinstance(json, list):
        # enumerate counts from 0, so item0 is the first list element.
        for block_num, val in enumerate(json):
            result.extend(explore_json(val, validator, f"{parent_key}.item{block_num}"))
    return result

output:

["key 'id' is not a string, got <class 'NoneType'>",
 "key 'metric' is not a integer, got <class 'str'>",
 "key 'LA:TB2342' is not a float, got <class 'str'> parent_key:.anticipations.item1.top_properties"]

item1 indicates that the error is in the element at index 1 of the list under the key anticipations, i.e. the second element, because enumerate counts from 0.
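To see the path suffix being built, here is a small runnable sketch of the parent-key version. The `data` dictionary and the `float_only` validator are hypothetical and exist only for this example, and the first parameter is called `node` rather than `json`.

```python
def explore_json(node, validator, parent_key=" parent_key:"):
    # Same logic as the edited function; parent_key accumulates the path.
    result = []
    if isinstance(node, dict):
        for key, val in node.items():
            if isinstance(val, (dict, list)):
                result.extend(explore_json(val, validator, f"{parent_key}.{key}"))
            else:
                out = validator(key, val)
                if out is not None:
                    if parent_key != " parent_key:":
                        out += parent_key
                    result.append(out)
    elif isinstance(node, list):
        for block_num, val in enumerate(node):
            result.extend(
                explore_json(val, validator, f"{parent_key}.item{block_num}")
            )
    return result


def float_only(key, val):
    # Hypothetical minimal validator: every leaf value must be a float.
    if not isinstance(val, float):
        return f"key '{key}' is not a float, got {type(val)}"
    return None


# Hypothetical data: the first list element is valid, the second is not.
data = {
    "anticipations": [
        {"top_properties": {"LA:TB2342": 0.23}},
        {"top_properties": {"LA:TB2342": "0.23"}},
    ]
}

result = explore_json(data, float_only)
print(result)
```

Only the element at index 1 is reported here, and its message ends with `parent_key:.anticipations.item1.top_properties`.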

Lucas M. Uriarte
  • Worked like a charm! Thanks for your time and effort. I wonder why in the original post it's written in a way actually we cannot access to 'anticipations' ? – Alexander Sep 06 '22 at 21:25
  • "it's written in a way actually we cannot access to 'anticipations' ?" It does, but the loop does not do anything, because `(string or integer) in my_dict` is never satisfied, because none of those `string` values is a key for any of the `anticipation` dicts. (ah, okay, the duplicate I have in mind *does* explain *that* problem.) – Karl Knechtel Sep 06 '22 at 21:56
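The `(string or integer) in my_dict` issue mentioned in the comments can be reproduced in a few lines. The variable names follow the expression quoted above, while the dictionary contents are made up for the example.

```python
# Hypothetical values; only the shape of the expression comes from the question.
my_dict = {"points": 5}
string, integer = "name", "points"

# `string or integer` is evaluated first and yields "name", because a
# non-empty string is truthy. Only "name" is then looked up in the dict.
wrong = (string or integer) in my_dict

# Testing each key separately gives the intended "either key is present" check.
right = string in my_dict or integer in my_dict
also_right = any(key in my_dict for key in (string, integer))

print(wrong, right, also_right)
```

Here `wrong` is False even though `points` is a key of `my_dict`, which is the kind of silent miss described in the comments.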