27

I have a dictionary with the following structure:

D = {
   'rows': 11,
   'cols': 13,
   (i, j): {
              'meta': 'random string',
              'walls': {
                  'E': True,
                  'O': False,
                  'N': True,
                  'S': True
              }
           }
}
# i ranging in {0 .. D['rows']-1}
# j ranging in {0 .. D['cols']-1}

I want to write a function that takes an arbitrary object as argument and checks if it has that structure. This is what I wrote:

def well_formed(L):
    if type(L) != dict:
        return False
    if 'rows' not in L:
        return False
    if 'cols' not in L:
        return False

    nr, nc = L['rows'], L['cols']

    # I should also check the int-ness of nr and nc ...

    if len(L) != nr*nc + 2:
        return False

    for i in range(nr):
        for j in range(nc):
            if not ((i, j) in L
                and 'meta' in L[i, j]
                and  'walls' in L[i, j]
                and type(L[i, j]['meta']) == str
                and type(L[i, j]['walls'])  == dict
                and     'E' in L[i, j]['walls']
                and     'N' in L[i, j]['walls']
                and     'O' in L[i, j]['walls']
                and     'S' in L[i, j]['walls']
                and type(L[i, j]['walls']['E']) == bool
                and type(L[i, j]['walls']['N']) == bool
                and type(L[i, j]['walls']['O']) == bool
                and type(L[i, j]['walls']['S']) == bool):
                return False

    return True

Although it works, I don't like it at all. Is there a Pythonic way to do it?

I'm allowed to use just the standard library.

Georgian Sarghi
  • 373
  • 1
  • 3
  • 9
  • 7
    I'd look at writing a [JSON Schema](http://json-schema.org/documentation.html) for your dictionary and then using the [Python module](https://pypi.python.org/pypi/jsonschema) to validate it. – tzaman May 06 '17 at 13:52
  • 1
    I think raising exceptions would be a better idea that only return true or false, since it have many test done here. It may be easier to know why the dict doesn't pass the tests – Loïc G. May 06 '17 at 13:53
  • @tzaman converting that dictionary to JSON would be tricky, since one of the keys is a tuple, and JSON only natively supports strings as object keys. You'd have to encode the tuple as a string, which can potentially create ambiguity (e.g. if the object being checked has string keys in the same format you're using for the tuples), I'd consider that invalid, but a JSON validator wouldn't be able to tell it apart from a valid tuple that was encoded as a string. – tavnab May 06 '17 at 13:58
  • see related question. http://stackoverflow.com/q/6011881/7724457 – tell k May 06 '17 at 14:03
  • At the very least, use `is` instead of `==` when comparing types. – chepner May 06 '17 at 14:15
  • For a proper answer, you are going to have to be explicit about what constitutes a well-formed object. We have only your implementation to go on: for instance, does `L[i,j]['meta']` have to be exactly a `str`, or could it be any thing that can be converted to a string? Is it OK if the set of walls is a *superset* of `{'E', 'N', 'O', 'S'}`? – chepner May 06 '17 at 14:18
  • Another important factor: do the various objects have to *specifically* be `dict`s, or can they be anything that support `__getitem__`? – chepner May 06 '17 at 14:24
  • @chepner I think that the behavior of well_formed is quite explicit: it has to be a dict, it has to be a str etc.. – Georgian Sarghi May 06 '17 at 14:27
  • @georgian Again, that's *your* implementation, not a specification; we don't know if it is stricter than necessary. (And in idiomatic Python code, it is. What if `L` is an instance of a subclass of `dict`? Is that really a problem?) – chepner May 06 '17 at 14:46
  • There is little point in asking if your code is Pythonic if what it is checking for is un-Pythonic. – chepner May 06 '17 at 14:48
  • I shouldn't use that term. What I was trying to ask is a *nicer* way to write something un-Pythonic. Regarding the specification issue, I get your point, thanks for your answer. – Georgian Sarghi May 06 '17 at 14:55
  • Checking whether `len(L) != nr*nc + 2` is not very pythonic. If a dict has all of the required keys, but also happens to have `foo` and `bar` keys, that should be okay. – David Hammen May 07 '17 at 04:19

6 Answers6

19

Firstly, I think the more 'Pythonic' thing might be to ask forgiveness rather than permission - to check when you needed a property whether the data structure had that property.

But on the other hand, that doesn't help if you've been asked to create something to check it's well-formed. :)

So if you need to check, you can use something like the schema library to define what your data structure should look like, and then check other data structures against that schema.

bouteillebleu
  • 2,456
  • 23
  • 32
8

In Python, the exact identity of the types involved is less important than how the values behave. For a defined use of such an object, would the object suffice? This means L doesn't have to be a dict, it just has support __getitem__; L[(i,j)]['meta'] doesn't have to be a str, it just has to support conversion to a string via str(L[(i,j)]['meta']); etc.

Given that relaxation, I would simply try to catch any errors raised by attempting such actions, and return False if any occur. For example,

def well_formed(L):
    try:
        nr = L['rows']
        nc = L['cols']
    except KeyError:
        return False

    try:
        for i in range(nr):
            for j in range(nc):
                str(L[(i,j)]['meta'])
                walls = L[(i,j)]['walls']
                for direction in walls:
                    # Necessary?
                    if direction not in "ENOS":
                        return False
                    if walls[direction] not in (True, False):
                        return False
    except KeyError:
        return False

    return True

Given that any object has a boolean value, it seemed pointless to attempt bool(walls[direction]); rather, if having exactly True or False as a value isn't a hard requirement, I would just any test of the value altogether. Likewise, additional walls may or may not be an issue, and don't have to be explicitly tested for.

chepner
  • 497,756
  • 71
  • 530
  • 681
  • Why two `try` blocks instead of trying everything? – Mephy May 06 '17 at 20:17
  • To be honest, I'm not sure. I was trying a *lot* of different approaches, and I think at one point it made sense for the first two checks to be in their own `try` statement. With the version seen here, not so much. – chepner May 06 '17 at 21:24
  • 2
    I don't agree with your point about `str()`. Firstly everything supports `str()` just like everything supports `bool()`. Secondly if the value isn't a string it indicates that (1) the code that produced the dict might have an error and (2) the code that uses the dict may need to know how to handle both string and nonstring values. – Alex Hall May 06 '17 at 23:03
4

You can compose validation like this (idea from Scala extractors). The advantage is that the validator structure is similar to the structure to test.

A disadvantage is that the many function calls may make it much slower.

class Mapping:
    def __init__(self, **kwargs):
        self.key_values = [KeyValue(k, v) for k, v in kwargs.items()]

    def validate(self, to_validate):
        if not isinstance(to_validate, dict):
            return False

        for validator in self.key_values:
            if not validator.validate(to_validate):
                return False
        return True


class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def validate(self, to_validate):
        return self.key in to_validate and self.value.validate(to_validate[self.key])


class Boolean:
    def validate(self, to_validate):
        return isinstance(to_validate, bool)


class Integer:
    def validate(self, to_validate):
        return isinstance(to_validate, int)


class String:
    def validate(self, to_validate):
        return isinstance(to_validate, str)


class CustomValidator:
    def validate(self, to_validate):
        if not Mapping(rows=Integer(), cols=Integer()).validate(to_validate):
            return False
        element_validator = Mapping(meta=String(), walls=Mapping(**{k: Boolean() for k in "EONS"}))
        for i in range(to_validate['rows']):
            for j in range(to_validate['cols']):
                if not KeyValue((i, j), element_validator).validate(to_validate):
                    return False
        return True


d = {
    'rows': 11,
    'cols': 13,
}
d.update({(i, j): {
    'meta': 'random string',
    'walls': {
        'E': True,
        'O': False,
        'N': True,
        'S': True
    }
} for i in range(11) for j in range(13)})

assert CustomValidator().validate(d)

Same thing with overriding isinstance(tested with Python 3.5)

class IsInstanceCustomMeta(type):
    def __instancecheck__(self, instance):
        return self.validate(instance)

def create_custom_isinstance_class(f):
    class IsInstanceCustomClass(metaclass=IsInstanceCustomMeta):
        validate = f
    return IsInstanceCustomClass

def Mapping(**kwargs):
    key_values = [KeyValue(k, v) for k, v in kwargs.items()]

    def validate(to_validate):
        if not isinstance(to_validate, dict):
            return False

        for validator in key_values:
            if not isinstance(to_validate, validator):
                return False
        return True

    return create_custom_isinstance_class(validate)

def KeyValue(key, value):
    return create_custom_isinstance_class(lambda to_validate: key in to_validate and isinstance(to_validate[key], value))

def my_format_validation(to_validate):
    if not isinstance(to_validate, Mapping(rows=int, cols=int)):
        return False
    element_validator = Mapping(meta=str, walls=Mapping(**{k: bool for k in "EONS"}))
    for i in range(to_validate['rows']):
        for j in range(to_validate['cols']):
            if not isinstance(to_validate, KeyValue((i, j), element_validator)):
                return False
    return True

MyFormat = create_custom_isinstance_class(my_format_validation)

d = {
    'rows': 11,
    'cols': 13,
}
d.update({(i, j): {
    'meta': 'random string',
    'walls': {
        'E': True,
        'O': False,
        'N': True,
        'S': True
    }
} for i in range(11) for j in range(13)})

assert isinstance(d, MyFormat)
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Siphor
  • 2,522
  • 2
  • 13
  • 10
3

If your format were simpler, I'd agree with the other answer/comments to use existing schema validation libraries, like schema and voluptuous. But, given your specific case of having to check a dictionary with tuple keys, and those tuples' values depending on the values of other members of your dict, I think you'd be better off writing your own validator than trying to coax a schema to fit your format.

tavnab
  • 2,594
  • 1
  • 19
  • 26
1
from itertools import product

def isvalid(d):
    try:
        for key in product(range(d['rows']), range(d['cols'])):
            sub = d[key]
            assert (isinstance(sub['meta'], str) and
                    all(isinstance(sub['walls'][c], bool)
                        for c in 'EONS'))
    except (KeyError, TypeError, AssertionError):
        return False
    return True

If Python 2 compatibility is important or it's necessary to assert that there are no additional keys, let me know.

Alex Hall
  • 34,833
  • 5
  • 57
  • 89
1

If you use something like this:

def get_deep_keys(d, depth=2):
    """Gets a representation of all dictionary keys to a set depth.

    If a (sub)dictionary contains all non-dictionary values, a list of keys
    will be returned.

    If a dictionary contains a mix of types, a dictionary of dicts/lists/types
    will be returned.
    """
    if isinstance(d, dict):
        if depth > 0 and any(isinstance(v, dict) for v in d.values()):
            return {k: get_deep_keys(v, depth=depth - 1) for k, v in d.items()}
        else:
            return set(d.keys())

    else:
        return type(d)

Then you can do:

assert get_deep_keys(D[i, j]) == {
    'meta': str, 'walls': {'E', 'N', 'O', 'S'}}

Inside a loop over i, j. This is pretty easy to modify to return the types of the bottom level elements instead:

def get_deep_keys(d, depth=2):
    """Gets a representation of all dictionary keys to a set depth, with types.
    """
    if isinstance(d, dict) and depth > 0:
        return {k: get_deep_keys(v, depth=depth - 1) for k, v in d.items()}

    return type(d)


get_deep_keys(D)
 # {'meta': str, 'walls': {'E': bool, 'O': bool, 'N': bool, 'S': bool}}
naught101
  • 18,687
  • 19
  • 90
  • 138