15

I have a dict whose values may themselves be dicts or lists. How can I remove all empty fields from it?

An "empty field" is a field whose value is an empty list ([]), None, or an empty dict (one whose sub-fields are all empty).

Example input:

{
    "fruit": [
        {"apple": 1},
        {"banana": None}
    ],
    "veg": [],
    "result": {
        "apple": 1,
        "banana": None
    }
}

Output:

{
    "fruit": [
        {"apple": 1}
    ],
    "result": {
        "apple": 1
    }
}
martineau
Pier Cheng

5 Answers

39

Use a recursive function that returns a new dictionary:

def clean_empty(d):
    if isinstance(d, dict):
        return {
            k: v 
            for k, v in ((k, clean_empty(v)) for k, v in d.items())
            if v
        }
    if isinstance(d, list):
        return [v for v in map(clean_empty, d) if v]
    return d

The {..} construct is a dictionary comprehension; it only includes a key from the original dictionary if its cleaned value v is truthy, i.e. not empty. Similarly, the [..] construct is a list comprehension that builds the filtered list.

The nested (.. for ..) construct is a generator expression that allows the code to compactly filter empty objects after recursing.
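For readers who find the nested comprehension dense, here is a sketch of the same logic written with explicit loops. The name clean_empty_loops is only illustrative; the behaviour is intended to match the function above.

```python
def clean_empty_loops(d):
    """Loop-based equivalent of the comprehension version (illustrative name)."""
    if isinstance(d, dict):
        result = {}
        for k, v in d.items():
            cleaned = clean_empty_loops(v)  # recurse first ...
            if cleaned:                     # ... then drop anything falsy
                result[k] = cleaned
        return result
    if isinstance(d, list):
        result = []
        for v in d:
            cleaned = clean_empty_loops(v)
            if cleaned:
                result.append(cleaned)
        return result
    # Anything that is not a dict or list is returned unchanged.
    return d
```

The key point in both versions is the order: each value is cleaned before it is tested, so a container that only becomes empty after cleaning is still removed.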

Another way of constructing such a function is to use the @singledispatch decorator; you then write multiple functions, one per object type:

from functools import singledispatch

@singledispatch
def clean_empty(obj):
    return obj

@clean_empty.register
def _dicts(d: dict):
    items = ((k, clean_empty(v)) for k, v in d.items())
    return {k: v for k, v in items if v}

@clean_empty.register
def _lists(l: list):
    items = map(clean_empty, l)
    return [v for v in items if v]

The above @singledispatch version does exactly the same thing as the first function but the isinstance() tests are now taken care of by the decorator implementation, based on the type annotations of the registered functions. I also put the nested iterators (the generator expression and map() function) into a separate variable to improve readability further.
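One practical consequence of the @singledispatch layout is that support for another container type can be added without touching the existing functions. The sketch below repeats the three functions and adds a hypothetical tuple handler (_tuples is my own addition, not part of the original answer). Registering via type annotations needs Python 3.7 or newer.

```python
from functools import singledispatch

@singledispatch
def clean_empty(obj):
    # Fallback for every type without a registered handler.
    return obj

@clean_empty.register
def _dicts(d: dict):
    items = ((k, clean_empty(v)) for k, v in d.items())
    return {k: v for k, v in items if v}

@clean_empty.register
def _lists(l: list):
    items = map(clean_empty, l)
    return [v for v in items if v]

# Hypothetical extension: clean tuples as well, returning a plain tuple.
@clean_empty.register
def _tuples(t: tuple):
    return tuple(v for v in map(clean_empty, t) if v)

print(clean_empty({"a": (None, 1, []), "b": ()}))
```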

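Note that every falsy value is removed, not just None and empty containers: numeric 0 (integer 0, float 0.0), False and empty strings are cleared as well. You can retain numeric 0 values with if v or v == 0 (this also keeps False, since False == 0).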
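As a rough sketch of that adjustment, the variant below applies the v or v == 0 test in both branches. The names clean_empty_keep_zero and _keep are illustrative only.

```python
def _keep(v):
    # Keep truthy values, plus anything equal to 0 (0, 0.0 and False).
    return bool(v) or v == 0

def clean_empty_keep_zero(d):
    """Variant of clean_empty that retains numeric zeros (illustrative name)."""
    if isinstance(d, dict):
        cleaned = ((k, clean_empty_keep_zero(v)) for k, v in d.items())
        return {k: v for k, v in cleaned if _keep(v)}
    if isinstance(d, list):
        return [v for v in map(clean_empty_keep_zero, d) if _keep(v)]
    return d

print(clean_empty_keep_zero({"count": 0, "ratio": 0.0, "name": "", "tags": [], "note": None}))
```

Empty strings, None and empty containers are still dropped, because none of them compares equal to 0.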

Demo of the first function:

>>> sample = {
...     "fruit": [
...         {"apple": 1},
...         {"banana": None}
...     ],
...     "veg": [],
...     "result": {
...         "apple": 1,
...         "banana": None
...     }
... }
>>> def clean_empty(d):
...     if isinstance(d, dict):
...         return {
...             k: v
...             for k, v in ((k, clean_empty(v)) for k, v in d.items())
...             if v
...         }
...     if isinstance(d, list):
...         return [v for v in map(clean_empty, d) if v]
...     return d
... 
>>> clean_empty(sample)
{'fruit': [{'apple': 1}], 'result': {'apple': 1}}
Martijn Pieters
8

If you want a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles and other kinds of containers, I recommend looking at the remap utility from the boltons utility package.

After pip install boltons or copying iterutils.py into your project, just do:

from boltons.iterutils import remap

data = {'veg': [], 'fruit': [{'apple': 1}, {'banana': None}], 'result': {'apple': 1, 'banana': None}}

drop_falsey = lambda path, key, value: bool(value)
clean = remap(data, visit=drop_falsey)
print(clean)

# Output:
{'fruit': [{'apple': 1}], 'result': {'apple': 1}}

This page has many more examples, including ones working with much larger objects from GitHub's API.

It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

Mahmoud Hashemi
0

@mojoken - How about this to overcome the boolean problem (False values being dropped)?

def clean_empty(d):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [v for v in (clean_empty(v) for v in d) if isinstance(v, bool) or v]
    return {k: v for k, v in ((k, clean_empty(v)) for k, v in d.items()) if isinstance(v, bool) or v}
0
def not_empty(o):
    # Define what counts as empty here: None and zero-length dicts/lists.
    if o is None:
        return False
    if not isinstance(o, (dict, list)):
        return True
    return len(o) > 0


def remove_empty(o):
    # Only dicts and lists are recursed into; other values are returned as-is.
    # Children are cleaned first, so containers that become empty are dropped too.
    if isinstance(o, dict):
        cleaned = ((k, remove_empty(v)) for k, v in o.items())
        return {k: v for k, v in cleaned if not_empty(v)}
    if isinstance(o, list):
        return [v for v in map(remove_empty, o) if not_empty(v)]
    return o
lovxin
  • [A code-only answer is not high quality](//meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers). While this code may be useful, you can improve it by saying why it works, how it works, when it should be used, and what its limitations are. Please [edit] your answer to include explanation and link to relevant documentation. – Muhammad Mohsin Khan Mar 15 '22 at 15:26
-2
def remove_empty_fields(data_):
    """
        Recursively remove all empty fields from a nested
        dict structure, modifying it in place. Note that a non-empty
        field can become empty after its children are deleted.

        :param data_: A dict or list.
        :return: Data after cleaning.
    """
    if isinstance(data_, dict):
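        # Iterate over a snapshot: deleting keys while iterating over
        # data_.items() directly raises RuntimeError in Python 3.
        for key, value in list(data_.items()):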

            # Dive into a deeper level.
            if isinstance(value, dict) or isinstance(value, list):
                value = remove_empty_fields(value)

            # Delete the field if it's empty.
            if value in ["", None, [], {}]:
                del data_[key]

    elif isinstance(data_, list):
        for index in reversed(range(len(data_))):
            value = data_[index]

            # Dive into a deeper level.
            if isinstance(value, dict) or isinstance(value, list):
                value = remove_empty_fields(value)

            # Delete the field if it's empty.
            if value in ["", None, [], {}]:
                data_.pop(index)

    return data_
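Because this approach modifies its argument in place, callers that still need the original data can pass in a deep copy. The sketch below uses a compact in-place cleaner with the same emptiness rule; prune_in_place is a hypothetical name, and the list(...) snapshot is there so keys can be deleted safely during the loop.

```python
import copy

EMPTY = ("", None, [], {})

def prune_in_place(data):
    """Hypothetical compact in-place cleaner using the same emptiness rule."""
    if isinstance(data, dict):
        for key, value in list(data.items()):  # snapshot, so deletion is safe
            prune_in_place(value)              # children are mutated in place
            if value in EMPTY:
                del data[key]
    elif isinstance(data, list):
        for index in reversed(range(len(data))):
            prune_in_place(data[index])
            if data[index] in EMPTY:
                data.pop(index)
    return data

original = {
    "fruit": [{"apple": 1}, {"banana": None}],
    "veg": [],
    "result": {"apple": 1, "banana": None},
}
cleaned = prune_in_place(copy.deepcopy(original))  # original stays untouched
print(cleaned)
```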
Tian Chu