To filter empty values out of a list of dictionaries, I need to remove about 30% of the data from the dictionaries.

So I ended up with this code:

qr = query_result
for row in qr:
    for key, value in row.items():
        if value == ' ' or value == None or value == '':
            del row[key]

However, execution fails at the first delete attempt:

RuntimeError: dictionary changed size during iteration

After a bit of searching on Stack Overflow I found a solution that collects the keys to delete in a separate list and deletes them afterwards.

delete = []
for k,v in dict.items():
    if v%2 == 1:
        delete.append(k)
for i in delete:
    del dict[i]

Adapted to my case, this approach becomes:

qr = query_result
for row in qr:
    delete = []
    for key, value in row.items():
        if value == ' ' or value == '' or value == None:
            delete.append(key)
    for i in delete:
        del row[i]

This also suffers from a RuntimeError.

So the delete loop should be outside the loop over the dictionary:

qr = query_result
for row in qr:
    delete = []
    for key, value in row.items():
        if value == ' ' or value == '' or value == None:
            delete.append(key)
for i in delete:
    del row[i]

Unfortunately, this code correctly modifies only the last row.

How do I process all rows and then delete garbage data?

Here is some data for testing:

c = [{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''},
    {'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''}]

My output:

{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''}
{'A': 'B', 'C': '3', 'P': '343'}

Desired output:

{'A': 'B', 'C': '3', 'P': '343'}
{'A': 'B', 'C': '3', 'P': '343'}
im_infamous

3 Answers

Here is a version that modifies your first example. To iterate and delete at the same time, you need to iterate over a copy of your list: while looping over the copy, you delete from the original list as necessary.

import copy

qr = [{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''},
    {'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''}]

for i, row in enumerate(copy.deepcopy(qr)):
    for key, value in row.items():
        if value in {' ', None, ''}:
            del qr[i][key]

print(qr)
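
A lighter variant of the same idea, as a minimal sketch: instead of deep-copying the whole list, it snapshots only the items of the row currently being processed with `list(row.items())`. This assumes the rows are plain dicts whose values are hashable (strings or None, as in the sample data); nothing here is taken from the original answer beyond the sample rows.

```python
# Sample rows copied from the question.
qr = [{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''},
      {'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''}]

for row in qr:
    # list(...) takes a snapshot of the items, so deleting from row
    # does not change the object being iterated over.
    for key, value in list(row.items()):
        if value in {' ', None, ''}:
            del row[key]

print(qr)  # [{'A': 'B', 'C': '3', 'P': '343'}, {'A': 'B', 'C': '3', 'P': '343'}]
```

The snapshot costs one small list per row, which is usually cheaper than a deep copy of the entire query result.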

That said, it is usually better to create a new list than to delete from the original one. A simple list comprehension will do the trick:

qr = [{k:v for k, v in row.items() if v not in {' ', None, ''}} for row in qr]

print(qr) # same result

Output for both:

[{'A': 'B', 'C': '3', 'P': '343'},
 {'A': 'B', 'C': '3', 'P': '343'}]
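
One design point worth keeping in mind: the comprehension builds a new list, so `qr = [...]` only rebinds the name `qr`. If other code holds a reference to the original list, it may be preferable to replace the contents in place with slice assignment. The sketch below illustrates this; the name `alias` is hypothetical and stands for any second reference to the same list.

```python
qr = [{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''},
      {'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''}]
alias = qr  # hypothetical second reference to the same list object

# Slice assignment replaces the contents of the existing list object
# instead of rebinding the name qr to a brand-new list.
qr[:] = [{k: v for k, v in row.items() if v not in {' ', None, ''}} for row in qr]

print(alias is qr)  # True
print(alias)        # [{'A': 'B', 'C': '3', 'P': '343'}, {'A': 'B', 'C': '3', 'P': '343'}]
```

Note that the row dictionaries themselves are still new objects, so a reference to an individual original row would keep seeing the unfiltered data.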
Taku

Your approach (collect keys while iterating, delete afterwards) is correct.

Here's your problem:

qr = query_result
for row in qr:
    delete = []  # <--- here

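You create a new delete list each time you move on to a new row, so whatever keys were collected for the previous row are lost. By the time the deletion loop runs, after the outer loop has finished, only the last row's keys are left, and row refers to the last row.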

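Instead, you should create it on the same level (of indentation) as you subsequently use it, and store each key together with its row so the deletion loop knows which dictionary to delete from:

delete = []  # Only once for all rows.
qr = query_result
for row in qr:
    for key, value in row.items():
        if value in (' ', '', None):
            delete.append((row, key))  # remember the row along with the key

for row, key in delete:
    del row[key]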
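
An alternative sketch, not taken from the answer above: keep one delete list per row, but run the deletion loop inside the row loop, once the inner loop over `row.items()` has finished. It assumes the sample rows from the question; the only thing that matters is the indentation of the second inner loop.

```python
# Sample rows copied from the question.
qr = [{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''},
      {'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''}]

for row in qr:
    delete = []  # fresh list, holds keys for this row only
    for key, value in row.items():
        if value in (' ', '', None):
            delete.append(key)
    # Still inside the row loop, but after the iteration over
    # row.items() is done, so resizing the dict is safe here.
    for key in delete:
        del row[key]

print(qr)  # [{'A': 'B', 'C': '3', 'P': '343'}, {'A': 'B', 'C': '3', 'P': '343'}]
```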
9000

A one-liner:

c = [{k: v for k, v in d.items() if v not in [' ', '', None]} for d in c]

This loops over the elements of c and, for each one, keeps only the key-value pairs whose value is not in the list of blank values. It returns:

[{'A': 'B', 'P': '343', 'C': '3'}, {'A': 'B', 'P': '343', 'C': '3'}]
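
If the blank values can also be whitespace of other lengths (an assumption, since the question only shows `' '`, `''` and `None`), the membership test could be swapped for a small predicate. The helper `is_blank` below is hypothetical, and the second sample row is made up to show that falsy but meaningful values such as `0` are kept.

```python
def is_blank(value):
    """Hypothetical helper: True for None and for empty or whitespace-only strings."""
    return value is None or (isinstance(value, str) and value.strip() == '')

# First row is from the question; the second row is invented for illustration.
c = [{'A': 'B', 'C': '3', 'EE': None, 'P': '343', 'AD': ' ', 'B': ''},
     {'A': 'B', 'C': 0, 'EE': None, 'P': '  \t', 'AD': ' ', 'B': ''}]

cleaned = [{k: v for k, v in d.items() if not is_blank(v)} for d in c]

print(cleaned)  # [{'A': 'B', 'C': '3', 'P': '343'}, {'A': 'B', 'C': 0}]
```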
asongtoruin