1

Let's say I have this:

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

And I'd like this:

d = {'a': [1, 3, 4], 'b': ['10', '30', '40']}

If b contains an empty element (here d["b"][1]), I'd like to delete it and, at the same time, delete the element of a at the same index (d["a"][1]).

EDIT: I forgot to mention that the order of the elements must not change.

poocha316

5 Answers

4

Here's an idea. Looks like you're treating your dictionary as a data frame, since you're "connecting" your lists by index.

So why not just use a library and do your operations cleanly and efficiently?

import pandas as pd
df = pd.DataFrame(d)

Yields

   a   b
0  1  10
1  2    
2  3  30
3  4  40

Then

df[~df.eq('').any(axis=1)]  # pass axis by keyword; pandas 2.0+ no longer accepts it positionally

   a   b
0  1  10
2  3  30
3  4  40

After all manipulation, if you need your dictionary back:

df.to_dict('list')

{'a': [1, 3, 4], 'b': ['10', '30', '40']}
rafaelc
  • This is probably the best answer. The available libraries in Python are there to make your life easier, so you might as well use them. – James Sep 11 '19 at 20:51
  • Just two questions. 1) what is the ~ in the df[~df.eq('').any(1)] 2) I tried this and works perfectly but I got a "FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = method(y)" (Pandas 0.25.1) – poocha316 Sep 12 '19 at 05:34
  • @NorbertTóth that's fine! That's not a problem with pandas, but with numpy. It happens when you compare different types. Not much to worry about, but for details [take a look here](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur) – rafaelc Sep 12 '19 at 13:48
  • @NorbertTóth the `~` operates as a logical "NOT" operator for pandas series. – pault Sep 16 '19 at 20:50
2

A general solution:

  • find which indices hold blanks and collect them in a list of unique indices, sorted in decreasing order
  • loop over the values and remove the elements at those indices

The decreasing order ensures that if there are several blanks the proper elements are removed.

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

empty_indexes = sorted({i for v in d.values() for i,x in enumerate(v) if not x},reverse=True)

for v in d.values():
    for i in empty_indexes:
        try:
            v.pop(i)
        except IndexError:
            pass
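
To illustrate why the indices are visited in decreasing order, here is a small sketch with two blanks. The sample list `letters` and the helper `pop_all` are made up for this illustration and are not part of the answer's code.

```python
def pop_all(values, indices):
    """Return a copy of `values` with the items at `indices` popped in the given order."""
    result = list(values)
    for i in indices:
        try:
            result.pop(i)
        except IndexError:
            pass  # the index no longer exists because earlier pops shortened the list
    return result

letters = ['a', 'b', 'c', 'd', 'e']  # hypothetical data; blanks assumed at indices 1 and 3
blank_indexes = [1, 3]

ascending = pop_all(letters, blank_indexes)
descending = pop_all(letters, sorted(blank_indexes, reverse=True))

print(ascending)   # after 'b' is removed, index 3 points at 'e' instead of 'd'
print(descending)  # 'b' and 'd' are removed as intended
```

Popping from the end first leaves the positions of the remaining targets untouched, which is why the sort uses `reverse=True`.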

A oneliner (inspired by pault in comments):

dict(zip(d,[list(y) for y in zip(*(x for x in zip(*d.values()) if all(i!="" for i in x)))]))

Decoding this:

  • the inner zip transposes the values.
  • a generator expression keeps only the rows where all elements are non-empty (if all(...))
  • the middle zip transposes back into the original orientation
  • zipping keys & values rebuilds the dictionary. There's no order issue as keys are guaranteed to be ordered the same as values, no matter the version of python.

The one-liner is hard to read and can be decomposed into loops. It doesn't require sorting or deduplicating the indices. In fact, it doesn't need indices at all.

Without the one-liner:

values = []  # init list of values
for y in zip(*d.values()):   # loop on assembled values
    if all(i != "" for i in y):  # filter out rows which contain empty strings
        values.append(y)

# transpose back / convert to list (since zip yields tuples)
values = [list(x) for x in zip(*values)]

# rebuild dictionary. Order of d and values is the same
d = dict(zip(d,values))
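
One edge case with the transposing approach: if every row contains an empty string, `zip(*values)` yields nothing and the rebuilt dictionary loses its keys. Below is a minimal sketch that keeps the keys, assuming all lists have the same length. The function name `drop_blank_rows` is made up here.

```python
def drop_blank_rows(d):
    """Return a new dict without the rows (same index in every list) that contain ''."""
    rows = zip(*d.values())  # transpose: one tuple per index
    keep = [i for i, row in enumerate(rows) if all(x != "" for x in row)]
    # rebuild every list from the kept indices; the keys survive even if `keep` is empty
    return {key: [values[i] for i in keep] for key, values in d.items()}

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}
print(drop_blank_rows(d))

all_blank = {'a': [1, 2], 'b': ['', '']}
print(drop_blank_rows(all_blank))  # both keys remain, each with an empty list
```

This variant builds a new dictionary instead of modifying the lists in place, so the original order is preserved and the input is left untouched.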
Jean-François Fabre
1

As a general solution, assuming each list is of the same size, you can use:

def drop_empty(d, key):
    '''
    Drops values from all lists in the dictionary `d` at the
    indices of the list given by `key` that are blank strings.
    '''
    indices = [i for i, v in enumerate(d[key]) if v == '']  # use the `key` argument, not a hard-coded 'b'
    for v in d.values():
        for ix in reversed(indices):
            v.pop(ix)
    return d

# test case, drops indices 1 and 4:
d = {'a': [1, 2, 3, 4, 5], 'b': ['10', '', '30', '40', ''], 'c': [0, 0, 1, 1, 2]}

drop_empty(d, 'b')
# returns:
{'a': [1, 3, 4], 'b': ['10', '30', '40'], 'c': [0, 1, 1]}
James
0
d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

bad_inds = [ind for ind in range(len(d['a'])) if not d['a'][ind] or not d['b'][ind]]

for ind in reversed(bad_inds):  # delete from the end so the remaining indices stay valid
    for value in d.values():
        del value[ind]

output:

d

>>> {'a': [1, 3, 4], 'b': ['10', '30', '40']}
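
Note that `not value` is true for any falsy value, so a legitimate 0 in `a` would be treated as empty as well. Here is a sketch that only matches the empty string and deletes from the end. The sample data containing a 0 is made up for illustration.

```python
d = {'a': [0, 2, 3, 4], 'b': ['10', '', '30', '40']}  # hypothetical data: 0 is a real value

# Truthiness-based test: flags index 0 (because of the 0) and index 1 (because of '')
falsy_inds = [i for i in range(len(d['a'])) if not d['a'][i] or not d['b'][i]]

# Explicit comparison: flags only the index that holds an empty string
blank_inds = [i for i, row in enumerate(zip(*d.values())) if any(x == '' for x in row)]

for ind in reversed(blank_inds):  # go backwards so pending indices stay valid
    for value in d.values():
        del value[ind]

print(falsy_inds)
print(blank_inds)
print(d)
```

Whether 0 or None should count as "empty" depends on the data, so the comparison is worth choosing deliberately.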
Brian
0

You can first collect all the good indices and then filter the values based on them:

good_indices = [i for i, v in enumerate(zip(*d.values())) if all(v)]
# index each list directly: itemgetter(*good_indices) returns a bare item for a single index and fails for none
d = {k: [v[i] for i in good_indices] for k, v in d.items()}

print(d)

output:

{'a': [1, 3, 4], 'b': ['10', '30', '40']}
kederrac