How can I delete the same columns (same-indexed elements) from a Python dictionary?

Question

Let's say I have this:

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

And I'd like this:

d = {'a': [1, 3, 4], 'b': ['10', '30', '40']}

If I see an empty element in b (here d["b"][1]), I'd like to delete it and, at the same time, delete the element at the same index in a (d["a"][1]).

EDIT: I forgot to mention that the order of the elements must not change.

Just two keys in the dictionary or do you want to generalize this? — pault, Sep 11 '19 at 19:28
Can you explain a bit more? Are you always looking at `b` for the bad values or can there be others? — pault, Sep 11 '19 at 19:32

score 4 · Answer 1 · answered Sep 11 '19 at 19:37

Here's an idea. Looks like you're treating your dictionary as a data frame, since you're "connecting" your lists by index.

So why not just use a library and do your operations cleanly and efficiently?

import pandas as pd
df = pd.DataFrame(d)

Yields

   a   b
0  1  10
1  2
2  3  30
3  4  40

Then

df = df[~df.eq('').any(axis=1)]  # keep only the rows that contain no empty string
df

   a   b
0  1  10
2  3  30
3  4  40

After all manipulation, if you need your dictionary back:

df.to_dict('list')

{'a': [1, 3, 4], 'b': ['10', '30', '40']}
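The filter expression can be unpacked step by step to show what the mask and the ~ operator do. This is only a sketch: the intermediate names has_blank, keep, filtered and result are made up for illustration, and it assumes a pandas version that accepts the axis keyword.

```python
import pandas as pd

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}
df = pd.DataFrame(d)

# True for every row in which at least one cell equals the empty string
has_blank = df.eq('').any(axis=1)

# ~ inverts the boolean Series element by element: True marks rows to keep
keep = ~has_blank

# boolean indexing keeps the flagged rows and preserves their original order
filtered = df[keep]

result = filtered.to_dict('list')
print(result)  # {'a': [1, 3, 4], 'b': ['10', '30', '40']}
```

Note that indexing returns a new DataFrame, so the result has to be assigned to a name before it is converted back to a dictionary.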

answered Sep 11 '19 at 19:37

rafaelc

This is probably the best answer. The available libraries in Python are there to make your life easier, might as well use them. – James Sep 11 '19 at 20:51
Just two questions. 1) What is the ~ in df[~df.eq('').any(1)]? 2) I tried this and it works perfectly, but I got a "FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = method(y)" (Pandas 0.25.1) – poocha316 Sep 12 '19 at 05:34
@NorbertTóth that's fine! That's not a problem with pandas, but with numpy. It happens when you compare different types. Not much to worry about, but for details [take a look here](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur) – rafaelc Sep 12 '19 at 13:48
@NorbertTóth the `~` operates as a logical "NOT" operator for pandas series. – pault Sep 16 '19 at 20:50

Jean-François Fabre · Answer 2 · 2019-09-11T19:54:15.830

2

A general solution:

find which indices hold blanks and collect them in a list of unique values, sorted in descending order
loop over the values and remove the elements at those indices

The decreasing order ensures that if there are several blanks the proper elements are removed.

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

empty_indexes = sorted({i for v in d.values() for i,x in enumerate(v) if not x},reverse=True)

for v in d.values():
    for i in empty_indexes:
        try:
            v.pop(i)
        except IndexError:
            pass
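To see why the descending order matters, here is a small illustration on a throwaway list (the names letters, to_remove, ascending and descending are made up for this sketch). Popping in ascending order shifts every later element one position to the left, so the second index no longer points at the intended element.

```python
letters = ['a', 'b', 'c', 'd', 'e']
to_remove = [1, 3]  # the goal is to drop 'b' and 'd'

# Ascending order: after popping index 1, 'd' has moved to index 2,
# so popping index 3 removes 'e' instead.
ascending = letters.copy()
for i in to_remove:
    ascending.pop(i)
print(ascending)  # ['a', 'c', 'd']

# Descending order: removing from the end leaves the smaller indices untouched.
descending = letters.copy()
for i in sorted(to_remove, reverse=True):
    descending.pop(i)
print(descending)  # ['a', 'c', 'e']
```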

A one-liner (inspired by pault in the comments):

dict(zip(d,[list(y) for y in zip(*(x for x in zip(*d.values()) if all(i!="" for i in x)))]))

Breaking this down:

the inner zip transposes the values, turning the columns into rows.
a generator expression keeps only the rows in which every element is non-empty (the if all(...) part).
the middle zip transposes the rows back into the original orientation.
zipping keys and values rebuilds the dictionary. There's no ordering issue, as keys are guaranteed to be iterated in the same order as values, no matter the version of Python.

The one-liner is hard to read and can be decomposed into loops. It doesn't require sorting or deduplicating the indices. In fact, it doesn't need indices at all.
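The intermediate values of each step can be inspected like this. The names rows, kept, columns and result are only chosen for this sketch and do not appear in the one-liner itself.

```python
d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

# inner zip: transpose the columns into rows
rows = list(zip(*d.values()))
print(rows)     # [(1, '10'), (2, ''), (3, '30'), (4, '40')]

# filter: keep rows in which no element is the empty string
kept = [row for row in rows if all(item != "" for item in row)]
print(kept)     # [(1, '10'), (3, '30'), (4, '40')]

# middle zip: transpose back into columns (tuples converted to lists)
columns = [list(col) for col in zip(*kept)]
print(columns)  # [[1, 3, 4], ['10', '30', '40']]

# outer zip: pair each key with its filtered column
result = dict(zip(d, columns))
print(result)   # {'a': [1, 3, 4], 'b': ['10', '30', '40']}
```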

Without the one-liner:

values = []  # init list of values
for y in zip(*d.values()):   # loop on assembled values
    if all(i != "" for i in y):  # filter out rows which contain empty strings
        values.append(y)

# transpose back / convert to list (since zip yields tuples)
values = [list(x) for x in zip(*values)]

# rebuild dictionary. Order of d and values is the same
d = dict(zip(d,values))
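One edge case of the zip-based approach: if every row contains a blank, transposing an empty list yields nothing and the rebuilt dictionary loses all its keys. A possible guard, wrapped in a hypothetical helper named drop_rows_with_blanks and assuming all lists have the same length, could look like this:

```python
def drop_rows_with_blanks(d):
    """Return a new dict without the rows that contain an empty string.

    Hypothetical helper for illustration; assumes all lists have equal length.
    """
    kept = [row for row in zip(*d.values()) if all(item != "" for item in row)]
    if not kept:
        # every row was dropped: keep the keys, with empty lists as values
        return {key: [] for key in d}
    return {key: list(col) for key, col in zip(d, zip(*kept))}


print(drop_rows_with_blanks({'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}))
# {'a': [1, 3, 4], 'b': ['10', '30', '40']}
print(drop_rows_with_blanks({'a': [1, 2], 'b': ['', '']}))
# {'a': [], 'b': []}
```

Unlike the pop-based version, this returns a new dictionary and leaves the input lists untouched.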

edited Sep 11 '19 at 19:54

answered Sep 11 '19 at 19:32

Jean-François Fabre

probably, but that's what I imagined. – Jean-François Fabre Sep 11 '19 at 19:34
@Jean-FrançoisFabre how about `dict(zip(d.keys(), map(list, zip(*filter(lambda x: x[1], zip(*d.values()))))))` – pault Sep 11 '19 at 19:34
is that a general solution? I'd drop the "d.keys()" for "d" anyway. – Jean-François Fabre Sep 11 '19 at 19:35
Actually what I posted has problems, but I think there's a `zip`-based approach ... possibly only for 3.6+ where dicts are insertion ordered. – pault Sep 11 '19 at 19:37
Why not empty_indexes = sorted([i for v in d.values() for i,x in enumerate(v) if not x],reverse=True) instead of empty_indexes = sorted({i for v in d.values() for i,x in enumerate(v) if not x},reverse=True) – PapaDiHatti Sep 11 '19 at 19:38
zip + any to check if any value of the zipped row is empty. If it's the case, drop it. That should work. – Jean-François Fabre Sep 11 '19 at 19:38
I'd prefer the expanded solution 1000 times over the oneliner - not readable at all ;} – rafaelc Sep 11 '19 at 19:51
@rafaelc yes agreed, I didn't intend to sidetrack the solution to a one-liner- my initial point was that I didn't think sorting was necessary. – pault Sep 11 '19 at 19:53
Provided a non-one-liner version too. pault probably preferred pasting a one-liner as a comment rather than full Python code. You should have answered, though – Jean-François Fabre Sep 11 '19 at 19:54

score 1 · Accepted Answer · answered Sep 11 '19 at 19:34

As a general solution, assuming each list is of the same size, you can use:

def drop_empty(d, key):
    '''
    Drops values from all lists in the dictionary `d` at the
    indices of the list given by `key` that are blank strings.
    '''
    indices = [i for i, v in enumerate(d[key]) if v == '']  # use `key`, not a hard-coded 'b'
    for v in d.values():
        for ix in reversed(indices):
            v.pop(ix)
    return d

# test case, drops indices 1 and 4:
d = {'a': [1, 2, 3, 4, 5], 'b': ['10', '', '30', '40', ''], 'c': [0, 0, 1, 1, 2]}

drop_empty(d, 'b')
# returns:
{'a': [1, 3, 4], 'b': ['10', '30', '40'], 'c': [0, 1, 1]}
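The function above modifies the lists in place. If the original dictionary must stay intact, a non-mutating variant is possible. This is only a sketch: the name drop_empty_copy is made up here, and it assumes, like the original, that all lists have the same length.

```python
def drop_empty_copy(d, key):
    """Hypothetical variant of drop_empty: returns a new dict, `d` is untouched."""
    # indices of the entries to keep, i.e. where d[key] is not a blank string
    keep = [i for i, v in enumerate(d[key]) if v != '']
    return {k: [values[i] for i in keep] for k, values in d.items()}


d = {'a': [1, 2, 3, 4, 5], 'b': ['10', '', '30', '40', ''], 'c': [0, 0, 1, 1, 2]}
result = drop_empty_copy(d, 'b')
print(result)  # {'a': [1, 3, 4], 'b': ['10', '30', '40'], 'c': [0, 1, 1]}
```

Because it builds new lists from the indices to keep, no reversing is needed.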

answered Sep 11 '19 at 19:34

James

Why reverse on indices ? – PapaDiHatti Sep 11 '19 at 19:48
It's always dangerous to change an iterable while iterating over it. Starting from the end reduces the chances of an `IndexError` – rafaelc Sep 11 '19 at 19:50
but that's not the point here. The code isn't removing while iterating, but if you don't reverse, the first index is okay, but all the following ones are shifted. – Jean-François Fabre Sep 11 '19 at 19:56

score 0 · Answer 4 · answered Sep 11 '19 at 19:36

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

bad_inds = [ind for ind in range(len(d['a'])) if not d['a'][ind] or not d['b'][ind]]

for ind in reversed(bad_inds):  # delete from the end so the remaining indices stay valid
    for value in d.values():
        del value[ind]

output:

d

>>> {'a': [1, 3, 4], 'b': ['10', '30', '40']}
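One caveat with truthiness tests such as not d['a'][ind]: they also treat 0, None and empty containers as "empty". The sketch below uses the three-column test data from the accepted answer to show the difference against an explicit comparison with ''. The names by_truthiness and by_equality are made up for this illustration.

```python
d = {'a': [1, 2, 3, 4, 5], 'b': ['10', '', '30', '40', ''], 'c': [0, 0, 1, 1, 2]}
rows = list(zip(*d.values()))

# truthiness: a legitimate 0 in column 'c' also marks the row as bad
by_truthiness = [i for i, row in enumerate(rows) if not all(row)]
print(by_truthiness)  # [0, 1, 4]

# explicit comparison: only real empty strings mark the row as bad
by_equality = [i for i, row in enumerate(rows) if any(item == '' for item in row)]
print(by_equality)    # [1, 4]
```

If the data can contain zeros, the explicit comparison is the safer choice.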

kederrac · Answer 5 · 2019-09-11T20:41:01.990

0

You can first obtain all the good indices and then filter your values based on them:

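good_indices = [i for i, v in enumerate(zip(*d.values())) if all(v)]
# plain indexing replaces itemgetter, which returns a bare element (not a tuple)
# for a single index and raises TypeError when given no index at all
d = {k: [v[i] for i in good_indices] for k, v in d.items()}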

print(d)

output:

{'a': [1, 3, 4], 'b': ['10', '30', '40']}

edited Sep 11 '19 at 20:41

answered Sep 11 '19 at 20:35

kederrac
