
I have a defaultdict d in Python which contains two lists, as below:

{
    'data1': [0.8409093126477928, 0.9609093126477928, 0.642217399079215, 0.577003839123445, 0.7024399719949195, 1.0739533732043967], 
    'data2':  [0.9662666242560285, 0.9235637581239243, 0.8947656867577896, 0.9266919525550584, 1.0220039913024457]
}

In the future there may be more lists in the defaultdict, such as data1, data2, data3, data4, etc. I need to compare the values at each index across the lists. For the defaultdict above, I need to check whether data1[0] (0.8409093126477928) is smaller than data2[0] (0.9662666242560285), do the same for every other index, and store the key of the winning list (the one with the smaller value) in a separate list, like below:

result = ['data1', 'data2', 'data1', 'data1', 'data1']

If one list is longer than the other, we simply need to check whether its extra values are smaller than 1. For example, data1[5] cannot be compared with data2[5] because data2[5] does not exist, so we just check whether data1[5] is less than 1. If it is, we add data1 to the result; otherwise we ignore it and store nothing in the result.

To solve this, I thought of extracting the lists from the defaultdict into separate lists and then using a for loop to compare the values at each index. But when I did print(d[0]) to print the list at index 0, it printed []. Why is it printing an empty list? How can I compare the values as described above? Please help. Thanks.

S Andrew
  • This may help with your problem: [How to index into a dictionary?](https://stackoverflow.com/questions/4326658/how-to-index-into-a-dictionary) – SacrificerXY Sep 29 '19 at 02:53
  • I could use a bit of clarification on what to do when there are lists longer than others. Do we do the "less than 1" thing only if there's a single list left, or if any list is missing? If the latter, which one of the remaining lists do we take? – ggorlen Sep 29 '19 at 02:57
  • What happens when they're equal? – SacrificerXY Sep 29 '19 at 03:49

3 Answers


We can use zip_longest from itertools and a variety of loops to achieve the result:

from itertools import zip_longest

result = []
pairs = [[[z, y] for z in x] for y, x in data.items()]

for x in zip_longest(*pairs):
    x = [y for y in x if y]

    if len(x) > 1:
        result.append(min(x, key=lambda x: x[0])[1])
    elif x[0][0] < 1:
        result.append(x[0][1])

print(result) # => ['data1', 'data2', 'data1', 'data1', 'data1']

First we create pairs of every item in each dict value and its key. This makes it easier to get result keys later. We zip_longest and iterate over the lists, filtering out Nones. If we have more than one element to compare, we take the min and append it to the result, else we check the lone element and keep it if its value is less than 1.

A more verifiable example is

data = {
    'foo':  [1, 0, 1, 0], 
    'bar':  [1, 1, 1, 1, 0],
    'baz':  [1, 1, 0, 0, 1, 1, 0],
    'quux': [0],
}

which produces

['quux', 'foo', 'baz', 'foo', 'bar', 'baz']

Element-wise, "quux" wins round 0, "foo" wins round 1, "baz" 2, "foo" round 3 thanks to key order (tied with "baz"), "bar" for round 4. For round 5, "baz" is the last one standing but isn't below 1, so nothing is taken. For round 6, "baz" is still the last one standing but since 0 < 1, it's taken.
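
The question also asked why print(d[0]) printed []. A likely explanation, assuming d was built as defaultdict(list) (the question does not show how it was created): 0 is not one of the keys, so the lookup silently creates a new empty list under the key 0 and returns it. A minimal sketch with shortened, made-up values:

```python
from collections import defaultdict

# Assumption: d was created as defaultdict(list), which the [] output suggests.
d = defaultdict(list)
d['data1'].extend([0.84, 0.96])
d['data2'].extend([0.97, 0.92])

# 0 is not a key, so this lookup inserts d[0] = [] and returns the new list.
missing = d[0]
print(missing)      # []
print(0 in d)       # True: the failed lookup added a key

# To reach the "first" list by position, index into the values instead.
first_list = list(d.values())[0]
print(first_list)   # [0.84, 0.96]
```

Dictionaries are indexed by key rather than by position, which is why iterating over d.items() or d.values(), as in the code above, is the usual way to get at the lists.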

ggorlen

Edit: as suggested by @ggorlen, replaced the custom iterator with zip_longest.

I would do it using zip_longest like this:

  • zip_longest yields one item from each list on each iteration. For a shorter list it yields the fill value 1 once the iteration goes past its length.
  • The list comprehension loops through the iterator, gets the index of the first minimum item with item.index(min(item)), and then gets the key corresponding to that minimum with keys[item.index(min(item))].
  • If the selected list is shorter than the current iterator index, the entry is either skipped or given the value "NA".

from itertools import zip_longest

keys = list(d.keys())
lengths = list(map(len,d.values()))

result = [keys[item.index(min(item))] 
          for i, item in enumerate(zip_longest(*d.values(), fillvalue=1))
          if lengths[item.index(min(item))]>i]

result

If you want to record a placeholder such as "NA" instead of skipping the index when the minimum comes from padding (that is, no remaining value is less than 1):

result = [keys[item.index(min(item))] if lengths[item.index(min(item))]>i else "NA"
          for i, item in enumerate(zip_longest(*d.values(), fillvalue=1))]
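
As the comments on this answer point out, padding with 1 can make a placeholder look like a real minimum when the remaining real values are 1 or more. A possible variant is sketched below. It pads with None instead, and it assumes the "less than 1" rule applies only when a single list is left, as in the first answer. The function name winning_keys is hypothetical and not part of the original answer.

```python
from itertools import zip_longest

def winning_keys(d):
    """Return the key holding the smallest value at each index.

    Assumption: when only one list still has a value at an index,
    its key is kept only if that value is below 1.
    """
    keys = list(d.keys())
    result = []
    for column in zip_longest(*d.values()):  # shorter lists are padded with None
        # Keep (value, position) pairs for the lists that still have data.
        live = [(value, idx) for idx, value in enumerate(column) if value is not None]
        value, idx = min(live)  # ties on value go to the earliest key
        if len(live) > 1 or value < 1:
            result.append(keys[idx])
    return result

data = {
    'foo':  [1, 0, 1, 0],
    'bar':  [1, 1, 1, 1, 0],
    'baz':  [1, 1, 0, 0, 1, 1, 0],
    'quux': [0],
}
print(winning_keys(data))  # ['quux', 'foo', 'baz', 'foo', 'bar', 'baz']
```

Because padding is filtered out before min is taken, a real value of 1 or more can still win against another real value, which the fillvalue=1 version cannot distinguish from padding.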
Dev Khadka
  • Thank you. Can you explain this logic please? – S Andrew Sep 29 '19 at 03:20
  • I'm getting different results running this on the example in my post: `['quux', 'foo', 'baz', 'foo', 'bar', 'foo', 'baz']`. Not sure how that last `"foo"` gets in there. I think using `col.append(arr[i] if i< len(arr) else 1)` for padding breaks the logic because the default placeholder value of 1 can inadvertently be interpreted as a minimum value and generate false positives, but I'm not 100% sure. – ggorlen Sep 29 '19 at 03:24
  • @DevKhadka I am getting the results `['data1', 'data2', 'data1', 'data1', 'data1', 'data2']` but I'm not sure why it's appending `data2` as the last item in `result`. Also, I am still trying to understand this logic. Can you explain where exactly you are appending `data1` or `data2` to the result list, please? – S Andrew Sep 29 '19 at 03:31
  • @SAndrew can you comment on the correctness of my response? I'm pretty curious if my example matches what you're asking for. If so, it should work for you. – ggorlen Sep 29 '19 at 03:36
  • The iterator gives the default value 1 for a shorter list, so when the iterator goes past the shorter list and the other list has a value greater than 1, it will choose the key of the first list – Dev Khadka Sep 29 '19 at 03:36
  • I don't think using 1 as a default is going to work. It should be something like `None` so you can identify the difference between a placeholder and an actual list with a 1 in it. – ggorlen Sep 29 '19 at 03:37
  • Yes, you have a point. I am using 1 because @SAndrew wants to take values less than one. It will work if the other lists have a value less than one, but if not, it will return the key of the first shorter list. I think we can return some value indicating "NA" in that case – Dev Khadka Sep 29 '19 at 03:46
  • Yeah, but all of that is essentially re-writing `zip_longest`, right? – ggorlen Sep 29 '19 at 03:47
  • ya @ggorlen is right, I have modified answer using zip_longest – Dev Khadka Sep 29 '19 at 04:31
d = {
    'd0': [0.1, 1.1, 0.3],
    'd1': [0.4, 0.5, 1.4, 0.3, 1.6],
    'd2': [],
}

import itertools

# sort by length of lists, shortest first and longest last
d = sorted(d.items(), key=lambda k:len(k[1]))

# loop through all combinations possible
for (key1, list1), (key2, list2) in itertools.combinations(d, 2):
    result = []
    for v1, v2 in itertools.zip_longest(list1, list2): # shorter list is padded with None
        # no need to check if v2 is None because of sorting
        if v1 is None:
            result.append(key2 if v2 < 1 else None)
        else:
            result.append(key1 if v1 < v2 else key2)

    # DO stuff with result, keys, list, etc...
    print(f'{key1} vs {key2} = {result}')

Output

d2 vs d0 = ['d0', None, 'd0']
d2 vs d1 = ['d1', 'd1', None, 'd1', None]
d0 vs d1 = ['d0', 'd1', 'd0', 'd1', None]

I sorted them based on the list lengths. This ensures that list1 will always be shorter or of the same length as list2.

For different lengths, the remaining indices will be a mixture of None and key2.

However, when the elements are equal, key2 is added to the result. This might not be the desired behavior.
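
If ties should not go to key2, one option is to handle equality explicitly. The sketch below is only an illustration: the helper name compare_pair and the 'tie' marker are hypothetical, and the question does not say what the desired tie behavior is.

```python
from itertools import zip_longest

def compare_pair(key1, list1, key2, list2, tie='tie'):
    """Compare two lists index by index; list1 must not be longer than list2."""
    result = []
    for v1, v2 in zip_longest(list1, list2):  # shorter list is padded with None
        if v1 is None:
            # Only list2 has a value here: keep it only if it is below 1.
            result.append(key2 if v2 < 1 else None)
        elif v1 == v2:
            # Hypothetical choice: mark equal values instead of picking a side.
            result.append(tie)
        else:
            result.append(key1 if v1 < v2 else key2)
    return result

print(compare_pair('a', [1, 2], 'b', [1, 3, 0.5]))  # ['tie', 'a', 'b']
```

Passing tie=key1 would instead give ties to the first list.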

SacrificerXY