
For an application I'm working on, I'm searching a directory of files, and expecting to find matching pairs of files to perform some further analysis on.

In this case, two files form a pair when they match on some subset of attributes but differ in certain other attributes.

As part of the error handling/warning, I want to identify any files found that are "incomparable," i.e. files for which the expected "partner" in the pair is not found.

I have a class of objects to store the structured attribute information, and when I read the files in the directory, I store each file I find as an element in a list of these objects.

Here's a silly, simple example:

class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

left_gloves = [x for x in gloves if not x.is_right]
right_gloves = [x for x in gloves if x.is_right]

Let's assume that there are no duplicate elements in the list, and let's define a "pair" as two glove objects that have matching glove.size and glove.color but different values of glove.is_right (i.e. one is right and one is left).
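The pair definition above can be written as a small predicate. This is a minimal sketch: `is_pair` is a hypothetical helper name, and the `glove` class is repeated (without `__repr__`) so the snippet runs on its own.

```python
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right


def is_pair(a, b):
    """Hypothetical helper: same size and color, opposite hands."""
    return (a.size == b.size
            and a.color == b.color
            and a.is_right != b.is_right)


left = glove('med', 'black', False)
right = glove('med', 'black', True)
print(is_pair(left, right))                       # True
print(is_pair(left, left))                        # False: same hand
print(is_pair(left, glove('lg', 'black', True)))  # False: different size
```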

Now I'd like to identify incomplete pairs (perhaps collecting them into a list of leftovers) so that I can raise an error or warning as appropriate, e.g. "No right lg blue glove found", "No right med tan glove found".
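Once the leftovers are known, turning them into messages like these is a short loop. A sketch under assumptions: the leftovers are held in a list named `leftovers` (a hypothetical name), and a `namedtuple` stands in for the glove class since only the three attributes matter. The missing glove is always the opposite hand of the one that was found.

```python
from collections import namedtuple

# Stand-in for the glove class: only size, color and is_right are used.
Glove = namedtuple('Glove', ['size', 'color', 'is_right'])

# Hypothetical leftovers: the two unpaired gloves from the example.
leftovers = [Glove('lg', 'blue', False), Glove('med', 'tan', False)]

messages = []
for g in leftovers:
    # The partner we failed to find has the opposite hand.
    missing_hand = 'left' if g.is_right else 'right'
    messages.append('No {} {} {} glove found'.format(missing_hand, g.size, g.color))

for m in messages:
    print(m)
# No right lg blue glove found
# No right med tan glove found
```

In a real application these strings could be passed to `warnings.warn` or a logger instead of `print`.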

I've seen answers that teach how to identify items "missing" from pairs of lists, but my application has a couple of complexities that I couldn't figure out how to address: linking on attributes of an object, and linking on multiple attributes of an object.

I imagine something is possible with for loops and list comprehensions, but I can't quite figure out how to put it all together.
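The loops-and-comprehension approach can be written directly with `any()`. This is a minimal sketch, not taken from the answers: it compares every glove with every other glove, so it is O(n²). That is fine for a small directory, while a set- or dict-based lookup avoids the quadratic scan.

```python
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        return "{} {} {}".format(self.size, self.color, "right" if self.is_right else "left")


gloves = [glove('med', 'black', False), glove('med', 'black', True),
          glove('lg', 'black', False), glove('lg', 'black', True),
          glove('med', 'brown', False), glove('med', 'brown', True),
          glove('lg', 'blue', False), glove('med', 'tan', False)]

# A glove is unpaired if no glove in the list has the same size and color
# with the opposite hand (the != check also excludes the glove itself).
unpaired = [g for g in gloves
            if not any(h.size == g.size and h.color == g.color and h.is_right != g.is_right
                       for h in gloves)]
print(unpaired)  # [lg blue left, med tan left]
```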

mac
  • Are the items unique? Can you have more than one "left med tan" glove, for instance? – Prune Mar 26 '18 at 17:20
  • @chrisz I only know I have a missing glove if I find one that is missing a partner. In the above example, per this definition, there are only two known missing gloves. – mac Mar 26 '18 at 17:25
  • @Prune we're assuming no duplicates are allowed, and that there are proper protections in place to guarantee this. – mac Mar 26 '18 at 17:26

2 Answers


It's pretty easy if you can implement equality and hashing (`__eq__` and `__hash__`) for your class:

class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)

    def __eq__(self, other):
        return isinstance(other, glove) and \
            other.size == self.size and \
            other.color == self.color \
            and other.is_right == self.is_right

    def __hash__(self):
        return hash((self.size, self.color, self.is_right))


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

gloves_set = set(gloves)
unpaired = [g for g in gloves if glove(g.size, g.color, not g.is_right) not in gloves_set]
print(unpaired)

Output:

[lg blue left, med tan left]

You could also consider using `collections.namedtuple`, which implements equality and hashing for you based on the field values.
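A minimal sketch of the `namedtuple` route, assuming the class name `Glove` (hypothetical) and that the three fields fully identify a glove. `_replace` is the documented `namedtuple` method that returns a copy with some fields changed, which gives the looked-for partner directly.

```python
from collections import namedtuple

# namedtuple supplies __eq__ and __hash__ based on the field values.
Glove = namedtuple('Glove', ['size', 'color', 'is_right'])

gloves = [Glove('med', 'black', False), Glove('med', 'black', True),
          Glove('lg', 'black', False), Glove('lg', 'black', True),
          Glove('med', 'brown', False), Glove('med', 'brown', True),
          Glove('lg', 'blue', False), Glove('med', 'tan', False)]

gloves_set = set(gloves)
# _replace returns a copy with only is_right flipped: the partner we expect.
unpaired = [g for g in gloves if g._replace(is_right=not g.is_right) not in gloves_set]
print(unpaired)
# [Glove(size='lg', color='blue', is_right=False), Glove(size='med', color='tan', is_right=False)]
```

One trade-off: a `namedtuple` is immutable and compares equal to a plain tuple with the same values, which may or may not suit objects built by parsing files.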


Here is an alternative that requires neither implementing `__eq__` and `__hash__` nor creating new objects:

class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

# With plain dict
glove_search = {}
for g in gloves:
    glove_search.setdefault(g.size, {}).setdefault(g.color, {})[g.is_right] = True
unpaired = [g for g in gloves
            if not glove_search.get(g.size, {}).get(g.color, {}).get(not g.is_right, False)]

# Or, more idiomatically, with defaultdict
from collections import defaultdict
glove_search = defaultdict(lambda: defaultdict(lambda: defaultdict(bool)))
for g in gloves:
    glove_search[g.size][g.color][g.is_right] = True
unpaired = [g for g in gloves if not glove_search[g.size][g.color][not g.is_right]]

print(unpaired)

Output:

[lg blue left, med tan left]
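A flatter variant of the same idea, sketched under the assumption that `size` and `color` are hashable (as strings are), keys a single set on `(size, color, is_right)` tuples instead of nesting dictionaries. Like the dict versions, it needs neither `__eq__`/`__hash__` on the class nor any new glove objects.

```python
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        return "{} {} {}".format(self.size, self.color, "right" if self.is_right else "left")


gloves = [glove('med', 'black', False), glove('med', 'black', True),
          glove('lg', 'black', False), glove('lg', 'black', True),
          glove('med', 'brown', False), glove('med', 'brown', True),
          glove('lg', 'blue', False), glove('med', 'tan', False)]

# One hashable key per glove; tuples get equality and hashing for free.
keys = {(g.size, g.color, g.is_right) for g in gloves}
# Look up the key the partner would have: same size and color, opposite hand.
unpaired = [g for g in gloves if (g.size, g.color, not g.is_right) not in keys]
print(unpaired)  # [lg blue left, med tan left]
```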
jdehesa
  • whoa, this looks really elegant, and introduces some new-to-me concepts that I need to do some reading on – mac Mar 26 '18 at 17:29
  • I just caught that this solution relies on creating a new glove object in the list comprehension line as part of the comparison. Any way to make this approach work if the constructor for glove doesn't actually support creating a glove object by attributes? (my real-life example constructs glove objects by parsing a file) – mac Mar 26 '18 at 18:47
  • @mac Yes, this approach requires having the looked-for object, either created there or through some `makePair` method or similar. I can try to think of an alternative later... – jdehesa Mar 26 '18 at 20:25
  • @mac I've added another alternative that does not require creating objects. – jdehesa Mar 26 '18 at 20:32
  • Awesome. That will work well for my application. I also appreciate the original answer, I know that will come in handy some day. – mac Mar 26 '18 at 21:22
  • @mac I've added an equivalent but more idiomatic version using [`defaultdict`](https://docs.python.org/3/library/collections.html#defaultdict-objects). – jdehesa Mar 27 '18 at 08:56
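The `makePair` idea mentioned in the comments can also sidestep the constructor problem. A sketch under stated assumptions: `make_pair` is a hypothetical method name, and it relies on `copy.copy()`, which for an ordinary class like this duplicates the instance's attributes without calling `__init__`, so a file-parsing constructor is never invoked. The class still needs `__eq__` and `__hash__` as in the first version of the answer.

```python
import copy


class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        return "{} {} {}".format(self.size, self.color, "right" if self.is_right else "left")

    def __eq__(self, other):
        return (isinstance(other, glove) and
                (self.size, self.color, self.is_right) ==
                (other.size, other.color, other.is_right))

    def __hash__(self):
        return hash((self.size, self.color, self.is_right))

    def make_pair(self):
        # Hypothetical helper: copy.copy() creates the new instance without
        # calling __init__, then we flip only the hand.
        partner = copy.copy(self)
        partner.is_right = not self.is_right
        return partner


gloves = [glove('med', 'black', False), glove('med', 'black', True),
          glove('lg', 'black', False), glove('lg', 'black', True),
          glove('med', 'brown', False), glove('med', 'brown', True),
          glove('lg', 'blue', False), glove('med', 'tan', False)]

gloves_set = set(gloves)
unpaired = [g for g in gloves if g.make_pair() not in gloves_set]
print(unpaired)  # [lg blue left, med tan left]
```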

With no duplicates allowed, the problem is relatively simple. Concatenate your identifiers:

self.ID = self.size + " " + self.color
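Where the `ID` line goes is not stated; a minimal sketch places it at the end of `__init__`, assuming `size` and `color` are strings. A tuple such as `(self.size, self.color)` would work just as well as a set member and avoids the collision shown at the end of the snippet, which can only happen if an attribute value contains the separator.

```python
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right
        # Matching key: everything that must agree between the two gloves of a pair.
        self.ID = self.size + " " + self.color


g = glove('lg', 'blue', False)
print(g.ID)  # lg blue

# Caveat: concatenation can collide if a value contains the separator.
print(glove('x l', 'red', True).ID == glove('x', 'l red', True).ID)  # True
```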

Build left and right sets on the IDs alone:

left  = {g.ID for g in gloves if not g.is_right}
right = {g.ID for g in gloves if     g.is_right}

unmatched_left  = left - right
unmatched_right = right - left
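With the example's gloves, the two ID sets and their differences look like this. The sets are written out by hand here for illustration rather than built from glove objects.

```python
# IDs from the example's gloves, split by hand.
left = {'med black', 'lg black', 'med brown', 'lg blue', 'med tan'}
right = {'med black', 'lg black', 'med brown'}

unmatched_left = left - right    # left gloves with no right partner
unmatched_right = right - left   # right gloves with no left partner
print(sorted(unmatched_left))    # ['lg blue', 'med tan']
print(sorted(unmatched_right))   # []

# The symmetric difference is the union of both one-sided differences.
print(sorted(left ^ right))      # ['lg blue', 'med tan']
```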

Finally, map the unmatched IDs back to the glove objects:

unmatched_ids = unmatched_left | unmatched_right   # sets combine with |, not +

unmatched = [g for g in gloves if g.ID in unmatched_ids]
Prune