Matching incompletely qualified paths

Question

Let's say I have a series of incompletely qualified paths, which are missing some parts, but are guaranteed to have two properties:

The final part of both the incompletely and the fully qualified paths will be exactly equal, and
The order of each part of the incompletely qualified path will match the actual order of the parts of the fully qualified path.

For example,

p1 = '/foo/baz/myfile.txt'
p2 = '/bar/foo/myfile.txt'
actual = '/foo/bar/baz/myfile.txt'

In this case, p1 will match, but p2 will not, because in the actual path, bar occurs after foo. Easy enough: [actual.split('/').index(part) for part in p1.split('/')] will be an ordered list, but the same comprehension but for p2 will not.

But what happens if there are repetitions in the path?

p1 = '/foo/bar/bar/myfile.txt'
p2 = '/bar/bar/baz/myfile.txt'
actual = '/foo/bar/baz/bar/myfile.txt'

How can I identify that p1 does match, but p2 does not (because, although baz occurs after the first bar, it does not occur after the second?

Ashwini Chaudhary · Answer 1 · 2013-09-24T19:12:24.450

Method 1: Using list.index:

def match(strs, actual):
    seen = {}
    act = actual.split('/')
    for x in strs.split('/'):
        if x in seen:
            #if the item was already seen, so start search 
            #after the previous matched index
            ind = act.index(x, seen[x]+1)
            yield ind
            seen[x] = ind
        else:
            ind = act.index(x)
            yield ind
            seen[x] = ind
...             
>>> p1 = '/foo/baz/myfile.txt'
>>> p2 = '/bar/foo/myfile.txt'
>>> actual = '/foo/bar/baz/myfile.txt'
>>> list(match(p1, actual))     #ordered list, so matched
[0, 1, 3, 4]
>>> list(match(p2, actual))     #unordered list, not matched
[0, 2, 1, 4]

>>> p1 = '/foo/bar/bar/myfile.txt'
>>> p2 = '/bar/bar/baz/myfile.txt'
>>> actual = '/foo/bar/baz/bar/myfile.txt'
>>> list(match(p1, actual))     #ordered list, so matched
[0, 1, 2, 4, 5]
>>> list(match(p2, actual))     #unordered list, not matched
[0, 2, 4, 3, 5]

Method 2: Using defaultdict and deque:

from collections import defaultdict, deque
def match(strs, actual):
    indexes_act = defaultdict(deque)
    for i, k in enumerate(actual.split('/')):
        indexes_act[k].append(i)
    prev = float('-inf')
    for item in strs.split('/'):
        ind = indexes_act[item][0]
        indexes_act[item].popleft()
        if ind > prev:
            yield ind
        else:
            raise ValueError("Invalid string")
        prev = ind

Demo:

>>> p1 = '/foo/baz/myfile.txt'
>>> p2 = '/bar/foo/myfile.txt'
>>> actual = '/foo/bar/baz/myfile.txt'
>>> list(match(p1, actual))
[0, 1, 3, 4]
>>> list(match(p2, actual))
    ...
    raise ValueError("Invalid string")
ValueError: Invalid string

>>> p1 = '/foo/bar/bar/myfile.txt'
>>> p2 = '/bar/bar/baz/myfile.txt'
>>> actual = '/foo/bar/baz/bar/myfile.txt'
>>> list(match(p1, actual))
[0, 1, 2, 4, 5]
>>> list(match(p2, actual))
    ...
    raise ValueError("Invalid string")
ValueError: Invalid string

Thanks: I wasn't aware I could pass an offset to `.index()`. — 2rs2ts, Sep 24 '13 at 18:26

unutbu · Accepted Answer · 2013-09-24T17:56:44.130

1

def match(path, actual):
    path = path.strip('/').split('/')
    actual = iter(actual.strip('/').split('/'))
    for pathitem in path:
        for item in actual:
            if pathitem == item:
                break
        else:
            # The for-loop never breaked, so pathitem was never found
            return False
    return True

q1 = '/foo/baz/myfile.txt'
q2 = '/bar/foo/myfile.txt'
p1 = '/foo/bar/bar/myfile.txt'
p2 = '/bar/bar/baz/myfile.txt'
actual = '/foo/bar/baz/bar/myfile.txt'

print(match(q1, actual))
# True

print(match(q2, actual))
# False

print(match(p1, actual))
# True

print(match(p2, actual))
# False

edited Sep 24 '13 at 17:56

answered Sep 24 '13 at 17:45

unutbu

842,883
184
1,785
1,677

I found this approach simple. I had to reappropriate it for my purposes (iterating over large sets of candidate `actual`s) but it worked. – 2rs2ts Sep 24 '13 at 18:28

Matching incompletely qualified paths

2 Answers2