Given a string sequence of words, check if it matches a pattern

Question

I encountered this problem in an interview and I was stuck on the best way to go about it. The question is as follows:

Given a string sequence of words and a string sequence pattern, return true if the sequence of words matches the pattern otherwise false.

Definition of match: A word that is substituted for a variable must always follow that substitution. For example, if "f" is substituted as "monkey" then any time we see another "f" then it must match "monkey" and any time we see "monkey" again it must match "f".

Examples
input: "ant dog cat dog", "a d c d"
output: true
This is true because every variable maps to exactly one word and vice verse.
a -> ant
d -> dog
c -> cat
d -> dog

input: "ant dog cat dog", "a d c e"
output: false
This is false because if we substitute "d" as "dog" then you can not also have "e" be substituted as "dog".
a -> ant
d, e -> dog (Both d and e can't both map to dog so false)
c -> cat

input: "monkey dog eel eel", "e f c c"
output: true
This is true because every variable maps to exactly one word and vice verse.
e -> monkey
f -> dog
c -> eel

Initially, I thought of doing something as follows...

function matchPattern(pattern, stringToMatch) {
  var patternBits = pattern.split(" ");
  var stringBits = stringToMatch.split(" ");
  var dict = {};

  if (patternBits.length < 0 
      || patternBits.length !== stringBits.length) {
    return false;
  }

  for (var i = 0; i < patternBits.length; i++) {
    if (dict.hasOwnProperty(patternBits[i])) {
      if (dict[patternBits[i]] !== stringBits[i]) {
        return false;
      }
    } else {
      dict[patternBits[i]] = stringBits[i];
    }
  }
  return true;
}

var ifMatches = matchPattern("a e c d", "ant dog cat dog");
console.log("Pattern: " + (ifMatches ? "matches!" : "does not match!"));

However, I realized that this won't work and fails example #2 as it erroneously returns true. One way to deal with this issue is to use a bi-directional dictionary or two dictionaries i.e store both {"a": "ant"} and {"ant": "a"} and check both scenarios in the if check. However, that seemed like wasted space. Is there a better way to tackle this problem without using regular expressions?

you definition of match is not clear. Write some explanation of why the first example pass, the second fails. I can guess why but most probably you are not here to hear guesses. Are you just comparing first letters of the words? — Salvador Dali, Mar 23 '16 at 22:13

hilberts_drinking_problem · Answer 1 · 2016-03-23T22:38:08.877

0

For this special case, I assume the pattern refers to matching first character. If so, you can simply zip and compare.

# python2.7

words   = "ant dog cat dog"
letters = "a d c d"
letters2 = "a d c e"

def match(ws, ls):
    ws = ws.split()
    ls = ls.split()
    return all(w[0] == l for w, l in zip(ws + [[0]], ls + [0]))

print match(words, letters)
print match(words, letters2)

The funny [[0]] and [0] in the end is to ensure that the pattern and the words have the same length.

edited Mar 23 '16 at 22:38

answered Mar 23 '16 at 22:15

hilberts_drinking_problem

11,322
3
22
51

1

It might be better to use an argument-less split, e.g. `split()`, otherwise an input like 'ant(double space)dog cat' raise an `IndexError` because of the double-space. – Jared Goguen Mar 23 '16 at 22:18
I'm not a Python expert, so those `+ [[0]]` and `+ [0]` look quite funny indeed. They don't appear to change the difference in length between `ws` and `ls`, so the comment that they "ensure that the pattern and the words have the same length" is very surprising. Can you expand on why they are useful? What is an example of an input that works with those additions but not without? – Daniel Wagner Mar 24 '16 at 00:47
You know how when you implement merge sort, you can use a trick and add '+infinity' in the end of each list? This is similar. `zip` iterates up to the length of the shortest list, so 'apple', 'a b' would match unless you check for length. But `0` can never be equal to a string, so `'b' == 0` fails as wanted. This is not terribly useful here, apart from saving space; however, if inputs were lazy, this could be much more efficient when lengths of patterns and strings differ. – hilberts_drinking_problem Mar 24 '16 at 02:03
I apologize, I was not as clear with the question as I should of been but I do not mean matching the first character. I updated the question to be more clear. Basically, my definition of match is that if I ever substitute a variable with a word such as if I ever substitute "g" as dog. Then, if I ever see the variable "g" then the word must be "dog" or if I ever see the word "dog" then the variable must be "g". – skalidindi Mar 24 '16 at 04:30
1

@YakymPirozhenko Oh. Ick. Wouldn't `return len(ws) == len(ls) and all(w[0] == l for w, l in zip(ws, ls))` be more readable, and more robust to objects that support `split` but can contain numbers? – Daniel Wagner Mar 24 '16 at 04:51
Yes, you are right, and in this case it is probably best. However, if you worked with something like Haskell, you could use lazy `splitOn`, and compare the 2 lists lazily. This way, you do not even need to split the list all the way if the mismatch occurs early on. The price you pay is that you do not know the lengths of the lists in advance. But you can work around that as above, and avoid checking for empty lists in advance. There is probably time and place for one approach or the other. – hilberts_drinking_problem Mar 24 '16 at 13:29

score 0 · Accepted Answer · edited May 23 '17 at 11:45

I think a simple choice that is quadratic in the length of the list of words is to verify that every pairing of list indices has the same equality characteristics in the two lists. I'll assume that you get the "words" and "pattern" as lists already and don't need to parse out spaces and whatever -- that ought to be a separate function's responsibility anyway.

function matchesPatternReference(words, pattern) {
    if(words.length !== pattern.length) return false;
    for(var i = 0; i < words.length; i++)
        for(var j = i+1; j < words.length; j++)
            if((words[i] === words[j]) !== (pattern[i] === pattern[j]))
                return false;
    return true;
}

A slightly better approach would be to normalize both lists, then compare the normalized lists for equality. To normalize a list, replace each list element by the number of unique list elements that appear before its first occurrence in the list. This will be linear in the length of the longer list, assuming you believe hash lookups and list appends take constant time. I don't know enough Javascript to know if these are warranted; certainly at worst the idea behind this algorithm can be implemented with suitable data structures in n*log(n) time even without believing that hash lookups are constant time (a somewhat questionable assumption no matter the language).

function normalize(words) {
    var next_id = 0;
    var ids = {};
    var result = [];
    for(var i = 0; i < words.length; i++) {
        if(!ids.hasOwnProperty(words[i])) {
            ids[words[i]] = next_id;
            next_id += 1;
        }
        result.push(ids[words[i]]);
    }
    return result;
}

function matchesPatternFast(words, pattern) {
    return normalize(words) === normalize(pattern);
}

Note: As pointed out in the comments, one should check deep equality of the normalized arrays manually, since === on arrays does an identity comparison in Javascript and does not compare elementwise. See also How to check if two arrays are equal with Javascript?.

Addendum: Below I argue that matchesPatternFast and matchesPatternReference compute the same function -- but use the faulty assumption that === on arrays compares elements pointwise rather than being a pointer comparison.

We can define the following function:

function matchesPatternSlow(words, pattern) {
    return matchesPatternReference(normalize(words), normalize(pattern));
}

I observe that normalize(x).length === x.length and normalize(x)[i] === normalize(x)[j] if and only if x[i] === x[j]; therefore matchesPatternSlow computes the same function as matchesPatternReference.

I will now argue that matchesPatternSlow(x,y) === matchesPatternFast(x,y). Certainly if normalize(x) === normalize(y) then we will have this property. matchesPatternFast will manifestly return true. On the other hand, matchesPatternSlow operates by making a number of queries on its two inputs and verifying that these queries always return the same results for both lists: outside the loop, the query is function(x) { return x.length }, and inside the loop, the query is function(x, i, j) { return x[i] === x[j]; }. Since equal objects will respond identically to any query, it follows that all queries on the two normalized lists will align, matchesPatternSlow will also return true.

What if normalize(x) !== normalize(y)? Then matchesPatternFast will manifestly return false. But if they are not equal, then either their lengths do not match -- in which case matchesPatternSlow will also return false from the first check in matchesPatternReference as we hoped -- or else the elements at some index are not equal. Suppose the smallest mismatching index is i. It is a property of normalize that the element at index i will either be equal to an element at index j<i or else it will be one larger than the maximal element from indices 0 through i-1. So we now have four cases to consider:

We have j1<i and j2<i for which normalize(x)[j1] === normalize(x)[i] and normalize(y)[j2] === normalize(y)[i]. But since normalize(x)[i] !== normalize(y)[i] we then know that normalize(x)[j1] !== normalize(y)[i]. So when matchesPatternReference chooses the indices j1 and i, we will find that normalize(x)[j1] === normalize(x)[i] is true and normalize(y)[j1] === normalize(y)[i] is false and immediately return false as we are trying to show.
We have j<i for which normalize(x)[j] === normalize(x)[i] and normalize(y)[i] is not equal to any previous element of normalize(y). Then matchesPatternReference will return false when it chooses the indices j and i, since normalize(x) matches on these indices but normalize(y) doesn't.
We have j<i for which normalize(y)[j] === normalize(y)[i] and normalize(x)[i] is not equal to any previous element of normalize(x). Basically the same as in the previous case.
We have that normalize(x)[i] is one larger than the largest earlier element in normalize(x) and normalize(y)[i] is one larger than the largest earlier element in normalize(y). But since normalize(x) and normalize(y) agree on all previous elements, this means normalize(x)[i] === normalize(y)[i], a contradiction to our assumption that the normalized lists differ at this index.

So in all cases, matchesPatternFast and matchesPatternSlow agree -- hence matchesPatternFast and matchesPatternReference compute the same function.

Interesting idea of normalizing the list. I believe this works but comparing two arrays directly wont work (i.e normalize(words) === normalize(pattern)) but I get the general idea. — skalidindi, Mar 24 '16 at 08:00
@skalidindi Can you say what "won't work" means? What problem are you worried about? What input do you believe this will give the wrong answer for? — Daniel Wagner, Mar 24 '16 at 08:35
@skalidindi I have added an edit in which I prove that the normalization scheme I proposed works -- at least as well as the reference implementation does. — Daniel Wagner, Mar 24 '16 at 09:25
I am saying that in javascript you can not compare arrays like that. For example, if you open the chrome dev tools and do ([0,1] === [0,1]) you will get false. Notice, that the elements and the length of the array are the same but the comparison still returns false. Even, the more lax == will still return false. You have to manually loop through the arrays and check the elements individually. However, minus that fact, the normalization scheme does work. — skalidindi, Mar 24 '16 at 17:36
Also, I think you can do this in python for example with lists but even that under the hood, I think the == operator is checking element by element. See https://docs.python.org/2/tutorial/datastructures.html#comparing-sequences-and-other-types. Section 5.8. — skalidindi, Mar 24 '16 at 17:51
@skalidindi Aha, that's an annoying fact about Javascript that I did not know. I'll make a small edit to the post to mention it. — Daniel Wagner, Mar 24 '16 at 18:07

Given a string sequence of words, check if it matches a pattern

2 Answers2