I am making an interface for searching a fairly limited dictionary of words (2700 or so entries). The words are stored in an XML file thus:
<root>
<w>aunt</w>
<w>active volcano</w>
<w>Xamoschi</w>
</root>
It is fairly basic - the user enters a string, and any matches are spit back out. The problem came when I wanted to include a wildcard character. If a user enters a string with asterisks, each asterisk is replaced by a regex to match zero or more characters, which can be anything.
So, when the user hits search, the script cycles through the XML tags and matches each nodeValue against the pattern srch:
var wildcardified = userinput.replace(/\*/g, ".*?");
var srch = new RegExp(wildcardified, "gi");
// a for loop cycles through the XML tags and tests each one with this:
if (srch.test(tag[i].firstChild.nodeValue)) {
    // it's a match!
}
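To make this easier to reproduce, here is a minimal self-contained sketch of the same loop. It assumes a plain array (words) in place of the parsed XML nodes, and the function name search is only for illustration; the replace call and the "gi" flags are the same as in my script. As far as I can tell, it shows the same behaviour:

```javascript
// Hypothetical stand-in for the text of the <w> nodes in the XML file.
const words = ["aunt", "active volcano", "Xamoschi"];

// Hypothetical helper: same wildcard replacement and same "gi" flags as my script.
function search(userinput) {
  const wildcardified = userinput.replace(/\*/g, ".*?");
  const srch = new RegExp(wildcardified, "gi");
  const matches = [];
  // One RegExp object is reused for every word, as in the real loop.
  for (let i = 0; i < words.length; i++) {
    if (srch.test(words[i])) {
      matches.push(words[i]);
    }
  }
  return matches;
}

console.log(search("a*t"));  // ["aunt"], "active volcano" is missing
console.log(search("a*ti")); // ["active volcano"]
```

The real script does the same thing over roughly 2700 entries, so the exact results there may also depend on the entries that come before each word.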
For the most part, it works as I'd hoped. But I'm getting some inconsistent results that I can't explain. For the values in the XML tags above, this is what happens with various inputs:
1. a* matches all three
2. a*n matches aunt and active volcano
3. a*t only matches aunt
4. a*ti only matches active volcano
Should #3 not also match the act in active volcano?
I see the same kind of results with other similar sets of words. I've tried to isolate the specific issue, but I can't for the life of me figure out what it is.
The Question: Can someone explain why #3 is not returning "active volcano", and what I can do to fix such behaviour?
Incidentally, I want it to be non-greedy, but just in case that was the issue, I tested both with and without the ?. Both returned the same inconsistent results above.