
I've got some code that reads files line by line. It needs to check each line for tags that have one of the following names:

/root|classcod|date|year|agency|office|popaddress|location|zip|naics|contact/

My code builds a tree of tags from the data on a page, and then maps through the tree to compare the node names with the list above. I need to match the name with one of those exactly, or exclude it entirely.

The issue I'm having is that when one of the tag names has a portion of any of the names in the list, that name is added. For example:

respdate
date

The code includes the tag for 'respdate' as well as 'date'. How do I make the regex exclude 'respdate' entirely, since it doesn't match "date" exactly?
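
A minimal sketch of the behaviour described above, written in Python because the language of the original code is not stated. It assumes the pattern is applied as an unanchored search; the `tags` list is hypothetical sample data.

```python
import re

# The unanchored alternation from the question.
pattern = re.compile(
    r"root|classcod|date|year|agency|office|popaddress|location|zip|naics|contact"
)

# Hypothetical tag names standing in for the nodes of the tree.
tags = ["respdate", "date", "title"]

# search() succeeds if any alternative occurs anywhere inside the name,
# so "respdate" is kept because it contains "date".
matched = [t for t in tags if pattern.search(t)]
print(matched)  # ['respdate', 'date']
```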

Nick Res

1 Answer


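One option is to use the word-boundary metacharacter, \b. It matches only at a position between a word character and a non-word character (or at the start or end of the string), so there is no boundary between "resp" and "date" in respdate.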

Group all of your words in a non-capturing group and surround the group with word boundaries on both sides:

\b(?:root|date|year)\b
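
As a rough illustration, here is the same idea in Python (chosen only for the example, since the question does not name a language). The `NAMES` list is taken from the question; the sample strings are made up. The sketch also shows one caveat: a name such as "resp-date" still matches with \b, because the hyphen is a non-word character. If the whole tag name must equal one of the listed names, a full-string match is a stricter alternative.

```python
import re

# Tag names from the question.
NAMES = ["root", "classcod", "date", "year", "agency", "office",
         "popaddress", "location", "zip", "naics", "contact"]

alternation = "|".join(NAMES)

# Non-capturing group wrapped in word boundaries.
bounded = re.compile(r"\b(?:" + alternation + r")\b")

print(bounded.search("date") is not None)       # True
print(bounded.search("respdate") is not None)   # False: no boundary before "date"
print(bounded.search("resp-date") is not None)  # True: "-" creates a boundary

# Stricter alternative: require the entire string to be one of the names.
exact = re.compile(r"(?:" + alternation + r")")

print(exact.fullmatch("date") is not None)       # True
print(exact.fullmatch("resp-date") is not None)  # False
```

Which of the two fits depends on whether the pattern is run against a bare tag name or against a longer line that contains the name.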
Josh Crozier