I am trying to match words with a regex and format the matched output. I have a list of words, e.g.:

var resultArray = new List<string> {"new", "new_"}; // notice the word with the underscore

But when I search a sentence like this:
New Law_Book_with_New_Cover
it matches the first word "New" but not the one in the middle, "New_". Here is my code:

    if (resultArray.Count > 0)
    {
        string regex = "\\b(?:" + String.Join("|", resultArray.ToArray()) + ")\\b";
        MatchEvaluator myEvaluator = new MatchEvaluator(GetHighlightMarkup);
        return Regex.Replace(result, regex, myEvaluator, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    }

    private static string GetHighlightMarkup(Match m)
    {
        return string.Format("<span class=\"focus\">{0}</span>", m.Value);
    }

And yes, I did try escaping the word as "\New_", but still no luck. What am I missing?

Idrees Khan
  • `_` is part of the word character class, thus `\b` will not match between underscore and letters – Sebastian Proske Aug 23 '16 at 11:27
  • @SebastianProske what is the work around ? – Idrees Khan Aug 23 '16 at 11:28
  • You could use lookarounds, e.g. `(?<![a-zA-Z])New(?![a-zA-Z])` – Sebastian Proske Aug 23 '16 at 11:29
  • @SebastianProske and the Olympics medal goes to you! Post it as an answer – Idrees Khan Aug 23 '16 at 11:36
  • In addition to @WiktorStribiżew's answer: sorting the `resultArray` in item length descending order should help avoiding the [`New|New_`](https://regex101.com/r/mT7jU5/1) vs [`New_|New`](https://regex101.com/r/sM8iJ1/1) issue, so do `resultArray.Sort((s1, s2) => s2.Length - s1.Length);` before `string regex = ...` – Dmitry Egorov Aug 23 '16 at 11:41
  • Yeah, and to add to Sebastian's comment: I guess OP needs to check for alphanumerics on both sides, not just letters. DotNetDreamer, you have not specified what the result should be: `New Law_Book_with_New_Cover` or `New Law_Book_with_New_Cover`? – Wiktor Stribiżew Aug 23 '16 at 12:02
  • So will you let us answer the question? Why did Sebastian's solution work? It does not find `New_`, it finds `New`. – Wiktor Stribiżew Aug 23 '16 at 15:14
  • @WiktorStribiżew it does. The results for me worked – Idrees Khan Aug 24 '16 at 08:27
  • Why? If you explain, that will make it possible to provide a valid answer that will be helpful to *all readers*, not just for you. "It does not work" or "it works" are not helpful to anyone. At least provide the expected result in the body of the question. – Wiktor Stribiżew Aug 24 '16 at 08:37

1 Answer

It seems you need to match your items only when they are not immediately preceded or followed by a letter.

You may replace the word boundaries in your regex with lookarounds:

string regex = @"(?<!\p{L})(?:" + String.Join("|", resultArray.ToArray()) + @")(?!\p{L})";

where \p{L} matches any letter, (?<!\p{L}) requires the absence of a letter before the match, and (?!\p{L}) disallows a letter after the match.
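To see the lookaround pattern in context, here is a minimal, self-contained sketch applied to the sample sentence from the question. The `HighlightDemo` class and `Highlight` helper are hypothetical names used only for illustration. The sketch also makes two assumptions that are not part of the code above: it sorts the items longest-first, as suggested in the comments, and it passes each item through `Regex.Escape` in case an item contains regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HighlightDemo
{
    // Wraps each match in the same <span> markup used in the question.
    private static string GetHighlightMarkup(Match m)
    {
        return string.Format("<span class=\"focus\">{0}</span>", m.Value);
    }

    // Hypothetical helper: builds the lookaround-based pattern and applies it.
    public static string Highlight(string input, IEnumerable<string> words)
    {
        // Assumption: longest items first, so "new_" is tried before "new".
        // Assumption: Regex.Escape guards against items with regex metacharacters.
        var alternatives = words
            .OrderByDescending(w => w.Length)
            .Select(w => Regex.Escape(w));

        // (?<!\p{L}) = no letter before the match, (?!\p{L}) = no letter after it.
        string pattern = @"(?<!\p{L})(?:" + string.Join("|", alternatives) + @")(?!\p{L})";
        return Regex.Replace(input, pattern, GetHighlightMarkup, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var words = new List<string> { "new", "new_" };
        Console.WriteLine(Highlight("New Law_Book_with_New_Cover", words));
    }
}
```

With this input, both occurrences of `New` are wrapped in the span. The alternative `new_` is tried first at the second occurrence, but it is rejected because the `C` of `Cover` follows it and `(?!\p{L})` forbids a letter there, so the engine falls back to `new`. If the underscore should be part of the highlighted text in that position, the trailing lookahead would need to be relaxed for that item.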

Wiktor Stribiżew