
I am looking for a way to search a string for an exact, whole-word match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be.
Consider the following scenario:

using System;
using System.Text.RegularExpressions;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
            int indx = str.IndexOf("TOTAL"); // 3: the TOTAL inside SUBTOTAL
            string amount = str.Substring(indx + "TOTAL".Length, 10);
            string strAmount = Regex.Replace(amount, "[^.0-9]", ""); // keep only digits and dots

            Console.WriteLine(strAmount);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}

The output of the above code is:

// 34.37
// Press any key to continue...

The problem is that I don't want SUBTOTAL, but IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and so yields the incorrect value 34.37.

So the question is: is there a way to force IndexOf to find only an exact whole-word match, or another way to find the index of that exact match so that I can do something useful with it? As far as I can tell, Regex.IsMatch and Regex.Match are simply boolean searches. In this case it isn't enough to know that the exact match exists; I need to know where it is in the string.

Any advice would be appreciated.

Milad Rashidi
D J

6 Answers


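You can use Regex.Match and the Index property of the returned Match:

string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = Regex.Match(str, @"\WTOTAL\W").Index; // will be 18: \W consumes the space before TOTAL, so this is the index of that space, not of TOTAL itself (19)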
L.B
  • Thanks! That's much cleaner! Who knew there was a ".Index" hanging off of RegEx.Match? :) :) :) – D J Jun 26 '14 at 18:26
  • A bit ago, there was a post on this answer using a RegEx pattern that returned the number following the exact match for "TOTAL". Did anyone else see it? Anyone care to weigh in on such a pattern? – D J Jun 26 '14 at 18:50
  • @DJ Are you looking for something like `var val = Regex.Match(str, @"\WTOTAL\W\s*([0-9\.]+)").Groups[1].Value;` – L.B Jun 26 '14 at 19:01
  • WOW! I have got to learn more about RegEx. It seems very powerful, if not very intuitive. Thanks LB! – D J Jun 26 '14 at 19:22
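If the amount is what you ultimately need, here is a minimal sketch of a variation on the pattern in L.B.'s comment. It swaps \W for \b, so it also works when TOTAL starts or ends the string, and it checks Match.Success before reading the capture group. The class and method names (WordAmount, AmountAfter) are hypothetical, chosen only for this sketch.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class WordAmount
{
    // Returns the number that follows `word` as a whole word, or null if there is none.
    public static string AmountAfter(string text, string word)
    {
        // \b on both sides rejects the TOTAL inside SUBTOTAL;
        // ([0-9.]+) captures the amount after optional whitespace.
        string pattern = @"\b" + Regex.Escape(word) + @"\b\s*([0-9.]+)";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}

// WordAmount.AmountAfter("SUBTOTAL 34.37 TAX TOTAL 37.43", "TOTAL")    returns "37.43"
// WordAmount.AmountAfter("SUBTOTAL 34.37 TAX TOTAL 37.43", "SUBTOTAL") returns "34.37"
// WordAmount.AmountAfter("SUBTOTAL 34.37 TAX TOTAL 37.43", "TAX")      returns null
```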

My method is faster than the accepted answer because it does not use Regex.

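string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL"); // 19

// Extension methods must live in a non-generic static class (class name is arbitrary; needs using System;)
public static class StringExtensions
{
    // Returns the index of the first occurrence of word that has no letter or digit
    // immediately before or after it, or -1 if there is none.
    public static int IndexOfWholeWord(this string str, string word)
    {
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        return -1;
    }
}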
palota
  • This is also more flexible as it returns -1 if TOTAL is NOT in the line. The Regex above returns 0. – brenth Aug 14 '19 at 19:06
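To illustrate brenth's point, a small sketch (MatchIndexDemo and Find are hypothetical names): a failed Regex.Match still reports Index 0, which cannot be told apart from a genuine match at the start of the string unless you also check Success.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class MatchIndexDemo
{
    // Match.Index is 0 both for a real match at position 0 and for no match at all,
    // so Success must be checked to tell the two cases apart.
    public static (bool Success, int Index) Find(string text)
    {
        Match m = Regex.Match(text, @"\bTOTAL\b");
        return (m.Success, m.Index);
    }
}

// MatchIndexDemo.Find("TOTAL 37.43")    -> (True, 0)
// MatchIndexDemo.Find("SUBTOTAL 34.37") -> (False, 0)
```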

You can use word boundaries, \b, and the Match.Index property:

var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19


The \bTOTAL\b matches TOTAL when it is not enclosed with any other letters, digits or underscores.

If underscores should also count as word separators (so that TOTAL in SUB_TOTAL counts as a whole word), use

var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;

where (?<![^\W_]) is a negative lookbehind that fails the match if the character immediately to the left of the current position is a word character other than an underscore (essentially a letter or digit), so only the start of the string or a character that is not a letter or digit may precede TOTAL; (?![^\W_]) is the corresponding negative lookahead, which only succeeds at the end of the string or before a character that is not a letter or digit.

If the boundaries are whitespaces or start/end of string use

var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;

where (?<!\S) requires start of string or a whitespace immediately on the left, and (?!\S) requires the end of string or a whitespace on the right.

NOTE: \b, (?<!...) and (?!...) are zero-width (non-consuming) patterns: the regex position does not advance when they match, so Match.Index gives the exact position of the word itself.

Wiktor Stribiżew
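As a rough illustration of how the three patterns in the answer above differ, this sketch runs each one against a single made-up test string (the string and the BoundaryDemo names are assumptions for the example, not part of the answer):

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class BoundaryDemo
{
    // TOTAL occurs at index 4 (after '_'), 12 (after '-') and 18 (after a space).
    const string Text = "SUB_TOTAL x-TOTAL TOTAL";

    // \b: TOTAL must not touch a letter, digit or underscore.
    public static int WordBoundary() =>
        Regex.Match(Text, @"\bTOTAL\b").Index;

    // Underscores count as separators, so the TOTAL in SUB_TOTAL qualifies.
    public static int UnderscoreAsSeparator() =>
        Regex.Match(Text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;

    // Only whitespace or the string edges count as separators.
    public static int WhitespaceOnly() =>
        Regex.Match(Text, @"(?<!\S)TOTAL(?!\S)").Index;
}

// WordBoundary() -> 12, UnderscoreAsSeparator() -> 4, WhitespaceOnly() -> 18
```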

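To make the accepted answer a little safer: Match.Index is 0 when nothing matches, so check Match.Success and return -1 for no match, the way IndexOf does:

string pattern = String.Format(@"\b{0}\b", Regex.Escape(findTxt)); // escape regex metacharacters in the search text
Match mtc = Regex.Match(queryTxt, pattern);
if (mtc.Success)
{
    return mtc.Index;
}
else
    return -1;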
סטנלי גרונן
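Because findTxt is inserted into the pattern verbatim, any regex metacharacters in it keep their special meaning; passing it through Regex.Escape first avoids that. A sketch with a made-up input (EscapeDemo, Unescaped and Escaped are hypothetical names) shows a '.' acting as a wildcard, so that searching for "37.43" also matches "37143":

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class EscapeDemo
{
    // findTxt is pasted into the pattern as-is, so '.' means "any character".
    public static int Unescaped(string queryTxt, string findTxt)
    {
        Match mtc = Regex.Match(queryTxt, string.Format(@"\b{0}\b", findTxt));
        return mtc.Success ? mtc.Index : -1;
    }

    // Regex.Escape turns '.' into "\." so it only matches a literal dot.
    public static int Escaped(string queryTxt, string findTxt)
    {
        Match mtc = Regex.Match(queryTxt, string.Format(@"\b{0}\b", Regex.Escape(findTxt)));
        return mtc.Success ? mtc.Index : -1;
    }
}

// EscapeDemo.Unescaped("37143 and 37.43", "37.43") -> 0  (matches "37143")
// EscapeDemo.Escaped("37143 and 37.43", "37.43")   -> 10
```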

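This is a hack that only works for your example, but try searching for " TOTAL" with a leading space:

int indx = str.IndexOf(" TOTAL");
int start = indx + " TOTAL".Length;
string amount = str.Substring(start, Math.Min(10, str.Length - start));

SUBTOTAL has no space directly before its TOTAL, so the search skips it and finds the standalone TOTAL. Both IndexOf and the Substring offset must use " TOTAL", and the length is capped because only 6 characters remain after it in this string (a fixed length of 10 would throw ArgumentOutOfRangeException).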

krodmannix
  • LOL!!! Why didn't I see that! It is a bit "hacky" but for my example only, it should work. I would really like to see if there is a way to force the whole word match in a more clean approach, but will mark this as the answer if I don't see a more refined answer in a day or so. THANKS MUCH!!! :) – D J Jun 26 '14 at 18:09

I'd recommend L.B.'s Regex solution too, but if you can't use Regex, you could use String.LastIndexOf("TOTAL"). This assumes TOTAL always comes after SUBTOTAL.

http://msdn.microsoft.com/en-us/library/system.string.lastindexof(v=vs.110).aspx

Khôi
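A minimal sketch of the LastIndexOf approach from the answer above, assuming (as that answer does) that the TOTAL you want is the last one in the line; LastTotalDemo and AmountAfterLastTotal are hypothetical names. Note that LastIndexOf is still a substring search, so it would also match TOTAL inside a later word such as GRANDTOTAL.

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class LastTotalDemo
{
    // Reads the text after the last "TOTAL"; assumes the grand total is the last TOTAL in the line.
    public static string AmountAfterLastTotal(string str)
    {
        int indx = str.LastIndexOf("TOTAL", StringComparison.Ordinal);
        if (indx < 0)
            return null; // no TOTAL at all

        // Take the rest of the string instead of a fixed 10 characters,
        // which would run past the end here and throw ArgumentOutOfRangeException.
        return str.Substring(indx + "TOTAL".Length).Trim();
    }
}

// LastTotalDemo.AmountAfterLastTotal("SUBTOTAL 34.37 TAX TOTAL 37.43") -> "37.43"
```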