0

Suppose I have the string "1 AND 2 AND 3 OR 4" and want to create an array of strings containing every occurrence of the substrings "AND" or "OR", in the order they appear in the string.

So the above string would return a string array of {"AND", "AND", "OR"}.

What would be a smart way of writing that?

EDIT: Using C# 2.0+,

string rule = "1 AND 2 AND 3 OR 4";
string pattern = "(AND|OR)";
string[] conditions = Regex.Split(rule, pattern);

gives me {"1", "AND", "2", "AND", "3", "OR", "4"}, which isn't quite what I'm after. How can I reduce that to the ANDs and ORs only?

David Hodgson
  • I can see what you're trying to do, but I don't think the `Split` approach is the most appropriate for what you want. `Split` separates the input at the ANDs and ORs, so the result is the numbers (the ANDs and ORs are only included because of the parentheses), which is not what you want. You want the ANDs and ORs. I think a crafted regex pattern could return multiple matches, capturing only the ANDs and ORs. – Matt Kocaj May 11 '09 at 07:55
  • Could you explain the purpose of this requirement? It might assist in designing a more appropriate regex. – Matt Kocaj May 11 '09 at 07:56

5 Answers

1

You're probably looking for a tokeniser or lexer; have a look at the following article:

C# Regular Expression Recipes—A Better Tokenizer

Student for Life
1

This regex (.NET) seems to do what you want. You're looking for the matches (multiple) in the group at index=1:

.*?((AND)|(OR))*.*?

EDIT: I've tested the following and it seems to do what you want. It's more lines than I would like, but it approaches the task in a purely regex fashion (which IMHO is what you should be doing):

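        // Requires: using System.Collections; using System.Text.RegularExpressions;
        string text = "1 AND 2 AND 3 OR 4";
        string pattern = @"AND|OR";

        Regex r = new Regex(pattern, RegexOptions.IgnoreCase);

        // Walk the matches one at a time, collecting each matched operator.
        Match m = r.Match(text);
        ArrayList results = new ArrayList();
        while (m.Success)
        {
            results.Add(m.Value); // same as m.Groups[0].Value: the whole match

            m = m.NextMatch();
        }

        string[] matchesStringArray = (string[])results.ToArray(typeof(string));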
Matt Kocaj
  • *shrugs* Maybe I've over complicated it. – Matt Kocaj May 11 '09 at 05:27
  • In C# 2.0+, using "AND|OR" as the pattern gives me more than just the ANDs and ORs. How can I limit the pattern to give me only the ANDs and ORs? I've edited the question above. – David Hodgson May 11 '09 at 06:13
  • It seems the only way to get the regex engine to move on to the next match (of "AND|OR") is to invoke the .NextMatch() method. This sucks because now you have to iterate. But it seems you were never going to escape using a loop of some kind. Hope this is OK. – Matt Kocaj May 11 '09 at 08:32
  • You may use Regex.Matches to get all the results in one call... but as you said, you'll have to iterate over the result collection... or use LINQ to get what you want! – Cédric Rup May 11 '09 at 09:00
  • It's cool you said that, because I was thinking of using LINQ to filter out the parts of the dirty collection too. I just think that in this case you should make the best of one technology (if you will) rather than using half of two. In this case, if the regex can do it then I think it should. That being said, if you can use regex, LINQ and string functions in combination to get the same result in fewer (cleaner) lines of code, then +10, do it that way. ;) – Matt Kocaj May 11 '09 at 10:18
1

Since you know the exact substrings you're looking for, why not just use IndexOf(substr, iOffset) to find the occurrences (loop until it returns -1)?

Depending on the complexity of your task, it could be simpler/faster than using regular expressions (since you're not matching patterns).
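The IndexOf loop described above could be sketched roughly as follows. This is only one possible reading of the suggestion: the class and method names (`OperatorScanner`, `FindOperators`) are hypothetical, and because two substrings are involved, the sketch assumes that at each step you pick whichever operator occurs earliest after the current offset so that the results stay in order.

```csharp
using System;
using System.Collections.Generic;

static class OperatorScanner
{
    // Hypothetical helper: returns every "AND"/"OR" in order of appearance.
    public static string[] FindOperators(string rule)
    {
        string[] operators = { "AND", "OR" };
        List<string> found = new List<string>();
        int offset = 0;

        while (offset < rule.Length)
        {
            // Find whichever operator occurs earliest at or after offset.
            int bestIndex = -1;
            string bestOperator = null;
            foreach (string op in operators)
            {
                int index = rule.IndexOf(op, offset, StringComparison.Ordinal);
                if (index >= 0 && (bestIndex < 0 || index < bestIndex))
                {
                    bestIndex = index;
                    bestOperator = op;
                }
            }

            if (bestIndex < 0)
            {
                break; // IndexOf returned -1 for every operator: nothing left.
            }

            found.Add(bestOperator);
            offset = bestIndex + bestOperator.Length; // Resume after this match.
        }

        return found.ToArray();
    }
}
```

Calling `OperatorScanner.FindOperators("1 AND 2 AND 3 OR 4")` yields {"AND", "AND", "OR"}. Note that this is plain substring matching, so a token such as "ORDER" or "BRAND" would also be reported as "OR" or "AND"; if operands can contain those letters, splitting on whitespace or a regex with word boundaries is probably the safer choice.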

Gishu
1
string rule = "1 AND 2 AND 3 OR 4";
string pattern = "(AND|OR)";
MatchCollection conditions = Regex.Matches(rule, pattern);

Use Match.Value on each Match in the collection to get the matched string.
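To end up with the string[] the question asks for, the collection still has to be copied out. A minimal sketch, assuming the same pattern as above and a hypothetical wrapper (`ConditionExtractor.ExtractConditions`), written with a plain loop so it does not depend on LINQ:

```csharp
using System.Text.RegularExpressions;

static class ConditionExtractor
{
    // Hypothetical helper: copies each match's Value into a string array.
    public static string[] ExtractConditions(string rule)
    {
        MatchCollection matches = Regex.Matches(rule, "(AND|OR)");

        string[] conditions = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            conditions[i] = matches[i].Value; // the text matched by the pattern
        }
        return conditions;
    }
}
```

For the rule "1 AND 2 AND 3 OR 4" this gives {"AND", "AND", "OR"}. If operands might contain the letters "AND" or "OR", a pattern such as @"\b(?:AND|OR)\b" would restrict matches to whole words.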

Nick Whaley
0

Here's a goofy way that I came up with:

string rule = "1 AND 2 AND 3 OR 4";
List<string> andsOrs = new List<string>();
string[] split = rule.Split();
for (int i = 0; i < split.Length; i++)
{
   if (split[i] == "AND" || split[i] == "OR")
   {
       andsOrs.Add(split[i]);
   }
}
string[] conditions = andsOrs.ToArray();
return conditions;
David Hodgson