52

What's the regular expression to check if a string starts with "mailto" or "ftp" or "joe" or...

Right now I am using C# with code like this, combined in one big if statement with many || conditions:

String.StartsWith("mailto:")
String.StartsWith("ftp")

It looks like a regex would be better for this. Or is there a C# way I am missing here?

Mark Byers
kacalapy

6 Answers

82

You could use:

^(mailto|ftp|joe)

But to be honest, StartsWith is perfectly fine here. You could rewrite it as follows:

// Requires: using System.Linq;
string[] prefixes = { "http", "mailto", "joe" };
string s = "joe:bloggs";
// True if s begins with any of the prefixes.
bool result = prefixes.Any(prefix => s.StartsWith(prefix));

You could also look at the System.Uri class if you are parsing URIs.
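As a rough sketch of the System.Uri suggestion: the snippet below parses the input with Uri.TryCreate and compares Uri.Scheme against a list of allowed schemes. The class name UriSchemeCheck and the helper HasAllowedScheme are hypothetical names chosen for illustration, and the sample inputs are assumptions.

```csharp
using System;

public static class UriSchemeCheck
{
    // Hypothetical helper: returns true when the input parses as an absolute
    // URI and its scheme is one of the allowed schemes.
    public static bool HasAllowedScheme(string input, params string[] schemes)
    {
        if (!Uri.TryCreate(input, UriKind.Absolute, out var uri))
            return false;

        foreach (string scheme in schemes)
        {
            // URI schemes are case-insensitive, so compare ignoring case.
            if (string.Equals(uri.Scheme, scheme, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(HasAllowedScheme("mailto:joe@example.com", "mailto", "ftp"));
        Console.WriteLine(HasAllowedScheme("http://example.com/", "mailto", "ftp"));
    }
}
```

This only makes sense when the strings really are URIs; for arbitrary prefixes such as "joe", the StartsWith approach is probably the simpler fit.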

Mark Byers
  • I know this is an old post...but it would be interesting to know the performance statistics for both of these...which is faster? – Sarima Jun 19 '14 at 10:09
26

The following will match on any string that starts with mailto, ftp or http:

 // Requires: using System.Text.RegularExpressions;
 Regex reg = new Regex("^(mailto|ftp|http)");

To break it down:

  • ^ matches the start of the string (or the start of each line if RegexOptions.Multiline is set)
  • (mailto|ftp|http) matches any of the items separated by a |

I would find StartsWith to be more readable in this case.
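For comparison, here is a minimal sketch of how the pattern above might be wrapped so the Regex object is built once and reused. The class name PrefixRegexDemo and the method HasKnownPrefix are hypothetical, and RegexOptions.CultureInvariant is an assumption rather than something the pattern requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixRegexDemo
{
    // Built once and reused, so the pattern is not re-parsed on every call.
    private static readonly Regex PrefixPattern =
        new Regex("^(mailto|ftp|http)", RegexOptions.CultureInvariant);

    // Hypothetical helper: true when the input starts with one of the prefixes.
    public static bool HasKnownPrefix(string input)
    {
        return PrefixPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(HasKnownPrefix("mailto:joe@example.com"));
        Console.WriteLine(HasKnownPrefix("joe:bloggs"));
    }
}
```

Note that this pattern has no delimiter after the prefix, so a string such as "ftpserver.txt" also matches; append ":" to the pattern if only real scheme prefixes should pass.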

Oded
10

The StartsWith method will be faster, as there is no overhead of interpreting a regular expression, but here is how you do it:

if (Regex.IsMatch(theString, "^(mailto|ftp|joe):")) ...

The ^ matches the start of the string. You can put any protocols between the parentheses, separated by | characters.
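If the protocol list comes from a collection rather than being typed into the pattern by hand, one possible sketch is to build the alternation with string.Join and Regex.Escape, so that characters such as "." or "+" in a protocol name are treated literally. ProtocolPatternBuilder and Build are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ProtocolPatternBuilder
{
    // Hypothetical helper: builds ^(?:p1|p2|...): from the given protocols,
    // escaping each one so regex metacharacters are matched literally.
    public static Regex Build(params string[] protocols)
    {
        string alternation = string.Join("|", protocols.Select(p => Regex.Escape(p)));
        return new Regex("^(?:" + alternation + "):");
    }

    public static void Main()
    {
        Regex pattern = Build("mailto", "ftp", "joe");
        Console.WriteLine(pattern.IsMatch("joe:bloggs"));
        Console.WriteLine(pattern.IsMatch("joey:bloggs"));
    }
}
```

The non-capturing group (?:...) is a design choice here: the check only needs a yes/no answer, so there is no reason to capture the matched protocol.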

edit:

Another approach, which can be much faster, is to extract the start of the string and use it in a switch. For larger sets of cases the compiler typically turns a string switch into a hash-based lookup, so it avoids comparing the input against every string in turn:

int index = theString.IndexOf(':');
if (index != -1) {
  switch (theString.Substring(0, index)) {
    case "mailto":
    case "ftp":
    case "joe":
      // do something
      break;
  }
}
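The same idea can be made explicit with a HashSet<string>, which may be easier to maintain when the list of schemes is long or loaded at run time. This is only a sketch: SchemeLookup, KnownSchemes and HasKnownScheme are hypothetical names, and the ordinal (case-sensitive) comparer is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class SchemeLookup
{
    // Hash-based set of the schemes to accept; lookups do not scan the list.
    private static readonly HashSet<string> KnownSchemes =
        new HashSet<string>(StringComparer.Ordinal) { "mailto", "ftp", "joe" };

    // Hypothetical helper: takes the text before the first ':' and looks it up.
    public static bool HasKnownScheme(string input)
    {
        int index = input.IndexOf(':');
        return index > 0 && KnownSchemes.Contains(input.Substring(0, index));
    }

    public static void Main()
    {
        Console.WriteLine(HasKnownScheme("ftp:custom"));
        Console.WriteLine(HasKnownScheme("ftpx:custom"));
    }
}
```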
Community
Guffa
  • -1, The more strings you add, the more likely the regex is going to be faster I think, since the regex engine will be able to find a match using a single pass of the source string (or less than a pass, if there is no match). Anyways, I don't think speed is going to be an issue with either method. – tster May 01 '10 at 17:09
  • @tster: You are forgetting the short-circuiting of the condition. If there is a match, the `StartsWith` solution on average only has to check half of the strings before it finds the match, or fewer if they are arranged by rate of occurrence. – Guffa May 01 '10 at 17:17
4

For the extension method fans:

public static bool RegexStartsWith(this string str, params string[] patterns)
{
    return patterns.Any(pattern => 
       Regex.Match(str, "^("+pattern+")").Success);
}

Usage

var answer = str.RegexStartsWith("mailto","ftp","joe");
//or
var answer2 = str.RegexStartsWith("mailto|ftp|joe");
//or
bool startsWithWhiteSpace = "  does this start with space or tab?".RegexStartsWith(@"\s");
K. R.
1

I really recommend using the String.StartsWith method over the Regex.IsMatch if you only plan to check the beginning of a string.

  • Firstly, a regular expression in C# is a language within a language, which does not help understanding or code maintenance. Regular expressions are a kind of DSL.
  • Secondly, many developers do not understand regular expressions: they are hard for many people to read.
  • Thirdly, the StartsWith method gives you options for culture-dependent comparison, which regular expressions are not aware of.
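To illustrate the third point, StartsWith has an overload that takes a StringComparison value, so the caller can choose between ordinal, case-insensitive and culture-aware comparison. The sketch below is only an example; PrefixComparisonDemo, Prefixes and StartsWithAny are hypothetical names.

```csharp
using System;
using System.Linq;

public static class PrefixComparisonDemo
{
    private static readonly string[] Prefixes = { "mailto:", "ftp:", "joe:" };

    // Hypothetical helper: the caller decides how the prefix is compared.
    public static bool StartsWithAny(string input, StringComparison comparison)
    {
        return Prefixes.Any(prefix => input.StartsWith(prefix, comparison));
    }

    public static void Main()
    {
        string input = "MAILTO:joe@example.com";
        // Ordinal is case-sensitive; OrdinalIgnoreCase is not.
        Console.WriteLine(StartsWithAny(input, StringComparison.Ordinal));
        Console.WriteLine(StartsWithAny(input, StringComparison.OrdinalIgnoreCase));
    }
}
```

For fixed protocol prefixes an ordinal comparison is usually the safer choice; the culture-sensitive values are more relevant for text shown to or typed by users.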

In your case, you should use regular expressions only if you plan to implement more complex string comparisons in the future.

Ucodia
0

You can get the substring before ':' using a range (s[..index], C# 8.0+) together with String.IndexOf, which returns -1 if the character is not found. You can then compare the result against constant patterns combined with the logical `or` pattern (C# 9.0+) to check whether the string starts with one of the defined prefixes.

string s = "ftp:custom";
int index = s.IndexOf(':');
bool result = index > 0 && s[..index] is "mailto" or "ftp" or "joe";
Reverin