2

I want to split strings similar to abc123, abcdefgh12 or a123456 into letters and numbers, so that the result will be {"abc", "123"} etc.

What is the simplest way to do it in C# 4.0? I want to do it with one regex.

Ilya Kogan

5 Answers

1

Why regex?

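    // Digits to search for; the first one found marks the split point.
    static readonly char[] digits = {'0','1','2','3','4','5','6','7','8','9'};
    ....
    // x defaults to the whole string and y to empty, for input with no digit.
    string s = "abcdefgh12", x = s, y = "";
    int i = s.IndexOfAny(digits);
    if (i >= 0) {
        x = s.Substring(0, i);            // letters: everything before the first digit
        y = s.Substring(i, s.Length - i); // numbers: from the first digit to the end
    }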
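The same index-based approach can be wrapped in a small method so the call site stays short. This is only a sketch: the class name `StringSplitter` and method name `SplitAtFirstDigit` are hypothetical, and returning a two-element array (with an empty second element when no digit is found) is an assumption rather than part of the snippet above.

```csharp
using System;

static class StringSplitter
{
    static readonly char[] Digits = "0123456789".ToCharArray();

    // Hypothetical helper: splits s at the first ASCII digit.
    // Returns { s, "" } when s contains no digit at all.
    public static string[] SplitAtFirstDigit(string s)
    {
        int i = s.IndexOfAny(Digits);
        if (i < 0)
            return new[] { s, "" };
        return new[] { s.Substring(0, i), s.Substring(i) };
    }
}
```

Under these assumptions, `StringSplitter.SplitAtFirstDigit("abcdefgh12")` returns `{ "abcdefgh", "12" }`.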
Marc Gravell
  • @Ilya Kogan: If you don't know the solution, then how can you determine whether it is simple? – Sebastian Mach Jan 11 '11 at 09:34
  • @phresnel It's not that I don't know the solution, I'm just looking for something concise, something that will make the code look short and clear. – Ilya Kogan Jan 11 '11 at 09:39
  • @Ilya - if you want concise, then refactor it away into a separate method and call that when desired. The implementation presented is *simple*, *correct* and *efficient*. I'll take that over "1 line" any day of the week. – Marc Gravell Jan 11 '11 at 09:41
  • Hi @Marc, I guess we just see different things as simple. I agree that regexes can be easily overused, and @phresnel's link shows one really horrible example. But in this case I find @Josh's answer (linked by @Mehdi) more readable. It saves the need of calculating indexes and offsets inside the string. – Ilya Kogan Jan 11 '11 at 19:12
  • @Ilya - I look at it this way: if I saw **just** that regex code, how long would it take me to grok what it is doing? Now compare to the index code "it is finding the first occurrence of a digit, and splitting there" – Marc Gravell Jan 11 '11 at 19:27
  • @Marc I think it's a personal thing and it depends what kind of code you're used to reading. For example, when I saw your code (this is just to explain what I mean as part of an intellectual discussion, I hope you take no offence) I thought: "Oh god, what are all these variables doing here? And why does he need to calculate the length and then subtract some index? I don't care about the length and I don't need all these integers in my code that have no practical meaning." – Ilya Kogan Jan 11 '11 at 19:40
  • Yup; damn those pesky variables, always causing trouble. – Marc Gravell Jan 12 '11 at 06:13
1

"Only numbers or only letters" can be represented using [a-zA-Z]+|[0-9]+. All you have to do is look for all matches of that regular expression in your string. Note that non-alphanumeric characters will not be returned, but will still split the strings (so "123-456" would yield { "123", "456" }).

EDIT: I've interpreted your question as stating that your strings can be a sequence of letters and numbers in any order - if your string is merely one or more letters followed by one or more numbers, a regular expression is unnecessary: look for the first digit and split the string.
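A minimal sketch of the match-all approach for strings that mix letters and digits in any order. The names `RunSplitter` and `SplitRuns` are hypothetical, and the pattern uses `+` quantifiers so that `Regex.Matches` does not return empty matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RunSplitter
{
    // Hypothetical helper: returns each maximal run of ASCII letters
    // or ASCII digits, in the order they appear in the input.
    public static List<string> SplitRuns(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, "[a-zA-Z]+|[0-9]+"))
            parts.Add(m.Value);
        return parts;
    }
}
```

With this sketch, `RunSplitter.SplitRuns("abc123")` yields `abc` and `123`, and `RunSplitter.SplitRuns("123-456")` yields `123` and `456`, since the hyphen matches neither alternative.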

Victor Nicollet
  • Splitting is done by calling `Regex.Matches` (http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx) and then reading through the returned `MatchCollection` – Victor Nicollet Jan 11 '11 at 09:33
1

In addition to Marc Gravell's answer, read http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html .

"What is the simplest way to do it in C# 4.0? I want to do it with one regex."

That's practically an oxymoron in your case. The simplest way of splitting by a fixed pattern is not with regexes.

Sebastian Mach
1

Unless I'm missing something, this should do the trick... ([a-z]*)([0-9]*)
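In C#, a two-group pattern like this could be applied roughly as follows. This is a sketch under a few assumptions that go beyond the pattern above: it uses `+` instead of `*` so both parts must be non-empty, adds `^` and `$` anchors so the whole string must match, and passes `RegexOptions.IgnoreCase` to accept uppercase letters. The names `GroupSplitter` and `Split` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupSplitter
{
    // Group 1 captures the letters, group 2 captures the digits.
    static readonly Regex Pattern =
        new Regex("^([a-z]+)([0-9]+)$", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns null when the input is not
    // one or more letters followed by one or more digits.
    public static string[] Split(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}
```

Here `GroupSplitter.Split("abc123")` returns `{ "abc", "123" }`, while `GroupSplitter.Split("abc")` and `GroupSplitter.Split("123")` return `null`.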

CAFxX
  • Either use [a-zA-Z]* or pass the case-insensitive option (I don't know how it works in C#; in PHP and JS you append /i at the end of the regex). Also keep in mind that this solution will match ABC and 123 as well (i.e. only letters or only numbers). To avoid that, replace the *s with +s. – CAFxX Jan 11 '11 at 09:35
  • Would the regex not be better with one-or-more-matching? – Sebastian Mach Jan 11 '11 at 09:41
  • That's what I wrote in the comment above: replace the *s with +s. – CAFxX Jan 11 '11 at 09:53
-1

You could create a group for letters and one for numbers. Use this guide for further info: http://www.regular-expressions.info/reference.html HTH!
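One way to read "a group for letters and one for numbers" in .NET is with named groups, which avoids counting parentheses at the call site. This is a sketch only: the group names `letters` and `digits`, the class `NamedGroupSplitter`, and the method `TrySplit` are all hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupSplitter
{
    static readonly Regex Pattern =
        new Regex(@"^(?<letters>[a-zA-Z]+)(?<digits>[0-9]+)$");

    // Hypothetical helper: returns false (and empty outputs) when the
    // input is not one or more letters followed by one or more digits.
    public static bool TrySplit(string input, out string letters, out string digits)
    {
        Match m = Pattern.Match(input);
        letters = m.Success ? m.Groups["letters"].Value : "";
        digits = m.Success ? m.Groups["digits"].Value : "";
        return m.Success;
    }
}
```

For example, `NamedGroupSplitter.TrySplit("a123456", out l, out d)` returns `true` with `l == "a"` and `d == "123456"`.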

Daniel