12

I have been given some poorly formatted data and need to pull numbers out of strings. I'm not sure what the best way to do this is. The numbers can be any length.

string a = "557222]]>";
string b = "5100870<br>";

Any idea what I can do to get this:

a = "557222"
b = "5100870"

Thanks

The solution is for C#, sorry. I've edited the question to add that tag.

kevp

8 Answers

33

You could write a simple method that strips out all non-digit characters, though this won't handle floating-point data:

public string ExtractNumber(string original)
{
     return new string(original.Where(c => Char.IsDigit(c)).ToArray());
}

This purely pulls out the "digits" - you could also use Char.IsNumber instead of Char.IsDigit, depending on the result you wish.

Reed Copsey
  • IsDigit accepts more than 0-9: it also returns true for script-specific digits, and IsNumber goes further (fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers). Stating that it "purely pulls out the digits" may give somebody the wrong impression. See my answer for obtaining just the characters 0-9. – Atters Nov 16 '15 at 04:19
  • Can be even shorter as `public static string ExtractNumber(string original) => new string(original.Where(char.IsDigit).ToArray());` :) – Patrick Mar 12 '19 at 09:49
13

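Try this one-liner (note that the character class also keeps spaces and underscores; use "[^0-9]" to keep digits only):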

Regex.Replace(str, "[^0-9 _]", "");
fhcimolin
Milind Raut
8

Not familiar enough with .NET for exact code. Nonetheless, two approaches would be:

  • Cast it as an integer. If the non-digit characters are at the end (e.g. 21389abc), this is the easiest.
  • If you have intermixed non-digit characters (e.g. 1231a23v) and want to keep every digit, use the regex [^\d] to replace non-digit characters.
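
For C#, the two approaches above might be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class and method names are made up for illustration, and because `int.Parse` rejects trailing non-digit characters in .NET, the first approach isolates the leading run of digits with a regex rather than casting directly.

```csharp
using System.Text.RegularExpressions;

public static class DigitApproaches   // hypothetical name, for illustration only
{
    // Approach 1: keep only the run of digits at the start of the string.
    // Returns an empty string when the input does not start with a digit.
    public static string LeadingDigits(string input)
    {
        return Regex.Match(input, @"^\d+").Value;
    }

    // Approach 2: drop every non-digit character, wherever it appears.
    public static string AllDigits(string input)
    {
        return Regex.Replace(input, @"[^\d]", "");
    }
}
```

With the examples from the list, `DigitApproaches.LeadingDigits("21389abc")` gives "21389" and `DigitApproaches.AllDigits("1231a23v")` gives "123123".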
Jason McCreary
  • 3
    +1 for the regex. With C# you could use something like `Regex.Split(str, @"[^\d]")` and pass the result to `string.Join("", Regex.Split(...))`. – Ryan Jun 12 '12 at 18:29
  • @Ryan, thanks. Never got too deep in .NET. But logic transcends language :) – Jason McCreary Jun 12 '12 at 18:30
  • 2
    Instead of the split/join mess, a simpler solution is: new Regex(@"\D").Replace(source ?? "", ""); – K Kimble Mar 18 '14 at 17:32
7

You can use a simple regular expression:

var numericPart = Regex.Match( a, "\\d+" ).Value;

If you need it to be an actual numeric value, you can then use int.Parse or int.TryParse.
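
As a rough sketch of that last step, assuming the value fits in a 64-bit integer: the question says the numbers can be any length, so very long digit runs would need something like `System.Numerics.BigInteger` instead. The class and method names here are hypothetical.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumberParsing   // hypothetical name, for illustration only
{
    // Returns the first run of ASCII digits as a long, or null when there
    // is no digit run or the value does not fit in a long.
    public static long? FirstNumber(string input)
    {
        // [0-9] rather than \d, so only characters TryParse accepts are matched.
        string digits = Regex.Match(input, @"[0-9]+").Value;

        long value;
        if (long.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out value))
            return value;

        return null;
    }
}
```

`TryParse` is used instead of `Parse` so that an empty match or an overflowing value yields null rather than an exception.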

Ethan Brown
6

You could use LINQ. The code below filters the string into an IEnumerable with only digits and then converts it to a char[]. The string constructor can then convert the char[] into a string:

string a = "557222]]>";
string b = "5100870<br>";

a = new string(a.Where(x => char.IsDigit(x)).ToArray());
b = new string(b.Where(x => char.IsDigit(x)).ToArray());
soroxis
4

Try this:

string number = Regex.Match("12345<br>", @"\d+").Value;

This will return the first group of digits. Example: for the input "a 123 b 456 c" it will return "123".

Olivier Jacot-Descombes
4

The question doesn't explicitly state that you just want the characters 0 to 9 but it wouldn't be a stretch to believe that is true from your example set and comments. So here is the code that does that.

        string digitsOnly = String.Empty;
        foreach (char c in s)
        {
            // Do not use IsDigit as it will include more than the characters 0 through to 9
            if (c >= '0' && c <= '9') digitsOnly += c;
        }

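Why you don't want to use Char.IsDigit(): it returns true for every Unicode decimal digit, including script-specific digits, not only 0 to 9. Char.IsNumber() is broader still and also accepts characters such as fractions, subscripts, superscripts, Roman numerals, currency numerators, and encircled numbers.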
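
To make the difference concrete, here is a small sketch (the class and method names are hypothetical) that wraps the range check so it can be compared against `Char.IsDigit` and `Char.IsNumber` on two non-ASCII characters: the Arabic-Indic digit three (U+0663) and the fraction one half (U+00BD).

```csharp
using System.Linq;

public static class AsciiDigits   // hypothetical name, for illustration only
{
    // True only for the ten ASCII characters '0' through '9'.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Keeps only the ASCII digits from the input, in their original order.
    public static string Extract(string s)
    {
        return new string(s.Where(IsAsciiDigit).ToArray());
    }
}
```

For U+0663, `char.IsDigit` returns true while `AsciiDigits.IsAsciiDigit` returns false. For U+00BD, `char.IsDigit` returns false but `char.IsNumber` returns true. The range check rejects both.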

Atters
0

Here's the version that worked for my case:

    public static string ExtractNumbers(this string source)
    {
        if (String.IsNullOrWhiteSpace(source))
            return string.Empty;
        // Regex.Match never returns null, so check Success rather than comparing to null.
        var number = Regex.Match(source, @"\d+");
        return number.Success ? number.Value : string.Empty;
    }
Michael Bahig