117

I have a String variable (basically an English sentence with an unspecified number of numbers) and I'd like to extract all the numbers into an array of integers. I was wondering whether there was a quick solution with regular expressions?


I used Sean's solution and changed it slightly:

LinkedList<String> numbers = new LinkedList<String>();

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(line); 
while (m.find()) {
   numbers.add(m.group());
}
OneCricketeer
John Manak
  • 1
    Are numbers surrounded by spaces or other characters? How are numbers formatted, are they hexadecimal, octal, binary, decimal? – Buhake Sindi Mar 02 '10 at 22:38
  • I thought it was clear from the question: it's an English sentence with numbers. Moreover I was talking about an integer array, so what I was looking for were integers. – John Manak Mar 02 '10 at 22:56

13 Answers

186
Pattern p = Pattern.compile("-?\\d+");
Matcher m = p.matcher("There are more than -2 and less than 12 numbers here");
while (m.find()) {
  System.out.println(m.group());
}

... prints -2 and 12.


-? optionally matches a leading negative sign. \d matches a digit, and because a backslash has to be written as \\ in a Java String literal, \\d+ matches 1 or more digits.
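
Since the question asks for an array of integers, here is a minimal sketch that wraps the same pattern in a helper returning an int[]. The class and method names (IntExtractor, extractInts) are made up for illustration, and it assumes every match fits in an int.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class IntExtractor {
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    // Returns every integer found in the text, in order of appearance.
    static int[] extractInts(String text) {
        List<Integer> found = new ArrayList<>();
        Matcher m = INTEGER.matcher(text);
        while (m.find()) {
            // Integer.parseInt throws NumberFormatException if the value does not fit in an int
            found.add(Integer.parseInt(m.group()));
        }
        int[] result = new int[found.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = found.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ints = extractInts("There are more than -2 and less than 12 numbers here");
        System.out.println(Arrays.toString(ints)); // prints [-2, 12]
    }
}
```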

AkkyR
Sean Owen
  • 4
    Could you complement your answer by explaining your regular expression please? – OscarRyz Mar 02 '10 at 22:42
  • 3
    -? matches a leading negative sign -- optionally. \d matches a digit, and we need to write \ as \\ in a Java String though. So, \\d+ matches 1 or more digits – Sean Owen Mar 02 '10 at 23:41
  • 8
    I changed my expression to Pattern.compile("-?[\\d\\.]+") to support floats. You definitely led me on the way, Thx! – jlengrand Jun 13 '12 at 08:31
  • This method detects digits but does not detect formatted numbers, e.g. `2,000`. For those, use `-?\\d+,?\\d+|-?\\d+` – Mugoma J. Okomba Mar 09 '16 at 12:25
  • That only supports a single comma, so would miss "2,000,000". It also accepts strings like "2,00". If comma separators must be supported, then: `-?\\d+(,\\d{3})*` should work. – Sean Owen Mar 09 '16 at 20:41
55

What about using the replaceAll method of java.lang.String:

    String str = "qwerty-1qwerty-2 455 f0gfg 4";      
    str = str.replaceAll("[^-?0-9]+", " "); 
    System.out.println(Arrays.asList(str.trim().split(" ")));

Output:

[-1, -2, 455, 0, 4]

Description

[^-?0-9]+
  • [ and ] delimit a set of characters; the set matches a single character, which may be any one of those listed, in any order
  • ^ Special identifier used at the beginning of the set to indicate that it matches all characters not present in the delimited set, instead of all characters present in the set.
  • + Between one and unlimited times, as many times as possible, giving back as needed
  • -? One of the characters “-” and “?”
  • 0-9 A character in the range between “0” and “9”
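
To illustrate the point that ? is a literal character inside the brackets, here is a small sketch comparing the pattern above with a variant that drops the ?. The class name, the tokens helper and the sample sentence are my own, not part of the answer. With the ? in the set, a question mark next to a digit survives, and a later Integer.parseInt on that token would fail.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names and sample input are illustrative only.
class CharClassDemo {
    // Replaces every run of characters matched by the regex with a space, then splits on spaces.
    static List<String> tokens(String input, String regex) {
        return Arrays.asList(input.replaceAll(regex, " ").trim().split(" "));
    }

    public static void main(String[] args) {
        String input = "Is it 5? Yes-7";
        System.out.println(tokens(input, "[^-?0-9]+")); // prints [5?, -7]
        System.out.println(tokens(input, "[^-0-9]+"));  // prints [5, -7]
    }
}
```

Note that both variants still keep a hyphen that is not a sign, so a word like well-known would leave a stray - token.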
Evandro Coan
Maxim Shoustin
19
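// needs java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
List<Integer> numbers = new ArrayList<>();
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher(myString);
while (m.find()) {
    int n = Integer.parseInt(m.group());
    numbers.add(n); // append n to list
}
// convert list to array
int[] result = numbers.stream().mapToInt(Integer::intValue).toArray();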

You can actually replace [0-9] with \d, but that involves double backslash escaping, which makes it harder to read.
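
One related detail, as far as I know: by default java.util.regex treats \d as exactly [0-9], so the two spellings are interchangeable, but with the Pattern.UNICODE_CHARACTER_CLASS flag \d also matches non-ASCII digits. A minimal sketch (the class and method names are made up for the demo):

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class DigitClassDemo {
    // True if the text contains at least one character matched by \d under the given flags.
    static boolean containsDigit(String text, int flags) {
        return Pattern.compile("\\d", flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        System.out.println(containsDigit(arabicIndicThree, 0));                               // prints false
        System.out.println(containsDigit(arabicIndicThree, Pattern.UNICODE_CHARACTER_CLASS)); // prints true
    }
}
```

If the input may contain such digits and you only want ASCII ones, [0-9] states that intent explicitly.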

sidereal
10
  StringBuffer sBuffer = new StringBuffer();
  // the dot is escaped so that it matches only a literal decimal point, not any character
  Pattern p = Pattern.compile("[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+");
  Matcher m = p.matcher(str);
  while (m.find()) {
    sBuffer.append(m.group());
  }
  return sBuffer.toString();

This extracts numbers while retaining the decimal point. Note that all matches are appended to a single string.

Kannan
8

The accepted answer detects digits but does not detect formatted numbers, e.g. 2,000, nor decimals, e.g. 4.8. For those, use -?\\d+(,\\d+)*?\\.?\\d+?:

Pattern p = Pattern.compile("-?\\d+(,\\d+)*?\\.?\\d+?");
List<String> numbers = new ArrayList<String>();
Matcher m = p.matcher("Government has distributed 4.8 million textbooks to 2,000 schools");
while (m.find()) {  
    numbers.add(m.group());
}   
System.out.println(numbers);

Output: [4.8, 2,000]

Andrii Abramov
Mugoma J. Okomba
  • 2
    @JulienS.: I disagree. This regex does much more than the OP asked for, and it does it incorrectly. (At the least, the decimal portion should be in an optional group, with everything in it required and greedy: `(?:\.\d+)?`.) – Alan Moore May 18 '16 at 00:45
  • You certainly have a point there for the decimal portion. However it is very common to encounter formatted numbers. – Julien May 20 '16 at 06:58
  • @AlanMoore many visitors to SO are looking for any/different ways to resolve issues with varying similarity/difference, and it is helpful that suggestions are brought up. Even the OP might have oversimplified. – Mugoma J. Okomba Jul 15 '16 at 00:43
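
For what it's worth, here is a sketch that follows the suggestion in the comments: an optional, greedy decimal group `(?:\.\d+)?`, combined with the three-digit comma groups `(,\d{3})*` proposed under the accepted answer. The combined pattern and the class name are my own assembly, not something posted in this thread, and it assumes English-style formatting (comma for grouping, dot for decimals).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the combined pattern is an assumption, see the text above.
class FormattedNumberFinder {
    // optional sign, digits, any number of ",ddd" groups, optional ".digits" part
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:,\\d{3})*(?:\\.\\d+)?");

    // Returns the matched number strings, in order of appearance.
    static List<String> find(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(find("Government has distributed 4.8 million textbooks to 2,000 schools and 7 offices"));
        // prints [4.8, 2,000, 7]
    }
}
```

Unlike the pattern in the answer, this one also matches single-digit numbers such as 7.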
5

Using Java 8, you can do:

String str = "There 0 are 1 some -2-34 -numbers 567 here 890 .";
int[] ints = Arrays.stream(str.replaceAll("-", " -").split("[^-\\d]+"))
                 .filter(s -> !s.matches("-?"))
                 .mapToInt(Integer::parseInt).toArray();
System.out.println(Arrays.toString(ints)); // prints [0, 1, -2, -34, 567, 890]

If you don't have negative numbers, you can get rid of the replaceAll (and use !s.isEmpty() in filter), as that's only to properly split something like 2-34 (this can also be handled purely with regex in split, but it's fairly complicated).

Arrays.stream turns our String[] into a Stream<String>.

filter gets rid of the leading and trailing empty strings as well as any - that isn't part of a number.

mapToInt(Integer::parseInt).toArray() calls parseInt on each String to give us an int[].


Alternatively, Java 9 has a Matcher.results method, which should allow for something like:

Pattern p = Pattern.compile("-?\\d+");
Matcher m = p.matcher("There 0 are 1 some -2-34 -numbers 567 here 890 .");
int[] ints = m.results().map(MatchResult::group).mapToInt(Integer::parseInt).toArray();
System.out.println(Arrays.toString(ints)); // prints [0, 1, -2, -34, 567, 890]

As it stands, neither of these is a big improvement over just looping over the results with Pattern / Matcher as shown in the other answers, but it should be simpler if you want to follow this up with more complex operations which are significantly simplified with the use of streams.

Bernhard Barker
4

For rational numbers, use this one: (([0-9]+.[0-9]*)|([0-9]*.[0-9]+)|([0-9]+))

Andrey
  • 1
    The OP said integers, not real numbers. Also, you forgot to escape the dots, and none of those parentheses are necessary. – Alan Moore Mar 02 '10 at 23:01
1

I would suggest checking the ASCII values to extract numbers from a String. Suppose the input String is myname12345 and you want to extract just the digits 12345. First convert the String to a character array, then use the following code:

    char[] a = "myname12345".toCharArray();
    for (int i = 0; i < a.length; i++)
    {
        // ASCII codes 48 to 57 are the digits '0' to '9' (58 would be ':')
        if (a[i] >= 48 && a[i] <= 57)
            System.out.print(a[i]);
    }

Once the digits are extracted, append them to an array.

Hope this helps
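
A possible way to finish that idea without regular expressions is to accumulate each run of digits into one value while scanning. This is only a sketch: the class and method names are made up, and it assumes non-negative numbers that fit in an int (there is no overflow check).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class DigitScanner {
    // Collects each run of ASCII digits as one number, in order of appearance.
    static List<Integer> scan(String text) {
        List<Integer> numbers = new ArrayList<>();
        int current = 0;
        boolean inNumber = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                current = current * 10 + (c - '0'); // shift left one decimal place, add the new digit
                inNumber = true;
            } else if (inNumber) {
                numbers.add(current); // a non-digit ends the current run
                current = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            numbers.add(current); // flush a number that ends the string
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(scan("myname12345 and 67")); // prints [12345, 67]
    }
}
```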

hestellezg
  • A Java string is a counted sequence of Unicode/UTF-16 code units. By the design of UTF-16, the first 128 characters have the same value (but not the same size) as their ASCII encoding; beyond that, thinking you are dealing with ASCII will lead to errors. – Tom Blodget May 26 '14 at 21:24
1

Extract all real numbers using this.

public static ArrayList<Double> extractNumbersInOrder(String str) {
    str += 'a'; // sentinel, so that a number at the very end of the input is flushed too

    ArrayList<Double> list = new ArrayList<Double>();
    StringBuilder singleNum = new StringBuilder();
    for (char c : str.toCharArray()) {
        if (isNumber(c)) {
            singleNum.append(c);
        } else if (singleNum.length() > 0) { // number ended
            // Note: Double.valueOf throws NumberFormatException for runs such as "-" or "1.2.3"
            list.add(Double.valueOf(singleNum.toString()));
            singleNum.setLength(0);
        }
    }

    return list;
}

// True for characters that may be part of a number: digits, signs and the decimal point
public static boolean isNumber(char c) {
    return Character.isDigit(c) || c == '-' || c == '+' || c == '.';
}
1

Fraction and grouping characters for representing real numbers may differ between languages. The same real number could be written in very different ways depending on the language.

The number two million in German

2.000.000,00

and in English

2,000,000.00

A method to fully extract real numbers from a given string in a language agnostic way:

public List<BigDecimal> extractDecimals(final String s, final char fraction, final char grouping) {
    List<BigDecimal> decimals = new ArrayList<BigDecimal>();
    //Remove grouping character for easier regexp extraction
    StringBuilder noGrouping = new StringBuilder();
    int i = 0;
    while(i >= 0 && i < s.length()) {
        char c = s.charAt(i);
        if(c == grouping) {
            int prev = i-1, next = i+1;
            boolean isValidGroupingChar =
                    prev >= 0 && Character.isDigit(s.charAt(prev)) &&
                    next < s.length() && Character.isDigit(s.charAt(next));                 
            if(!isValidGroupingChar)
                noGrouping.append(c);
            i++;
        } else {
            noGrouping.append(c);
            i++;
        }
    }
    //Pattern.quote escapes the fraction character in case it is special in regular expressions (such as '.')
    String fractionRegex = Pattern.quote(String.valueOf(fraction));
    Pattern p = Pattern.compile("-?(\\d+" + fractionRegex + "\\d+|\\d+)");
    Matcher m = p.matcher(noGrouping);
    while (m.find()) {
        //BigDecimal expects '.' as the decimal separator
        String match = m.group().replace(fraction, '.');
        decimals.add(new BigDecimal(match));
    }
    return decimals;
}
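
If you would rather not hard-code the two separator characters, java.text.DecimalFormatSymbols can supply them for a given Locale, and NumberFormat can parse a single, already isolated number directly. A minimal sketch; the class and method names are made up, and the actual separators depend on the locale data shipped with your JDK.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper class; names are illustrative only.
class LocaleSeparators {
    // Decimal (fraction) separator used by the given locale.
    static char fractionChar(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    // Grouping (thousands) separator used by the given locale.
    static char groupingChar(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    // Parses one number written in the locale's conventions.
    static Number parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fractionChar(Locale.GERMANY) + " " + groupingChar(Locale.GERMANY));
        System.out.println(parse("2.000.000,00", Locale.GERMANY).doubleValue());
    }
}
```

The two characters could then be passed as the fraction and grouping arguments of extractDecimals above.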
AnDus
1

If you want to exclude numbers that are contained within words, such as bar1 or aa1bb, then add word boundaries \b to any of the regex based answers. For example:

Pattern p = Pattern.compile("\\b-?\\d+\\b");
Matcher m = p.matcher("9There 9are more9 th9an -2 and less than 12 numbers here9");
while (m.find()) {
  System.out.println(m.group());
}

displays:

2
12
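
Note that the minus sign of -2 is lost here: there is no word boundary between a space and -, so the leading \b can only match directly before the digit. If you want to keep the sign, one option (my own variation, not part of the original pattern) is to replace the leading \b with a negative lookbehind:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the lookbehind variant is an assumption, see the text above.
class StandaloneIntegers {
    // (?<!\w) requires that the match is not preceded by a word character
    private static final Pattern STANDALONE = Pattern.compile("(?<!\\w)-?\\d+\\b");

    // Returns the integers that are not embedded in a word, including a leading minus sign.
    static List<String> find(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = STANDALONE.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(find("9There 9are more9 th9an -2 and less than 12 numbers here9"));
        // prints [-2, 12]
    }
}
```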
dxl
0

I found this expression the simplest:

String[] extractednums = msg.split("\\D+");
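
One thing to watch with split: if the message starts with a non-digit, the first element of the result is an empty string. A small sketch that filters it out and converts to int[], assuming the delimiter pattern \\D+ (one or more non-digits); the class and method names are made up:

```java
import java.util.Arrays;

// Hypothetical helper class; names are illustrative only.
class SplitOnNonDigits {
    // Splits on runs of non-digits and parses what is left; minus signs are treated as delimiters.
    static int[] extract(String msg) {
        return Arrays.stream(msg.split("\\D+"))
                .filter(s -> !s.isEmpty()) // split yields a leading "" when msg starts with a non-digit
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("abc12def3"))); // prints [12, 3]
    }
}
```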
Buddy
0
public static String extractNumberFromString(String number) {
    String num = number.replaceAll("[^0-9]+", " ");
    return num.replaceAll(" ", "");
}

Extracts only the digits from the string, concatenated into a single string.