
Say I have a String object containing "This sentence was written on 2020-03-21 by person 1234567 at 07:23 hours". How would I extract ONLY the "1234567" part of the string? Maybe using a solution from the "Extract digits from string - StringUtils Java" question, but I don't know how to limit the extraction to only the desired sequence.

If I use str.replaceAll("[^0-9]", "") on this string, I get "2020032112345670723", which means that it extracts ALL of the digits in the string, but I want ONLY the sequence containing a certain number of digits (in my case 7).

Also, the sequence will not always be in the same place, so using substring(index from, index to) will not work.

mdenci
  • A bit complicated, admittedly, but you could split the `String` by whitespace(s) and check which one is purely numeric (the date won't be because of the hyphens). – deHaar May 26 '20 at 12:06
  • Good idea, but not for my case because instead of "person" there could be a number in a name, eg. "person 12" – mdenci May 26 '20 at 12:10
  • @mdenci using regular expressions you don't have to care about the position in the string, you should definitely go with a Regex. – emmics May 26 '20 at 12:11
  • OK, you could check if that number has exactly 7 digits then, too... But in situations with a second number of 7 digits, even a regular expression would fail, too. – deHaar May 26 '20 at 12:13
  • @deHaar no, it will simply return multiple matches of the expression, which is fine. – emmics May 26 '20 at 12:33
  • @mmika1000 Will that really be fine? OP would have to check which number is the desired one. – deHaar May 26 '20 at 12:42
  • @deHaar well, maybe not for his particular task, but if this is a critical task where inconsistencies like that may occur, he should think of a completely different approach either way (--> not parsing this value from a String) – emmics May 26 '20 at 12:44
  • @mmika1000 yes, totally agree – deHaar May 26 '20 at 12:49

3 Answers


I would probably do that using a regular expression. For seven adjacent digits that would be \d{7} or, even better, \b\d{7}\b (thanks @AlexRudenko), which does not match inside a longer run of digits.

To do so, you can use the Pattern API:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// ...

String input = "This sentence was written on 2020-03-21 by person 1234567 at 07:23 hours";
Pattern digitPattern = Pattern.compile("\\b\\d{7}\\b");
Matcher m = digitPattern.matcher(input);
while (m.find()) {
    String s = m.group();
    // prints each 7-digit match, here just 1234567
    System.out.println(s);
}

I just verified it and it's working fine.

(Pattern extraction taken from this answer.)
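As the comments on the question point out, a string may contain more than one 7-digit number, in which case the loop above yields several matches. The following is a minimal sketch of collecting them all into a list so the caller can decide which one is wanted. The class name `SevenDigitFinder` and the method `findAll` are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class SevenDigitFinder {
    // Exactly seven digits with a word boundary on each side.
    private static final Pattern SEVEN_DIGITS = Pattern.compile("\\b\\d{7}\\b");

    // Returns every seven-digit number in the text, in order of appearance.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = SEVEN_DIGITS.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "This sentence was written on 2020-03-21 by person 1234567 at 07:23 hours";
        System.out.println(findAll(text)); // prints [1234567]
    }
}
```

Because of the `\b` anchors, a longer number such as 123456789 produces no match at all, whereas plain `\d{7}` would match its first seven digits.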

emmics
  • I'm getting an "Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )" error on Pattern.compile("\d{7}") – mdenci May 26 '20 at 12:17
  • Just one comment - you may need to surround digits with `\b` to match a number containing _exactly_ 7 digits: `"\\b\\d{7}\\b"` – Nowhere Man May 26 '20 at 12:18
  • Yes, you need to escape the backslash in the Pattern to `"\\d{7}"`. I'm sorry, I thought I edited that already. – emmics May 26 '20 at 12:18

Assuming that the number of digits is not always 7, I would use the regular expression

" ([0-9]+) "

The inner part [0-9]+ finds one or more digits. The spaces left and right of it ensure that the number is only found if it is surrounded by spaces, so the date and time in your input string are ignored. The parentheses are used in combination with group(1) to return only the number without the spaces around it.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{

    private static final Pattern regexp=Pattern.compile(" ([0-9]+) ");

    public static void main(String[] args)
    {
        String s="This sentence was written on 2020-03-21 by person 1234567 at 07:23 hours";
        Matcher matcher=regexp.matcher(s);
        if (matcher.find())
        {
            String number=matcher.group(1);
            System.out.printf("number=%s",number);
        }
    }
}

To find only numbers with 5 to 8 digits, you could write " ([0-9]{5,8}) ".

As others wrote in the meantime, \\d may be used as an alternative to [0-9].
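The bounded variant can be wrapped in a small method that takes the digit range as parameters. This is only a sketch: the class name `SpaceDelimitedNumberFinder` and the method `findNumber` are invented for illustration, and returning an `Optional` is a design choice made here, not something taken from the answer above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class SpaceDelimitedNumberFinder {
    // Finds the first run of minDigits..maxDigits digits that has a space on each side.
    static Optional<String> findNumber(String text, int minDigits, int maxDigits) {
        Pattern pattern = Pattern.compile(" ([0-9]{" + minDigits + "," + maxDigits + "}) ");
        Matcher matcher = pattern.matcher(text);
        // group(1) is the digit run without the surrounding spaces
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "This sentence was written on 2020-03-21 by person 1234567 at 07:23 hours";
        System.out.println(findNumber(s, 5, 8).orElse("no match")); // prints 1234567
    }
}
```

Note that a number at the very start or end of the string has no space on one side, so this pattern does not find it.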

Stefan

You can do a simple linear search to find the numeric substring of length 7:

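public class Main {
    public static void main(String[] args) {
        String str = "This sentence was written on 2020-03-21 by person 1234567 at 07:23 hours";
        System.out.println(getNumber(str));
    }

    // Returns a space-separated token made of exactly 7 digits, or null if there is none.
    // If several tokens qualify, the last one is returned.
    private static String getNumber(String str) {
        String number = null;
        if (str != null) {
            for (String s : str.split(" ")) {
                if (s.length() == 7 && isNumeric(s)) {
                    number = s;
                }
            }
        }
        return number;
    }

    // True only if every character is a digit. Integer.parseInt is avoided here
    // because it also accepts a leading sign, e.g. "-123456" (7 characters).
    private static boolean isNumeric(String str) {
        return !str.isEmpty() && str.chars().allMatch(Character::isDigit);
    }
}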

Output:

1234567
Majed Badawi