I have a string consisting of a sequence of digits, as shown:

000000900100103010330200000005677890212126000020

Using a single regex, I would like to split the original string into substrings of different lengths. I mean something like

00000009 001 001 03 01 033 02 00000005677890212126 00002 0

So I need these different "groups" (I hope that is the correct word to use):

  • 00000009
  • 001
  • 001
  • 03
  • 01
  • 033
  • 02
  • 00000005677890212126
  • 00002
  • 0

The length of each element is fixed and will never change. Is it possible?

I have tried:

[0-9]{8}[0-9]{3}[0-9]{3}[0-9]{2}...

but of course it does not work.

  • Well, you are not actually *group*ing in the shown regex; try `([0-9]{8})([0-9]{3})([0-9]{3})([0-9]{2})...`. But why would you even want to use a regex for it in the first place? – luk2302 Oct 17 '17 at 15:35
  • Doesn't "00000005677890212126" contain two matches ("0000000567789" and "0212126")? Sounds to me as though you want something along the lines of `(0+\d*?)(?=0|$)` – jaytea Oct 17 '17 at 15:40
  • https://stackoverflow.com/questions/1609807/whats-the-best-way-of-parsing-a-fixed-width-formatted-file-in-java? –  Oct 17 '17 at 15:44
  • If the length of each element is fixed, it would be better to use String#substring(), rather than a regex. – Reinstate Monica -- notmaynard Oct 17 '17 at 16:27

2 Answers

You need to use a Pattern with one capturing group per field. If a match is found, use Matcher.groupCount() and Matcher.group(int i) to read each group.

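import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One capturing group per fixed-width field: 8+3+3+2+2+3+2+20+5+1 = 49 digits.
static final Pattern p = Pattern.compile(
        "([0-9]{8})"
        +"([0-9]{3})"
        +"([0-9]{3})"
        +"([0-9]{2})"
        +"([0-9]{2})"
        +"([0-9]{3})"
        +"([0-9]{2})"
        +"([0-9]{20})"
        +"([0-9]{5})"
        +"([0-9]{1})");

private void test(String[] args) {
    // NB: I added one more 0 at the start, so the input is 49 digits long.
    Matcher m = p.matcher("0000000900100103010330200000005677890212126000020");
    if ( m.find() ) {
        // Group 0 is the whole match; the fields are groups 1..groupCount().
        for ( int i = 1; i <= m.groupCount(); i++ ) {
            System.out.print(m.group(i)+" ");
        }
    }
}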

prints

00000009 001 001 03 01 033 02 00000005677890212126 00002 0
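
One thing to keep in mind with the code above: find() succeeds as soon as any run of 49 digits appears somewhere in the input, so a longer string would be accepted and its extra characters silently ignored. Below is a minimal sketch, assuming you would rather reject anything that is not exactly 49 digits, that uses Matcher.matches() instead. The class and method names (FixedWidthMatcher, fields) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedWidthMatcher {
    // Same ten field widths as the pattern above: 8,3,3,2,2,3,2,20,5,1 (49 digits).
    static final Pattern P = Pattern.compile(
            "([0-9]{8})([0-9]{3})([0-9]{3})([0-9]{2})([0-9]{2})"
            + "([0-9]{3})([0-9]{2})([0-9]{20})([0-9]{5})([0-9]{1})");

    // Returns the ten fields, or an empty list if the input is not exactly 49 digits.
    static List<String> fields(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        if (m.matches()) { // matches() requires the whole input to fit the pattern
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: 00000009 001 001 03 01 033 02 00000005677890212126 00002 0
        System.out.println(String.join(" ",
                fields("0000000900100103010330200000005677890212126000020")));
    }
}
```

With this variant, both the 48-digit string from the question and a 50-digit string give an empty list instead of a partial result.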

In Java 8 you can build your regex on-the-fly.

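import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Each width w becomes one capturing group "(\d{w})"; the groups are joined in order.
static final List<Integer> fieldWidths = Arrays.asList(8,3,3,2,2,3,2,20,5,1);
static final Pattern p = Pattern.compile(
        fieldWidths.stream()
                .map(i -> "(\\d{"+i+"})")
                .collect(Collectors.joining()));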
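
To make the generated pattern easier to see, here is a rough, self-contained sketch of the same idea wrapped in a small helper. The names DynamicFixedWidth, build and split are hypothetical, and the choice of matches() (so the whole input must fit) is my assumption, not something the question requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class DynamicFixedWidth {
    // Builds one capturing group per width, e.g. [8, 3] -> (\d{8})(\d{3}).
    static Pattern build(List<Integer> widths) {
        return Pattern.compile(
                widths.stream()
                        .map(w -> "(\\d{" + w + "})")
                        .collect(Collectors.joining()));
    }

    // Returns the captured fields, or an empty list if the input does not match as a whole.
    static List<String> split(Pattern pattern, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList(8, 3, 3, 2, 2, 3, 2, 20, 5, 1));
        System.out.println(p.pattern()); // shows the generated regex
        System.out.println(split(p, "0000000900100103010330200000005677890212126000020"));
    }
}
```

Keeping the widths in a list means a change to the record layout only touches one line, at the cost of a regex that is no longer visible in the source.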
OldCurmudgeon

I like the answer above. Here is an alternative without regex, using a good old for loop:

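import java.util.ArrayList;
import java.util.List;

// Cuts inputString into consecutive substrings of the given lengths.
public static List<String> splitString(String inputString, int... lengths) {

    List<String> substrings = new ArrayList<>();

    int start = 0;
    int end = 0;

    for (int length : lengths) {

        // Each field starts where the previous one ended.
        start = end;
        end = start + length;

        // substring() throws IndexOutOfBoundsException if the input is too short.
        substrings.add(inputString.substring(start, end));
    }

    return substrings;
}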

private void test(String[] args) {
    String s = "0000000900100103010330200000005677890212126000020";
    List<String> list = splitString(s,8,3,3,2,2,3,2,20,5,1);
}
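
Note that splitString throws an IndexOutOfBoundsException from substring() when the input is too short, and quietly ignores trailing characters when it is too long. If that matters, a checked variant could look like the sketch below. It assumes you want an IllegalArgumentException whenever the lengths do not add up to the input length; CheckedSplitter and splitExact are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class CheckedSplitter {
    // Splits input into consecutive fields and insists the lengths cover it exactly.
    static List<String> splitExact(String input, int... lengths) {
        int total = IntStream.of(lengths).sum();
        if (input.length() != total) {
            throw new IllegalArgumentException(
                    "expected " + total + " characters but got " + input.length());
        }
        List<String> parts = new ArrayList<>(lengths.length);
        int start = 0;
        for (int length : lengths) {
            parts.add(input.substring(start, start + length));
            start += length; // the next field begins where this one ended
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "0000000900100103010330200000005677890212126000020";
        System.out.println(splitExact(s, 8, 3, 3, 2, 2, 3, 2, 20, 5, 1));
    }
}
```

Unlike the regex versions, this does not check that the characters are digits; it only checks the overall length.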