3

I want to split a String on a delimiter. Example String:

String str="ABCD/12346567899887455422DEFG/15479897445698742322141PQRS/141455798951";

I want to get the strings ABCD/12346567899887455422, DEFG/15479897445698742322141, and so on. The rules are:

  • only 4 chars before the /
  • after the /, any number of characters (digits and letters). Update: I only need the preceding 4 characters where a delimiter appears, since the string may contain letters or numbers...

My code attempt:

public class StringReq {

    public static void main(String[] args) {
        String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
        testSplitStrings(str);


    }

    public static void testSplitStrings(String path) {
        System.out.println("splitting of sprint starts \n");
        String[] codeDesc = path.split("/");
        String[] codeVal = new String[codeDesc.length];
        for (int i = 0; i < codeDesc.length; i++) {
            codeVal[i] = codeDesc[i].substring(codeDesc[i].length() - 4,
                    codeDesc[i].length());

            System.out.println("line" + i + "==> " + codeDesc[i] + "\n");
        }

        for (int i = 0; i < codeVal.length - 1; i++) {
            System.out.println(codeVal[i]);
        }
        System.out.println("splitting of sprint ends");
    }

}
Cœur

4 Answers

10

You say that digits and letters can appear after the /, but in your example I don't see any letters that should be included in the result after the /.

Based on that assumption, you can simply split at every place that has a digit before it and an A-Z character after it.

To do so, split with a regex that uses the look-around mechanism, like str.split("(?<=[0-9])(?=[A-Z])")

Demo:

String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
for (String s : str.split("(?<=[0-9])(?=[A-Z])"))
    System.out.println(s);

Output:

BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890

If letters can actually appear in the second part (after the /), you can instead split at places that are followed by four alphabetic characters and a /, like split("(?=[A-Z]{4}/)"). This assumes you are using at least Java 8; otherwise you need to manually exclude a split at the start of the string, for instance by adding (?!^) or (?<=.) at the start of your regex.
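A minimal sketch of that second variant, assuming the codes are always four uppercase letters directly followed by a /. The class name KeySplitDemo, the method name splitOnKeys, and the sample input (taken from the comments) are only illustrative. The (?<=.) prefix is included so the result does not depend on the Java version.

```java
import java.util.Arrays;

class KeySplitDemo {

    // Split before every four-letter code that is directly followed by '/'.
    // (?<=.) rules out a split at index 0, which matters before Java 8.
    static String[] splitOnKeys(String input) {
        return input.split("(?<=.)(?=[A-Z]{4}/)");
    }

    public static void main(String[] args) {
        // Illustrative input: the part after '/' mixes digits and letters.
        String[] parts = splitOnKeys("TEST/123ABCDATA/123");
        System.out.println(Arrays.toString(parts)); // [TEST/123ABC, DATA/123]
    }
}
```

Because the look-ahead requires the / right after the four letters, trailing letters in a value (ABC above) stay with the value they belong to.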

Pshemo
  • How about input like `TEST/123ABCDATA/123`? Should be valid and result in `TEST/123ABC` and `DATA/123` what I can see from the question. – Roger Gustavsson May 18 '15 at 13:08
  • I was also wondering about case like this one since OP claims that "*after / any number of chars even digits and alphabets*" but it looks like OP is actually using in second part only digits, so I based my answer on assumption that alphabets will not actually belong/appear in second part. Will update my answer to reflect on that. – Pshemo May 18 '15 at 13:12
  • I agree that example data didn't reflect this fact, but that the specification does. The following regex should be correct, `(?<=.)(?=[A-Z]{4}/)` – Roger Gustavsson May 18 '15 at 13:17
  • @RogerGustavsson Yes I already added very similar solution to my answer. BTW since Java 8 we don't need to worry about `(?<=.)` part because `split` on zero-length regexes (like look-around) will not produce empty string at start if delimiter would be found at start of the string. You can find more info about that at my question posted here: https://stackoverflow.com/questions/22718744/why-does-split-in-java-8-sometimes-remove-empty-strings-at-start-of-result-array – Pshemo May 18 '15 at 13:22
3

You can use a regex with Pattern and Matcher (both in java.util.regex):

    Pattern pattern = Pattern.compile("[A-Z]{4}/[0-9]*");
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
      System.out.println(matcher.group());
    }
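Note that [0-9]* captures only digits after the /, so for an input such as TEST/123ABCDATA/123 the letters ABC would be dropped. If letters may appear in the value, one possible adjustment (a sketch, not part of the original answer) is a lazy match that stops at the next four-letter code or at the end of the input. The names KeyMatchDemo, ENTRY, and findEntries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyMatchDemo {

    // A four-letter code, '/', then as few characters as possible
    // up to the next "four letters + '/'" or the end of the input.
    private static final Pattern ENTRY =
            Pattern.compile("[A-Z]{4}/.*?(?=[A-Z]{4}/|$)");

    static List<String> findEntries(String input) {
        List<String> entries = new ArrayList<>();
        Matcher matcher = ENTRY.matcher(input);
        while (matcher.find()) {
            entries.add(matcher.group());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Illustrative input with letters inside the value part.
        System.out.println(findEntries("TEST/123ABCDATA/123")); // [TEST/123ABC, DATA/123]
    }
}
```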
karci10
2

Instead of:

String[] codeDesc = path.split("/");

Just use this regex (4 characters before / and any characters after):

String[] codeDesc = path.split("(?=.{4}/)(?<=.)");
chris
1

Even simpler using \d:

path.split("(?=[A-Za-z])(?<=\\d)");

EDIT:

Added a condition that four letters of either case must follow.

path.split("(?=[A-Za-z]{4})(?<=\\d)");

output:

BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890

It is still unclear whether this is the author's expected result.
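To make the open question concrete, here is a small sketch comparing the digit-based split above with a split that looks for four letters followed by a /, using the mixed input from the comments. The class and method names are illustrative, and which output is the desired one depends on what the author actually expects.

```java
import java.util.Arrays;

class LookaroundContrastDemo {

    // Splits wherever a digit is followed by at least four letters.
    static String[] splitAfterDigit(String input) {
        return input.split("(?=[A-Za-z]{4})(?<=\\d)");
    }

    // Splits before four letters that are directly followed by '/'.
    static String[] splitBeforeCode(String input) {
        return input.split("(?<=.)(?=[A-Za-z]{4}/)");
    }

    public static void main(String[] args) {
        String input = "TEST/123ABCTEST/123CDE";
        System.out.println(Arrays.toString(splitAfterDigit(input))); // [TEST/123, ABCTEST/123CDE]
        System.out.println(Arrays.toString(splitBeforeCode(input))); // [TEST/123ABC, TEST/123CDE]
    }
}
```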

Navidot
  • The digits can be mixed with alphabetic characters. Valid input sequence: `TEST/123ABCTEST/123CDE`. It should give `TEST/123ABC` and `TEST/123CDE`. So your answer will not work. – Roger Gustavsson May 18 '15 at 13:06
  • Roger Gustavsson OK, you are right about the characters and numbers. The author mentioned that in the post, but it is still unclear what the expected result is, and therefore what the expected behaviour is. I added a condition for 4 letters to this solution, plus support for lowercase letters, since we shouldn't exclude them. You don't have a good reason to downvote my answer. Why did you do that? It is another version of the highest-scored response. Also, your solutions are wrong too: you don't consider lowercase characters in your suggestions, so by the same policy they should be downvoted as well. – Navidot May 18 '15 at 13:53
  • Sorry about the down vote. It was too quick. I couldn't undo it until you edited your answer. – Roger Gustavsson May 18 '15 at 15:38