0

I want to split one large string into groups of words. For example, given:

"A B C D E F G H I J K L"

I want to get an array (String[]): [A,B,C,D], [E,F,G,H], [I,J,K,L]

Is there a regex for that, or do I need to do it manually: first split on every space and then concatenate every N words?

piotrassz

3 Answers

2

You could use regex for this:

e.g.:

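    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String x = "AAS BASD CAFAS DAFASF EASFASF FAFSASF GA HASF IAS JAS KAS LSA";
    ArrayList<String> found = new ArrayList<>();
    // four words, each separated by a single whitespace character
    Pattern pattern = Pattern.compile("(\\w+\\s\\w+\\s\\w+\\s\\w+)");
    Matcher m = pattern.matcher(x);
    while (m.find()) {
        String s = m.group();
        found.add(s);
    }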

    //if you want to convert your List to an Array
    String[] result = found.toArray(new String[0]);
    System.out.println(Arrays.toString(result));

Result: [AAS BASD CAFAS DAFASF, EASFASF FAFSASF GA HASF, IAS JAS KAS LSA]

This pattern ("(\\w+\\s\\w+\\s\\w+\\s\\w+)") matches four words, each separated by a single whitespace character. The loop iterates over every match and adds it to the result list.
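
If the group size should be configurable, the same idea can be sketched with a pattern built from a count. The class `WordGroups` and the helper `groupsOf` are hypothetical names used only for illustration, and this is a sketch rather than a definitive implementation. Note that leftover words that do not fill a complete group are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordGroups {
    // Hypothetical helper: collects every run of exactly n words,
    // where neighbouring words are separated by one whitespace character.
    static List<String> groupsOf(String text, int n) {
        // For n = 4 this builds \w+(?:\s\w+){3}, equivalent to the pattern above.
        Pattern pattern = Pattern.compile("\\w+(?:\\s\\w+){" + (n - 1) + "}");
        Matcher m = pattern.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [A B C D, E F G H, I J K L]
        System.out.println(groupsOf("A B C D E F G H I J K L", 4));
    }
}
```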

warch
2

You can create a regex that describes this pattern, e.g. "((?:\w+\s*){4})"


Or in simple words:

  • The \w+\s* part means one or more word characters (letters, digits, underscore) followed by zero or more whitespace characters.

  • It is wrapped in parentheses and followed by {4} to indicate that we want this to occur 4 times.

  • Finally, the whole thing is wrapped in parentheses again, because we want to capture that result.

  • By contrast, the inner parentheses (the ones that {4} applies to) start with ?:, which makes them a non-capturing group, (?: ...). We don't want to capture the individual matches just yet.

You can use that pattern in Java to extract each chunk of 4 occurrences.
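
A minimal sketch of that extraction step, assuming the pattern "((?:\w+\s*){4})" from above. The class `ChunkExtractor` and the method `chunks` are hypothetical names, and the `trim()` call is an addition that removes the trailing whitespace the pattern captures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkExtractor {
    // Hypothetical helper: returns each match of capture group 1, trimmed.
    static List<String> chunks(String text) {
        Pattern pattern = Pattern.compile("((?:\\w+\\s*){4})");
        Matcher m = pattern.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // group(1) is the outer capturing group; it may end in whitespace
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A B C D, E F G H, I J K L]
        System.out.println(chunks("A B C D E F G H I J K L"));
    }
}
```

Because \s* also accepts zero whitespace, a leftover tail with fewer than four words can still match if backtracking cuts a longer word into pieces, so this sketch is safest when the word count is a multiple of 4.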


Next, you can simply split each individual result with a second regex, \s+ (one or more whitespace characters).
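
The second step might look like the following sketch. `ChunkSplitter` and `words` are hypothetical names, and the sample chunk is an assumed match from the first step, including its trailing space.

```java
import java.util.Arrays;

public class ChunkSplitter {
    // Hypothetical helper: splits one matched chunk into its individual words.
    static String[] words(String chunk) {
        // trim() avoids an empty first element if the chunk starts with whitespace
        return chunk.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [A, B, C, D]
        System.out.println(Arrays.toString(words("A B C D ")));
    }
}
```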

Edit

One more thing: you may notice that the first captured group also contains whitespace at the end. You can get rid of that with a more advanced regex: ((?:\w+\s+){3}(?:\w+))\s*
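
As a sketch of how the refined regex could be used from Java (the class `TrimmedChunks` and the method `chunks` are hypothetical names): the trailing \s* sits outside capture group 1, so `group(1)` needs no `trim()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimmedChunks {
    // Hypothetical helper: three "word + whitespace" units, then a fourth word.
    static List<String> chunks(String text) {
        Pattern pattern = Pattern.compile("((?:\\w+\\s+){3}(?:\\w+))\\s*");
        Matcher m = pattern.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // group(1) excludes the whitespace consumed by the trailing \s*
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A B C D, E F G H, I J K L]
        System.out.println(chunks("A B C D E F G H I J K L"));
    }
}
```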


bvdb
  • 1
    your regex looks cleaner than mine but your regex does not match the last group. but i don't know why. – warch Jan 25 '22 at 10:32
  • @warch thank you for pointing that out. Should be fixed now. It's the `\s+` which matches 1 or more spaces, while `\s*` matches 0,1 or more spaces. Because the last match may not have a space at the end, the 2nd is needed. – bvdb Jan 25 '22 at 10:50
  • Can you link the website where you got the explanation? – Cyber Avater Apr 03 '22 at 18:23
  • 1
    @CyberAvater I used https://regex101.com/ , then took some screenshots and added some text to them. – bvdb Apr 04 '22 at 21:29
-1

There are multiple ways you can achieve this.

For example, let your string be

String str = "A B C D E F G H I J K L";

One way to split it would be to use a regular expression:

 java.util.Arrays.toString(str.split("(?<=\\G....)"))

Here the .... represents how many characters are in each chunk; another way to write the pattern would be .{4}

Another way would be to use Guava's Splitter:

Iterable<String> strArr = Splitter.fixedLength(3).split(str);

There could be more ways to achieve the same.
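
One such alternative is the manual approach mentioned in the question: split on whitespace, then join every N words. The following is only a sketch under that assumption; `ManualGrouping` and `groupWords` are hypothetical names. Unlike the fixed-length approaches, it groups by words rather than characters, and it keeps a shorter final group when the word count is not a multiple of N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManualGrouping {
    // Hypothetical helper: joins every n consecutive words with a single space.
    static List<String> groupWords(String text, int n) {
        List<String> groups = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return groups; // no words, no groups
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i += n) {
            int end = Math.min(i + n, words.length);
            groups.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [A B C D, E F G H, I J K L]
        System.out.println(groupWords("A B C D E F G H I J K L", 4));
    }
}
```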

ThisaruG
GSM
  • 1
    What is `Splitter`? – Lino Jan 25 '22 at 09:42
  • It's a Google library (Guava); see https://guava.dev/releases/21.0/api/docs/com/google/common/base/Splitter.html – GSM Jan 25 '22 at 09:43
  • It won't work. I want to split every word not every character. So your answer is wrong. – piotrassz Jan 25 '22 at 09:52
  • This would only work for the example given (1-character words) and only 3 groups. My solution works for words of arbitrary length and an arbitrary number of words. – warch Jan 25 '22 at 09:59
  • 1
    You're mostly correct. Just change your `....` with the check of words: `java.util.Arrays.toString(str.split("(?<=\\G\\s?\\w{1,10}\\s\\w{1,10}\\s\\w{1,10}\\s\\w{1,10})"))` Note that `\\w{1,10}` is mandatory rather than `\\w+` because of how Java is implemented (the latter will produce an error). You can use `\\w{1,100}`, if you want. Just don't use `\\w+`. – Olivier Grégoire Jan 25 '22 at 10:16