587

What regex pattern would I need to pass to java.lang.String.split() to split a String into an array of substrings using all whitespace characters (' ', '\t', '\n', etc.) as delimiters?

the Tin Man
mcjabberz

13 Answers

997

Something along the lines of

myString.split("\\s+");

This treats each run of consecutive whitespace characters as a single delimiter.

So if I have the string:

"Hello[space character][tab character]World"

This should yield the strings "Hello" and "World", with no empty string between the [space] and the [tab].

As VonC pointed out, the backslash must be escaped, because the Java compiler first processes escape sequences in the string literal, and only the resulting string is passed to the regex engine. What you want is the literal "\s", which means you need to write "\\s". It can get a bit confusing.

The \\s is equivalent to [ \\t\\n\\x0B\\f\\r].
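
A minimal sketch of the behavior described above. The class name SplitOnWhitespace, the helper tokens() and the sample strings are made up for illustration. One thing worth knowing: if the input starts with whitespace, split("\\s+") returns an empty first element, so the hypothetical helper trims first.

```java
import java.util.Arrays;

public class SplitOnWhitespace {
    // Hypothetical helper: trims the input, then splits on runs of whitespace.
    // Returns an empty array for blank input instead of a single empty string.
    static String[] tokens(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // A space followed by a tab counts as one delimiter.
        System.out.println(Arrays.toString("Hello \tWorld\nagain".split("\\s+")));
        // prints [Hello, World, again]

        // Leading whitespace leaves an empty first element.
        System.out.println(Arrays.toString(" Hello World".split("\\s+")));
        // prints [, Hello, World]

        System.out.println(Arrays.toString(tokens(" Hello World")));
        // prints [Hello, World]
    }
}
```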

rogerdpack
Henrik Paul
91

In most regex dialects there is a set of convenient shorthand character classes you can use for this kind of thing. These are good ones to remember:

\w - Matches any word character.

\W - Matches any nonword character.

\s - Matches any white-space character.

\S - Matches anything but white-space characters.

\d - Matches any digit.

\D - Matches anything except digits.

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.
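
As a rough illustration of the classes listed above in Java (where each backslash has to be doubled inside a string literal), the sketch below deletes every character matched by a class from a sample string. The class name ShorthandClasses, the helper strip() and the sample text are assumptions made for this example.

```java
public class ShorthandClasses {
    // Hypothetical helper: removes every character matched by the given regex.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "ab_1 2\tc!";
        String[] classes = {"\\w", "\\W", "\\s", "\\S", "\\d", "\\D"};
        for (String cls : classes) {
            // Brackets make leftover spaces and tabs visible in the output.
            System.out.println(cls + " removed -> [" + strip(sample, cls) + "]");
        }
    }
}
```

Note that \w also matches the underscore, so removing \W from the sample leaves "ab_12c".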

Amit Joki
glenatron
69

To get this working in Javascript, I had to do the following:

myString.split(/\s+/g)
Andy Thomas
Mike Manard
  • 16
    This is in Javascript. I wasn't paying attention either :) – miracle2k May 10 '12 at 20:52
  • 16
    Oops. My mistake. Maybe this answer will still help some others that stumble upon this thread while looking for a Javascript answer. :-) – Mike Manard Sep 07 '12 at 19:00
  • Haha I was looking for an answer for JavaScript, accidentally came across this question and then noticed your answer before I left. +1. – Kris Aug 01 '14 at 22:00
  • That's great! I'm glad to hear this answer proved useful for somebody, even if it did answer the wrong question. :-) – Mike Manard Oct 08 '14 at 14:28
  • This helped me so much as well, needed to split server args :) – amyiris Feb 29 '20 at 21:40
37

"\\s+" should do the trick

VonC
  • 1
    Why the + at the end? – Floella Jan 22 '16 at 21:50
  • 4
    @Anarelle it repeats the whitespace character match at least once, and as many times as possible: see [https://regex101.com/r/dT7wG9/1](https://regex101.com/r/dT7wG9/1) or [http://rick.measham.id.au/paste/explain.pl?regex=\s%2B](http://rick.measham.id.au/paste/explain.pl?regex=\s%2B) or [http://regexper.com/#^s%2B](http://regexper.com/#^s%2B) or [http://www.myezapp.com/apps/dev/regexp/show.ws?regex=\s+&env=env_java](http://www.myezapp.com/apps/dev/regexp/show.ws?regex=\s+&env=env_java) – VonC Jan 23 '16 at 05:59
13

Also, you may have a Unicode non-breaking space (U+00A0), which \s does not match by default:

String[] elements = s.split("[\\s\\xA0]+"); // also split on the Unicode non-breaking space
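
To see the difference, here is a small sketch comparing the two patterns on a string that contains a non-breaking space. The class name NbspSplit, both helper methods and the sample input are hypothetical and only serve the comparison.

```java
import java.util.Arrays;

public class NbspSplit {
    // Default \s: does not treat U+00A0 as whitespace.
    static String[] splitAscii(String s) {
        return s.split("\\s+");
    }

    // \s plus an explicit non-breaking space.
    static String[] splitWithNbsp(String s) {
        return s.split("[\\s\\xA0]+");
    }

    public static void main(String[] args) {
        String s = "a\u00A0b c"; // "a", non-breaking space, "b", space, "c"
        System.out.println(splitAscii(s).length);                   // prints 2
        System.out.println(Arrays.toString(splitWithNbsp(s)));      // prints [a, b, c]
    }
}
```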
jake_astub
10
String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");
Arrow
9

Apache Commons Lang has a method to split a string with whitespace characters as delimiters:

StringUtils.split("abc def")

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split(java.lang.String)

This might be easier to use than a regex pattern.

Felix Scheffer
3

All you need is to split using one of the predefined character classes of the Java regex engine, namely the whitespace character class:

  • \d Represents a digit: [0-9]
  • \D Represents a non-digit: [^0-9]
  • \s Represents a whitespace character including [ \t\n\x0B\f\r]
  • \S Represents a non-whitespace character as [^\s]
  • \v Represents a vertical whitespace character as [\n\x0B\f\r\x85\u2028\u2029]
  • \V Represents a non-vertical whitespace character as [^\v]
  • \w Represents a word character as [a-zA-Z_0-9]
  • \W Represents a non-word character as [^\w]

Here, the key point to remember is that the lowercase \s represents all types of whitespace, including a single space, a tab character, or anything similar.

So, if you try something like this:

String theString = "Java \tProgramming"; // "Java", a space, a tab, "Programming"
String[] allParts = theString.split("\\s+");

You will get the desired output.


Hope this helps!

SKL
2

To split a string with any Unicode whitespace, you need to use

s.split("(?U)\\s+")
         ^^^^

The (?U) embedded flag is the inline equivalent of Pattern.UNICODE_CHARACTER_CLASS, which makes the \s shorthand character class match any character with the Unicode White_Space property.

If you want to split on whitespace and keep the whitespace in the resulting array, use

s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")

See the regex demo. See Java demo:

String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+"))); // => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")));
// => [Hello,    , World,  , »]
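
If the same split is used often, one option is to compile the pattern once with the flag constant instead of the inline (?U). This is only a sketch; the class name UnicodeWhitespaceSplit, the constant UNICODE_WS and the helper split() are names invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class UnicodeWhitespaceSplit {
    // Same effect as the "(?U)\\s+" pattern, but compiled only once.
    private static final Pattern UNICODE_WS =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper wrapping Pattern.split.
    static String[] split(String s) {
        return UNICODE_WS.split(s);
    }

    public static void main(String[] args) {
        String s = "Hello\t World\u00A0\u00BB"; // same input as the demo above
        System.out.println(Arrays.toString(split(s)));
    }
}
```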
Wiktor Stribiżew
1

You can split a string on line breaks with the following statement:

String[] textStr = yourString.split("\\r?\\n");

You can split a string on whitespace with the following statement:

String[] textStr = yourString.split("\\s+");
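
The two patterns can be combined when the text should first be broken into lines and then into words. A minimal sketch, assuming a made-up class LineAndWordSplit with hypothetical helpers lines() and words():

```java
import java.util.Arrays;

public class LineAndWordSplit {
    // Splits on Unix (\n) and Windows (\r\n) line endings.
    static String[] lines(String text) {
        return text.split("\\r?\\n");
    }

    // Splits one line into words on runs of whitespace.
    static String[] words(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond   line\nthird";
        for (String line : lines(text)) {
            System.out.println(Arrays.toString(words(line)));
        }
        // prints [first, line], then [second, line], then [third]
    }
}
```

Splitting the whole text on "\\s+" directly would also work, but it would lose the information about which words were on which line, because \s matches line breaks as well.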
RajeshVijayakumar
1
String str = "Hello   World";
String res[] = str.split("\\s+");
Skywalker
Olivia Liao
1

Since it is a regular expression, and I'm assuming you would also not want non-alphanumeric characters such as commas and dots that may be surrounded by blanks (e.g. "one , two" should give [one][two]), it should be:

myString.split("[\\s\\W]+")
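
In Java, the pattern goes into a string literal with doubled backslashes. The sketch below applies it to the "one , two" example; the class name SplitOnNonWord and the helper words() are hypothetical.

```java
import java.util.Arrays;

public class SplitOnNonWord {
    // Splits on runs of whitespace and/or non-word characters.
    static String[] words(String s) {
        return s.split("[\\s\\W]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("one , two")));
        // prints [one, two]
        System.out.println(Arrays.toString(words("a, b;c")));
        // prints [a, b, c]
    }
}
```

Since \W already matches whitespace, "\\W+" alone should give the same result. Keep in mind that the underscore counts as a word character, so it is not treated as a delimiter here.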
Ria
Rishabh
-1

Study this code. Good luck.

import java.util.*;

class Demo {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();
        // Split on runs of whitespace, including the non-breaking space (U+00A0)
        String[] tokens = s1.split("[\\s\\xA0]+");
        System.out.println(tokens.length);
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}