53

I would like to know how to split up a large string into a series of smaller strings or words. For example:

I want to walk my dog.

I want to have one string "I", another string "want", and so on.

How would I do this?

Eric Leschinski
fosho

15 Answers

86

Use the split() method.

For example:

String s = "I want to walk my dog";
String[] arr = s.split(" ");    

for ( String ss : arr) {
    System.out.println(ss);
}
Abdullah Khan
Kumar Vivek Mitra
71

As a more general solution (but ASCII only!), to include any other separators between words (like commas and semicolons), I suggest:

String s = "I want to walk my dog, cat, and tarantula; maybe even my tortoise.";
String[] words = s.split("\\W+");

The regex means that the delimiters will be anything that is not a word character [\W], in groups of at least one [+]. Because [+] is greedy, a run such as ';' followed by ' ' is consumed as a single delimiter.
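
To make the behaviour of the "\\W+" delimiter concrete, here is a minimal sketch. The class and method names (NonWordSplitDemo, splitWords) are made up for illustration. It also shows one edge case worth knowing: on Java 8 and later, text that starts with a delimiter produces an empty first element.

```java
import java.util.Arrays;

class NonWordSplitDemo {
    // Splits on runs of non-word characters; by default \w is [a-zA-Z_0-9].
    static String[] splitWords(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        String s = "I want to walk my dog, cat, and tarantula; maybe even my tortoise.";
        // The trailing "." does not add an empty element: split() drops trailing empties.
        System.out.println(Arrays.toString(splitWords(s)));

        // A leading delimiter does leave an empty first element (Java 8+).
        System.out.println(Arrays.toString(splitWords(" hello, world")));
    }
}
```

If leading delimiters are possible in your input, one option is to skip empty elements when iterating over the result.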

Teodor Anton
    \\W only seems to consider ASCII alphabetic characters. It isn't suitable for languages with accents. – rghome May 19 '17 at 13:56
31

A regex can also be used to split words.

\w can be used to match word characters ([A-Za-z0-9_]), so that punctuation is removed from the results:

String s = "I want to walk my dog, and why not?";
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group());
}

Outputs:

I
want
to
walk
my
dog
and
why
not

See Java API documentation for Pattern
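
As a variation on the Matcher loop above, the sketch below collects the matches into a List instead of printing them. It also passes Pattern.UNICODE_CHARACTER_CLASS so that \w covers non-ASCII letters as well; that flag is an addition for illustration, not part of the original answer, and the names WordMatcherDemo and findWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcherDemo {
    // Assumption: Unicode-aware \w is wanted; drop the flag for ASCII-only matching.
    private static final Pattern WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every run of word characters found in the text, in order.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("I want to walk my dog, and why not?"));
        // Unicode escapes keep this source ASCII-only; the two words contain accented letters.
        System.out.println(findWords("caf\u00e9 cr\u00e8me"));
    }
}
```

Because this approach matches the words rather than the gaps between them, it never yields empty strings.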

Abdullah Khan
Pete
14

See my other answer if your phrase contains accented characters:

String[] listeMots = phrase.split("\\P{L}+");
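
A small sketch of how "\\P{L}+" behaves on accented text, compared with "\\W+". The sample phrase (written with Unicode escapes so the source stays ASCII-only) and the names LetterSplitDemo and splitOnNonLetters are assumptions for illustration.

```java
import java.util.Arrays;

class LetterSplitDemo {
    // \P{L} matches any character that is not a Unicode letter.
    static String[] splitOnNonLetters(String phrase) {
        return phrase.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // The phrase contains e-acute, e-grave and u-circumflex.
        String phrase = "Un caf\u00e9, une cr\u00e8me br\u00fbl\u00e9e.";

        // Accented letters stay inside their words: five words in total.
        System.out.println(Arrays.toString(splitOnNonLetters(phrase)));

        // For comparison: the default \W treats accented letters as delimiters,
        // so the words are broken into fragments.
        System.out.println(Arrays.toString(phrase.split("\\W+")));
    }
}
```

Note that digits and underscores are not letters, so "\\P{L}+" treats them as separators too.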
Community
Pierre C
6

Yet another method, using StringTokenizer:

String s = "I want to walk my dog";
StringTokenizer tokenizer = new StringTokenizer(s);

while(tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
Kao
  • ah! this is good in case where i dont need an array but isn't tokenizer returning an array of token? nice idea though – Coding Enthusiast Jan 20 '17 at 21:42
  • No, there isn't any array being produced . `StringTokenizer` looks for the consecutive tokens in the string and returns them one by one. – Kao Jan 21 '17 at 12:55
  • Nice solution, unfortunately, StringTokenizer should not be used anymore. From the Docs: StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead. – Tomor Jan 06 '18 at 19:24
4

To treat everything except lowercase and uppercase letters as a separator between words, we can do:

String mystring = "hi, there,hi Leo";
String[] arr = mystring.split("[^a-zA-Z]+");
for(int i = 0; i < arr.length; i += 1)
{
     System.out.println(arr[i]);
}

Here the regex means that the separators will be anything that is not an upper or lower case letter [^a-zA-Z], in groups of at least one [+].

Danh
2

You can use the split(" ") method of the String class to get each word, as in the code below:

String s = "I want to walk my dog";
String[] strArray = s.split(" ");
for (int i = 0; i < strArray.length; i++) {
    System.out.println(strArray[i]);
}
rwisch45
AKT
2

This regex splits the string on any run of whitespace, such as spaces, tabs, and line breaks:

String[] str = s.split("\\s+");
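
The difference from split(" ") shows up as soon as the input has repeated spaces, tabs, or newlines. Here is a minimal sketch; the class and method names are hypothetical, and the trim() call is an addition to avoid an empty first element when the text starts with whitespace.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Splits on runs of whitespace; trim() removes leading/trailing whitespace first.
    static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String s = "I  want\tto walk\nmy dog";

        // Six words, regardless of which whitespace separates them.
        System.out.println(Arrays.toString(splitOnWhitespace(s)));

        // split(" ") only knows single spaces: the double space yields an empty
        // element, and the tab and newline stay inside their neighbouring tokens.
        System.out.println(s.split(" ").length);
    }
}
```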
Kyo Huu
1

Use split()

String words[] = stringInstance.split(" ");
jmj
1

You can use the StringUtils class from Apache Commons Lang:

String[] partsOfString = StringUtils.split("I want to walk my dog", StringUtils.SPACE);
Ahmed Ashour
Gagan Chouhan
1
StringTokenizer separate = new StringTokenizer(s, " ");
// nextToken() returns one word per call, so loop until no tokens remain
while (separate.hasMoreTokens()) {
    System.out.println(separate.nextToken());
}
ch Noman
1

Java String split() method example

public class SplitExample {
    public static void main(String[] args) {
        String str = "java string split method";
        // Splits the string at each single whitespace character
        String[] words = str.split("\\s");

        for (String word : words) {
            System.out.println(word);
        }
    }
}
ngg
0
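import java.util.StringTokenizer;

class test {

    public static void main(String[] args) {
        // With no delimiter argument, StringTokenizer splits on whitespace
        StringTokenizer st = new StringTokenizer("I want to walk my dog.");

        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}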
Ole Pannier
0

Using Java Stream API:

String sentence = "I want to walk my dog.";

Arrays.stream(sentence.split(" ")).forEach(System.out::println);

Output:

I
want
to
walk
my
dog.

Or

String sentence2 = "I want to walk my dog.";

Arrays.stream(sentence2.split(" ")).map(str -> str.replace(".", "")).forEach(System.out::println);

Output:

I
want
to
walk
my
dog
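
A possible alternative to stripping the "." by hand is to let a regex delimiter do the work and collect the words into a List. The sketch below uses Pattern.splitAsStream (available since Java 8); the delimiter "\\P{L}+" and the names StreamSplitDemo and words are choices made for this illustration, not part of the answer above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamSplitDemo {
    // Any run of non-letter characters (spaces, punctuation, digits) is a delimiter.
    private static final Pattern NON_LETTERS = Pattern.compile("\\P{L}+");

    static List<String> words(String sentence) {
        return NON_LETTERS.splitAsStream(sentence)
                // A leading delimiter would otherwise produce an empty first element.
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("I want to walk my dog."));
    }
}
```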
Arafath
-1
String[] str = s.split("[^a-zA-Z]+");
Makyen
  • Pattern matching of your own is usually not the best way to go; use solutions of people who have done that already and thought of all the weird corner cases that you don't think of at the moment of writing. Also, as a rule of thumb, I would rather go with a whitelist of whitespace characters here instead of trying to match the words as you miss out on umlauts etc. – Cherusker Jan 21 '19 at 17:43