I would like to know how to split up a large string into a series of smaller strings or words. For example:
I want to walk my dog.
I want to have one string, "I", another string, "want", and so on.
How would I do this?
Use the split() method. For example:
String s = "I want to walk my dog";
String[] arr = s.split(" ");
for ( String ss : arr) {
System.out.println(ss);
}
As a more general solution (but ASCII only!), to include any other separators between words (like commas and semicolons), I suggest:
String s = "I want to walk my dog, cat, and tarantula; maybe even my tortoise.";
String[] words = s.split("\\W+");
The regex means that the delimiters will be anything that is not a word character [\W], in groups of at least one [+]. Because [+] is greedy, it will take, for instance, ';' and ' ' together as one delimiter.
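To make the behaviour of "\\W+" concrete, here is a minimal sketch. The class name NonWordSplitDemo and the helper words are made up for illustration. It shows that ", " and "; " each act as a single delimiter, and that a delimiter at the very start of the input leaves an empty first element, which you may need to filter out.

```java
import java.util.Arrays;

// Illustrative names only: NonWordSplitDemo and words() are not from the answer above.
class NonWordSplitDemo {
    // Splits on runs of non-word characters, i.e. anything outside [A-Za-z0-9_].
    static String[] words(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        String s = "I want to walk my dog, cat, and tarantula; maybe even my tortoise.";
        // ", " and "; " each count as one delimiter; trailing empty strings are dropped.
        System.out.println(Arrays.toString(words(s)));

        // A delimiter at the start of the input yields an empty first element.
        System.out.println(Arrays.toString(words("...dog")));
    }
}
```

The first call prints the 13 words without punctuation. The second prints an array whose first element is the empty string.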
A regex can also be used to split words. \w can be used to match word characters ([A-Za-z0-9_]), so that punctuation is removed from the results:
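// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String s = "I want to walk my dog, and why not?";
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(s);
// Each call to find() advances to the next run of word characters.
while (matcher.find()) {
    System.out.println(matcher.group());
}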
Outputs:
I
want
to
walk
my
dog
and
why
not
See the Java API documentation for Pattern.
See my other answer if your phrase contains accented characters:
String[] listeMots = phrase.split("\\P{L}+");
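As a rough illustration of why "\\P{L}+" helps here, the sketch below splits a short French phrase both ways. The class name LetterSplitDemo, the helper letterWords and the sample phrase are assumptions made up for this example. \P{L} matches any character that is not a Unicode letter, while \W is ASCII-only by default and therefore treats accented letters as delimiters.

```java
import java.util.Arrays;

// Illustrative names only: LetterSplitDemo and letterWords() are hypothetical.
class LetterSplitDemo {
    // Splits on runs of characters that are not Unicode letters.
    static String[] letterWords(String phrase) {
        return phrase.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // Accented letters are written as Unicode escapes to avoid source-encoding issues.
        String phrase = "Un caf\u00e9 tr\u00e8s chaud.";
        // Accented words stay whole.
        System.out.println(Arrays.toString(letterWords(phrase)));
        // With the default ASCII-only \W, the accented letters act as delimiters.
        System.out.println(Arrays.toString(phrase.split("\\W+")));
    }
}
```

Note that \P{L} also treats digits and underscores as separators, so it is only a good fit when the words you want consist of letters.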
Yet another method, using StringTokenizer:
String s = "I want to walk my dog";
StringTokenizer tokenizer = new StringTokenizer(s);
while(tokenizer.hasMoreTokens()) {
System.out.println(tokenizer.nextToken());
}
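StringTokenizer also accepts a second argument listing the delimiter characters, which lets it drop punctuation as well. The following is a small sketch under that assumption. The class name TokenizerDemo and the helper tokens are invented for illustration. Keep in mind that the Javadoc describes StringTokenizer as a legacy class and recommends split() or java.util.regex for new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative names only: TokenizerDemo and tokens() are hypothetical.
class TokenizerDemo {
    // Collects tokens, treating every character in delims as a separator.
    static List<String> tokens(String text, String delims) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Space, comma and question mark are all delimiters here.
        System.out.println(tokens("I want to walk my dog, and why not?", " ,?"));
    }
}
```

Consecutive delimiters such as ", " are skipped together, so no empty tokens are produced.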
To treat everything except lowercase and uppercase letters as a separator between words, we can do:
String mystring = "hi, there,hi Leo";
String[] arr = mystring.split("[^a-zA-Z]+");
for(int i = 0; i < arr.length; i += 1)
{
System.out.println(arr[i]);
}
Here the regex means that the separators will be anything that is not an upper or lower case letter [^a-zA-Z], in groups of at least one [+].
This regex splits on runs of whitespace such as spaces, tabs, and line breaks:
String[] str = s.split("\\s+");
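A short sketch of how "\\s+" behaves on messy input follows. The class name WhitespaceSplitDemo and the helper splitOnWhitespace are made up for this example. It trims first, on the assumption that you do not want the empty first element that leading whitespace would otherwise produce.

```java
import java.util.Arrays;

// Illustrative names only: WhitespaceSplitDemo and splitOnWhitespace() are hypothetical.
class WhitespaceSplitDemo {
    // Trims first so leading whitespace does not yield an empty first element.
    static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Mixed spaces, a tab, a line break and repeated spaces.
        String s = "  I want\tto walk\nmy   dog ";
        System.out.println(Arrays.toString(splitOnWhitespace(s)));
    }
}
```

By contrast, split(" ") on the same input would return empty strings for the repeated spaces and would not split at the tab or line break.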
You can use the StringUtils class from Apache Commons Lang:
String[] partsOfString = StringUtils.split("I want to walk my dog", StringUtils.SPACE);
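// Requires java.util.StringTokenizer; s is the string to split.
StringTokenizer separate = new StringTokenizer(s, " ");
// nextToken() returns one word per call, so loop to print them all.
while (separate.hasMoreTokens()) {
    String word = separate.nextToken();
    System.out.println(word);
}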
Java String split() method example
public class SplitExample {
    public static void main(String[] args) {
        String str = "java string split method";
        String[] words = str.split("\\s"); // splits the string at each whitespace character
        for (String word : words) {
            System.out.println(word);
        }
    }
}
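import java.util.StringTokenizer;

class test {
    public static void main(String[] args) {
        // With no delimiter argument, StringTokenizer splits on whitespace.
        StringTokenizer st = new StringTokenizer("I want to walk my dog.");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}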
Using the Java Stream API:
String sentence = "I want to walk my dog.";
Arrays.stream(sentence.split(" ")).forEach(System.out::println);
Output:
I
want
to
walk
my
dog.
Or, to remove the period as well:
String sentence2 = "I want to walk my dog.";
Arrays.stream(sentence2.split(" ")).map(str -> str.replace(".", "")).forEach(System.out::println);
Output:
I
want
to
walk
my
dog
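If you want the words as a List and without punctuation in general, one possible variant is Pattern.splitAsStream (available since Java 8). The class name StreamSplitDemo, the constant NON_WORD and the helper words are illustrative, and the choice of "\\W+" as the delimiter is an assumption borrowed from the regex answers above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative names only: StreamSplitDemo, NON_WORD and words() are hypothetical.
class StreamSplitDemo {
    // Delimiter: one or more non-word characters (ASCII-only by default).
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Streams the pieces between delimiters and drops any empty piece.
    static List<String> words(String sentence) {
        return NON_WORD.splitAsStream(sentence)
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("I want to walk my dog."));
    }
}
```

The filter step covers the case where the sentence starts with punctuation or whitespace, which would otherwise put an empty string at the front of the list.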