
Since String.split() works with regular expressions, this snippet:

String s = "str?str?argh";
s.split("r?");

... yields: [, s, t, , ?, s, t, , ?, a, , g, h]

What's the most elegant way to split this String on the r? sequence so that it produces [st, st, argh]?

EDIT: I know that I can escape the problematic ?. The trouble is that I don't know the delimiter in advance, and I don't want to work around this by writing an escapeGenericRegex() function.

Konrad Garus
  • This is mentioned in the accepted answer of [How to split a string in Java - Stack Overflow](https://stackoverflow.com/q/3481828). Related questions: How to split string by [(space)](https://stackoverflow.com/q/7899525)/[(backslash)](https://stackoverflow.com/q/23751618)/[(newline)](https://stackoverflow.com/q/454908)/[(pipe)](https://stackoverflow.com/q/10796160)? ; [How to escape text for regular expression in Java](https://stackoverflow.com/questions/60160/how-to-escape-text-for-regular-expression-in-java) – user202729 Feb 02 '21 at 02:35

8 Answers


A general solution using just Java SE APIs is:

String separator = ...
s.split(Pattern.quote(separator));

The quote method returns a regex that will match the argument string as a literal.
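To make this concrete for the question's input, here is a small self-contained sketch. The class `QuoteSplitDemo` and the helper `splitLiteral` are hypothetical names chosen for illustration; `Pattern.quote` and `String.split` are the standard JDK methods the answer refers to.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteSplitDemo {
    // Hypothetical helper: splits input on every literal occurrence of
    // separator, whatever regex metacharacters the separator contains.
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // quote() wraps the text in \Q...\E so the regex engine reads it literally
        System.out.println(Pattern.quote("r?"));                                 // \Qr?\E
        System.out.println(Arrays.toString(splitLiteral("str?str?argh", "r?"))); // [st, st, argh]
    }
}
```

Because the separator is quoted at run time, this should work for delimiters not known in advance (containing `.`, `|`, `?` and so on) without a hand-written escaping function.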

Stephen C
  • This compiles a new `Pattern` on every call. If you repeat this action, it's better to prepare one `Pattern p = Pattern.compile(separator, Pattern.LITERAL)` and reuse it. – Grigory Kislin Aug 31 '23 at 21:47
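A minimal sketch of the reuse the comment describes, assuming the separator is known once (here hard-coded as "r?") and many strings are split with it. The class name `PrecompiledSplit` and the method `splitAll` are hypothetical. `Pattern` instances are immutable and safe for concurrent use, so a `static final` field is a common place to keep one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once; LITERAL makes "r?" match the two characters 'r' and '?'
    private static final Pattern SEPARATOR = Pattern.compile("r?", Pattern.LITERAL);

    // Hypothetical helper: splits every line with the shared pattern
    static List<String[]> splitAll(List<String> lines) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            // No regex compilation happens inside the loop
            result.add(SEPARATOR.split(line));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] parts : splitAll(Arrays.asList("str?str?argh", "ar?b"))) {
            System.out.println(Arrays.toString(parts));
        }
    }
}
```

Whether this saves measurable time depends on how often the split runs; for a handful of calls, `s.split(Pattern.quote(separator))` is likely fine.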

You can use

StringUtils.split("?r")

from commons-lang.

Tomasz Nurkiewicz
BastiS
  • StringUtils.split() should be much faster than String.split(), since StringUtils.split uses a linear scan for the separator, whereas String.split() uses a regex, which is comparatively slow – Michael P Feb 15 '17 at 19:11
  • Something to be aware of: according to the Javadoc, this treats adjacent separators as one separator. In my situation this was not desired. – Tarmo Jun 29 '18 at 10:56
  • Be aware that this accepts a list of _characters_ to split on, not a string, so this would split the string on instances of `?` or `r`, not instances of `r?` – starwarswii May 13 '21 at 20:14
  • No reference to the String we are splitting, `s`? – Imran Oct 22 '21 at 18:04
  • This is missing the first argument, and moreover would not produce the desired output anyway. It splits on each of the given characters, not on the whole string. Use `StringUtils.splitByWholeSeparator(s, "r?")`. – Alex Wittig Dec 30 '21 at 18:58

This works perfectly as well:

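import java.util.ArrayList;
import java.util.List;

// Wrapper class name is illustrative only
public final class NonRegexSplitter
{
    // Splits input on every literal occurrence of delim (no regex involved)
    public static List<String> splitNonRegex(String input, String delim)
    {
        // An empty delimiter would match at the same offset forever
        if (delim.isEmpty())
        {
            throw new IllegalArgumentException("delim must not be empty");
        }

        List<String> l = new ArrayList<>();
        int offset = 0;

        while (true)
        {
            int index = input.indexOf(delim, offset);
            if (index == -1)
            {
                // No more delimiters: the remainder is the last field
                l.add(input.substring(offset));
                return l;
            } else
            {
                l.add(input.substring(offset, index));
                // Continue searching just past the delimiter
                offset = index + delim.length();
            }
        }
    }
}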
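One behavioural difference worth knowing: the indexOf loop above keeps trailing empty strings (for "str?str?" it returns [st, st, ""]), whereas String.split with the default limit discards them. The sketch below, with hypothetical class and helper names, shows the JDK side; passing a negative limit should make the regex-based split return the same fields as the loop for a non-empty delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TrailingEmptyDemo {
    // Default limit (0): String.split removes trailing empty strings
    static String[] splitDefault(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    // Negative limit: every field is kept, including trailing empty ones
    static String[] splitKeepTrailing(String input, String separator) {
        return input.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("str?str?", "r?")));      // [st, st]
        System.out.println(Arrays.toString(splitKeepTrailing("str?str?", "r?"))); // [st, st, ]
    }
}
```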
Community
Martijn Courteaux
  • The performance of this solution is not ideal since it creates temporary substrings. – BladeCoder May 20 '14 at 09:30
  • @BladeCoder: You're right. I fixed it :) (When I wrote this, I must have been 16, I guess) – Martijn Courteaux May 20 '14 at 10:44
  • Much better indeed :) – BladeCoder May 20 '14 at 21:25
  • I have an app (and tests) where I split frequently, and none of my splits needs a regular expression. Android Studio keeps kvetching that my regular expressions (which I do not need) are not efficiently precompiled patterns. I will use this, though not inside a loop in production code. Thanks! – Phlip Feb 13 '21 at 00:22

Using the Pattern class directly, you can compile the expression with the LITERAL flag; the expression is then matched as-is instead of being interpreted as a regular expression.

Pattern.compile(<literalExpression>, Pattern.LITERAL).split(<stringToSplit>);

Example:

String[] result = Pattern.compile("r?", Pattern.LITERAL).split("str?str?argh");

This results in:

[st, st, argh]
Stephen C
Manuel Romeiro
  • Your answer would be better if you explained your code. It would also be more useful to new users who search for something similar in the future. – Nic3500 Jul 30 '18 at 11:46
  • I think that `Pattern.quote(...)` is a better solution. Certainly it is fewer characters :-) – Stephen C Sep 13 '18 at 00:03
  • There should be no difference in performance. They will do the same thing under the hood. – Stephen C Sep 17 '18 at 00:00
  • I have to agree with you. In theory, LITERAL should perform better than evaluating a regex, but I did a small test with Java 8: for some inputs LITERAL was faster than quote(), and for others it was the reverse. Conclusion: for now there is no relevant difference in performance. – Manuel Romeiro Sep 18 '18 at 01:25

Escape the ?:

s.split("r\\?");
Etienne de Martel
String[] strs = str.split(Pattern.quote("r?"));
贼小气

Use Guava Splitter:

Extracts non-overlapping substrings from an input string, typically by recognizing appearances of a separator sequence. This separator can be specified as a single character, fixed string, regular expression or CharMatcher instance. Or, instead of using a separator at all, a splitter can extract adjacent substrings of a given fixed length.

Taylor
mindas

org.apache.commons.lang.StringUtils has methods for splitting Strings without expensive regular expressions.

Be sure to read the Javadocs closely, as the behavior can be subtle. StringUtils.split (as in another answer) does not meet the stated requirements. Use StringUtils.splitByWholeSeparator instead:

String s = "str?str?argh";

StringUtils.split(s, "r?");                   //[st, st, a, gh]
StringUtils.splitByWholeSeparator(s, "r?");   //[st, st, argh]
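If commons-lang is not on the classpath, the contrast above can be reproduced for this input with plain JDK regexes. This is a sketch, not a drop-in replacement: the character-class pattern `[r?]+` is our own approximation of StringUtils.split (for example, String.split keeps an empty first element when the input starts with a separator), and `Pattern.quote` stands in for splitByWholeSeparator. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CharSetVsWholeSeparator {
    // Character set: split on any run of 'r' and/or '?' characters
    // ('?' needs no escaping inside a character class)
    static String[] splitOnCharSet(String input) {
        return input.split("[r?]+");
    }

    // Whole separator: split only on the literal two-character sequence "r?"
    static String[] splitOnWholeSeparator(String input) {
        return input.split(Pattern.quote("r?"));
    }

    public static void main(String[] args) {
        String s = "str?str?argh";
        System.out.println(Arrays.toString(splitOnCharSet(s)));        // [st, st, a, gh]
        System.out.println(Arrays.toString(splitOnWholeSeparator(s))); // [st, st, argh]
    }
}
```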
Alex Wittig