6

I am reading a file line by line and want to split each line on a specific delimiter. I found some options in the String class and the StringUtils class.

So my question is: which is the better option to use, and why?

user3717431
  • You may refer to this link: http://tvp-technical.blogspot.in/2012/01/subtle-difference-between-string-split.html – Sanjay Sep 20 '14 at 06:20
  • `String.split` is [strange](https://code.google.com/p/guava-libraries/wiki/StringsExplained#Splitter) and inefficient as it compiles the regex every time. No idea about `StringUtils`, using Guava's Splitter. – maaartinus Sep 20 '14 at 07:33
  • @maaartinus this inefficiency is no longer the case since JDK 7 where the native split method is optimized. A regex is not used if the string is 1 character long or 2 characters with the first char being a backslash, the source code can be checked for further details – Deepak Dec 11 '17 at 11:13

3 Answers

9

It depends on the use case.

What's the difference?

String#split: public String[] split(String regex)

StringUtils#split: public static String[] split(String str, String separatorChars)

  1. The Apache StringUtils.split() is null-safe: StringUtils.split(null) returns null. The JDK String#split() is not null-safe:

    try {
        String testString = null;
        String[] result = testString.split("-"); // throws NullPointerException
        System.out.println(result.length);       // never reached
    } catch (Exception e) {
        System.out.println(e); // prints the NullPointerException
    }

  2. The default String#split() treats its argument as a regular expression.
    The Apache StringUtils#split() does not use a regex: depending on the overload, it splits on whitespace, on a single char, or on any of the characters in separatorChars (a null separatorChars means whitespace).
    Regex matching can be expensive when a complex pattern is applied very often, so in that situation String.split() may be the slower choice. For simple delimiters and occasional calls it is fine.

  3. When tokenizing a string like the following, String.split() returns an extra empty string for the leading delimiter (trailing empty strings are dropped), while the Apache version returns only the non-empty tokens:

     String testString = "$Hello$Dear$";

     String[] result = testString.split("\\$");
     System.out.println("Length is "+ result.length); //3
     int i=1;
     for(String str : result) {
        System.out.println("Str"+(i++)+" "+str);
     }

Output

Length is 3
Str1 
Str2 Hello
Str3 Dear

String[] result = StringUtils.split(testString,"$");
System.out.println("Length is "+ result.length); // 2
int i=1;
for(String str : result) {
    System.out.println("Str"+(i++)+" "+str);
}

Output

Length is 2
Str1 Hello
Str2 Dear
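
Point 2 above has a practical consequence that is easy to miss: because String#split() takes a regex, delimiters such as "." or "|" need escaping. The following is a minimal JDK-only sketch, not part of the original answer; the class and method names (RegexSplitSketch, splitOnPipe) are made up for illustration. It also shows one way to compile the pattern once when the same delimiter is applied to many lines.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class RegexSplitSketch {
    // Compiled once and reused for every line.
    // Pattern.quote turns the delimiter into a literal, so "|" is not read as regex alternation.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    static String[] splitOnPipe(String line) {
        return PIPE.split(line);
    }

    public static void main(String[] args) {
        // As a regex, "." matches every character, so all tokens are empty
        // and the trailing empty strings are dropped: the array has length 0.
        System.out.println("a.b.c".split(".").length);              // 0

        // Escaping the dot splits on the literal character.
        System.out.println(Arrays.toString("a.b.c".split("\\."))); // [a, b, c]

        System.out.println(Arrays.toString(splitOnPipe("x|y|z"))); // [x, y, z]
    }
}
```

StringUtils.split(str, separatorChars) avoids this class of mistake because its separator argument is never interpreted as a regex.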
Sagar
RahulArackal
1

Well, it really depends on what you want to achieve. According to the docs, the split methods on String and StringUtils are quite different from each other. Based on your requirement:

...want to split each line on the basis of specific delimiter.

It seems what you need is the split method in String

  • public String[] split(String regex) - Splits this string around matches of the given regular expression. (src)

ex:

String str = "abc def";
str.split(" ");

returns:

["abc", "def"]

That is because the single-argument overload in StringUtils is:

  • public static String[] split(String str) - Splits the provided text into an array, using whitespace as the separator. (src)

ex:

StringUtils.split("abc def")

returns:

["abc", "def"]

It's an overloaded method though, so you can use the one that takes another argument for the delimiter:

  • public static String[] split(String str, char separatorChar) - Splits the provided text into an array, separator specified. This is an alternative to using StringTokenizer.
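
To tie this back to the question (reading line by line and splitting each line on a delimiter), here is a minimal sketch using only the JDK. The class and method names (LineSplitSketch, splitLines) are hypothetical, and a StringReader stands in for the file so the example is self-contained. With a real file, something like Files.newBufferedReader inside try-with-resources would be the usual choice.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LineSplitSketch {
    // Splits every line of the text on the given delimiter regex.
    static List<String[]> splitLines(String text, String delimiterRegex) {
        // Compile the delimiter once instead of once per line.
        Pattern delimiter = Pattern.compile(delimiterRegex);
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return reader.lines()
                     .map(delimiter::split)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String[] row : splitLines("abc def\nghi jkl", " ")) {
            System.out.println(String.join("|", row));
        }
        // prints:
        // abc|def
        // ghi|jkl
    }
}
```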
lxcky
  • `StringUtils.split(final String str, final String separatorChars)`. Comparing `StringUtils.split(final String str)` with `String.split(String regex)` doesn't make any sense. – Tom Sep 20 '14 at 06:29
  • @Tom, what are you talking about? – lxcky Sep 20 '14 at 06:34
  • Where do you see the point in comparing two methods where one of them doesn't meet OP's requirements? Wouldn't it make much more sense to compare `String.split()` with a version of `StringUtils.split()` that can do the same given the same accessible information/objects? – Tom Sep 20 '14 at 07:03
1

It is worth noting that the StringUtils.split documentation states: "Adjacent separators are treated as one separator." For example, StringUtils.split("parm1,parm2,,parm4", ",") gives ["parm1", "parm2", "parm4"]. If you want ["parm1", "parm2", "", "parm4"], you need StringUtils.splitPreserveAllTokens.
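
For comparison, the JDK's String#split() already keeps empty tokens between adjacent separators and only drops trailing ones. Passing a negative limit keeps those as well. A small sketch, with hypothetical names (EmptyTokenSketch, splitKeepAll) chosen for illustration:

```java
final class EmptyTokenSketch {
    // A negative limit tells String#split to keep trailing empty strings too.
    static String[] splitKeepAll(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        // The empty token between the adjacent commas is kept.
        System.out.println("parm1,parm2,,parm4".split(",").length); // 4

        // Trailing empty tokens are dropped by default...
        System.out.println("a,b,,".split(",").length);              // 2

        // ...and preserved with a negative limit.
        System.out.println(splitKeepAll("a,b,,", ",").length);      // 4
    }
}
```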