38

The following is my REPL output. I am not sure why String.split does not work as expected here.

scala> val s = "Pedro|groceries|apple|1.42"
s: java.lang.String = Pedro|groceries|apple|1.42

scala> s.split("|")
res27: Array[java.lang.String] = Array("", P, e, d, r, o, |, g, r, o, c, e, r, i, e, s, |, a, p, p, l, e, |, 1, ., 4, 2)
hrishikeshp19

3 Answers

88

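If you pass a String (double quotes), you're asking for a regular expression split. | is the regex "or" operator, so the pattern "|" means "the empty string or the empty string". That matches between every pair of characters, so the string is split into single characters.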

If you use split('|') or split("""\|""") you should get what you want.
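As a minimal sketch of the two suggestions above, reusing the string from the question (the names byChar and byRawRegex are only illustrative):

```scala
val s = "Pedro|groceries|apple|1.42"

// A Char argument is treated as a literal separator, not as a regex.
val byChar = s.split('|')

// A triple-quoted string keeps the backslash as written,
// so the regex engine sees \| (an escaped, literal pipe).
val byRawRegex = s.split("""\|""")

println(byChar.mkString(", "))      // Pedro, groceries, apple, 1.42
println(byRawRegex.mkString(", "))  // Pedro, groceries, apple, 1.42
```

Both calls should give the same four fields; the Char version avoids regex syntax entirely, which is probably the simpler choice for a single-character delimiter.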

Rex Kerr
  • 1
    good one. In scala, what is the difference between '' and ""? – hrishikeshp19 Jul 02 '12 at 22:40
  • 3
    `"""stuff"""` quotes a literal string. `"stuff"` interprets escape characters. Since backslash is an escape character _both_ in Java strings _and_ regexes, you would need to escape the escape character to get it into the regex: `"\\|"`. This gets confusing _very_ quickly, so it's better to use triple quotes and go for a literal string. – Rex Kerr Jul 02 '12 at 22:43
  • and... is 'somestring' same as """somestring"""? – hrishikeshp19 Jul 02 '12 at 22:49
  • 5
    `'c'` is a single character. `"""This is a (literal) string, not a single character."""` – Rex Kerr Jul 02 '12 at 23:12
  • 1
    Ah...I got it. I am from JavaScript, and it looks like I have forgotten a few basic things. – hrishikeshp19 Jul 02 '12 at 23:26
8

| is a special regular expression character which is used as a logical operator for OR operations.

Since java.lang.String#split(String regex) takes a regular expression, you're splitting the string on "nothing OR nothing". That is a special case of its own in regex splitting: a pattern that matches the empty string matches between every pair of characters, so the string is split at every position.
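A small sketch of that behaviour, assuming the string from the question (the value names are hypothetical). Whether the result starts with an empty string depends on the Java version, so the check below only looks at the length of each piece:

```scala
val s = "Pedro|groceries|apple|1.42"

// The pattern "|" is "empty OR empty", so it matches the empty string...
val matchesEmpty = "".matches("|")   // true

// ...but it does not match a literal pipe character.
val matchesPipe = "|".matches("|")   // false

// Splitting on a zero-width match cuts at every position,
// so every piece is at most one character long.
val pieces = s.split("|")
val allTiny = pieces.forall(_.length <= 1)   // true

// Nothing is consumed by the split: the pipes are still among the pieces.
val rejoined = pieces.mkString   // same as s
```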

To get what you want, you need to escape your regex pattern properly. To escape the pattern, you need to prepend the character with \ and since \ is a special String character (think \t and \r for example), you need to actually double escape so that you'll end up with s.split("\\|").
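For illustration, a short sketch of the double escape. The Pattern.quote call is an extra alternative that is not mentioned above; it is a standard java.util.regex method that wraps a string so the regex engine treats it literally:

```scala
import java.util.regex.Pattern

val s = "Pedro|groceries|apple|1.42"

// "\\|" is a two-character string: a backslash followed by a pipe.
val escaped = "\\|"

// The triple-quoted form spells the same two characters without double escaping.
val sameRegex = escaped == """\|"""   // true

val fields = s.split(escaped)

// Alternative (an addition, not from the answer): let the library do the quoting.
val quoted = s.split(Pattern.quote("|"))

println(fields.mkString(", "))   // Pedro, groceries, apple, 1.42
```

Pattern.quote may be the safer option when the delimiter comes from a variable rather than a literal, since it does not require knowing which characters are special.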

For full Java regular expression syntax, see java.util.regex.Pattern javadoc.

Esko
6

split takes a regex as its first argument, so your call is interpreted as "empty string or empty string". To get the expected behavior, you need to escape the pipe character: "\\|".

Blaisorblade
Jari