
I am trying to split the string "Hello|World" in Clojure, but when I use the split function `(clojure.string/split x #"|")` I get a weird result: `["H" "e" "l" "l" "o" "|" "W" "o" "r" "l" "d"]`. Can anyone tell me why it does this, and how I can split it to get `["Hello" "World"]`?

guidot
Emma
  • Use `(clojure.string/split "Hello|World" (re-pattern (. java.util.regex.Pattern quote "|")))`, which 1) invokes `Pattern.quote` to turn the string "|" into the quoted literal pattern string `\Q|\E`, then 2) uses `re-pattern` to compile that quoted string into a regular expression, which is passed as the second argument to `clojure.string/split` and produces the desired result `["Hello" "World"]`. If you want to make this a bit prettier, use `(defn re-quoted-pattern [s] (re-pattern (. java.util.regex.Pattern quote s)))`, and your code then becomes `(clojure.string/split "Hello|World" (re-quoted-pattern "|"))`. – Bob Jarvis - Слава Україні May 25 '18 at 15:24
  • @WiktorStribiżew - if you could please remove your close vote on this, I could post the comment above as an answer. You may be correct that from a Java point of view the question is a dup, but this question is not tagged `java`, and from a Clojure point of view no one has addressed how to invoke `Pattern.quote` from Clojure; thus, I believe your close-as-duplicate should be undone. Thanks. – Bob Jarvis - Слава Україні May 25 '18 at 15:36
  • This is a question tagged with `regex`. The `|` symbol is a well-known char that requires escaping if one wants to treat it as a literal char. No need to reopen. – Wiktor Stribiżew May 25 '18 at 16:37

1 Answer


Here is the answer:

```clojure
(require '[clojure.string :as str])

(str/split "Hello|World" #"|")   ;=> ["H" "e" "l" "l" "o" "|" "W" "o" "r" "l" "d"]
(str/split "Hello World" #" ")   ;=> ["Hello" "World"]
(str/split "Hello|World" #"\|")  ;=> ["Hello" "World"]
```

In a regular expression the `|` character is special, so it must be escaped with a backslash to match a literal pipe. In a Clojure regex literal a single backslash is enough: `#"\|"`.
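As a minimal sketch, assuming Clojure on the JVM, here are three equivalent ways to make `|` literal: a backslash escape, a one-character class `[|]` (inside a character class `|` has no special meaning), and the `Pattern.quote` approach from the comments, written with static-method syntax. The helper name `split-literal` is hypothetical and used only for illustration.

```clojure
(require '[clojure.string :as str])

;; 1) Escape the metacharacter with a backslash.
(str/split "Hello|World" #"\|")   ;=> ["Hello" "World"]

;; 2) Put it in a character class, where | is an ordinary character.
(str/split "Hello|World" #"[|]")  ;=> ["Hello" "World"]

;; 3) Quote an arbitrary delimiter string so every character in it is literal.
;;    `split-literal` is a hypothetical helper, not part of clojure.string.
(defn split-literal [s delim]
  (str/split s (re-pattern (java.util.regex.Pattern/quote delim))))

(split-literal "Hello|World" "|")   ;=> ["Hello" "World"]
(split-literal "a.b.c" ".")         ;=> ["a" "b" "c"]
```

The quoting variant is the safer choice when the delimiter comes from user input or contains several metacharacters (such as `.` or `*`), since you do not have to escape each one by hand.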

In regex, `|` is the alternation ("or") operator; for example, `abc|def` matches either `abc` or `def`:

```clojure
(str/split "Hello|World" #"e|o")  ;=> ["H" "ll" "|W" "rld"]
```

With nothing on either side, the pattern `|` means "empty string OR empty string", so it matches the zero-width position before every character (and at the end of the input). `split` therefore cuts the string between every pair of characters.
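To see those empty matches directly, here is a small sketch that walks a `java.util.regex.Matcher` (via `re-matcher`) and collects the start index of every match. The helper name `match-positions` is hypothetical, introduced only for this illustration.

```clojure
;; Hypothetical helper: collect the start index of every match of `re` in `s`.
(defn match-positions [re s]
  (let [m (re-matcher re s)]
    (loop [acc []]
      (if (.find m)
        (recur (conj acc (.start m)))
        acc))))

;; A bare | is "empty OR empty", so it matches the empty string at every position.
(match-positions #"|" "abc")           ;=> [0 1 2 3]
(re-seq #"|" "abc")                    ;=> ("" "" "" "")

;; An escaped pipe matches only where a | actually occurs.
(match-positions #"\|" "Hello|World")  ;=> [5]
```

`clojure.string/split` delegates to Java's `Pattern.split`, which (on Java 8 and later) does not emit a leading empty string for a zero-width match at index 0 and drops trailing empty strings, which is why the result is a vector of one-character strings.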

See the Java documentation for `java.util.regex.Pattern` for more information.

Alan Thompson