
I'm trying to split an input by ".,:;()[]"'\/!? " chars and add the words to a list. I've tried .split("\\W+?") and .split("\\W"), but both of them are returning empty elements in the list.

Additionally, I've tried .split("\\W+"), which returns only words without any special characters that should go along with them (for instance, if one of the input words is "C#", it writes "C" in the list). Lastly, I've also tried to put all of the special chars above into the .split() method: .split("\\.,:;\\(\\)\\[]\"'\\\\/!\\? "), but this isn't splitting the input at all. Could anyone advise please?

  • 1
    You need a character class `[]` and you need to escape special characters (which are different for a character class than for a regex in general), so: `split("[.,:;()\\[\\]\"'\\\\/!? ]+")` – Andreas Apr 13 '17 at 09:53

1 Answer


The split() method takes a regular expression, not a plain list of delimiter characters.

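This is not the regex you're looking for: .split("\\.,:;\\(\\)\\[]\"'\\\\/!\\? ") matches only that exact sequence of characters, one after another, so ordinary input is never split.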
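To see why that pattern leaves the input untouched, here is a small sketch. The class name LiteralSequenceDemo and the helper splitBySequence are made up for illustration; only the pattern string comes from the question.

```java
public class LiteralSequenceDemo {
    // Pattern copied from the question: with no character class around it,
    // the regex describes one fixed run of characters, .,:;()[]"'\/!? plus a space.
    static final String SEQUENCE = "\\.,:;\\(\\)\\[]\"'\\\\/!\\? ";

    // Hypothetical helper: splits the input with the question's pattern.
    public static String[] splitBySequence(String input) {
        return input.split(SEQUENCE);
    }

    public static void main(String[] args) {
        // Ordinary text never contains the whole run, so nothing is split.
        System.out.println(splitBySequence("Hello, world! Bye.").length); // prints 1

        // Only input containing the complete run is split.
        System.out.println(splitBySequence("a.,:;()[]\"'\\/!? b").length); // prints 2
    }
}
```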

Try creating a character class like [.,:;()\[\]'\\\/!\?\s"] and add + to match one or more occurrences, so that a run of consecutive delimiters counts as a single separator.

I also suggest replacing the literal space with the generic \s, which matches every whitespace character, including \t.

If you're sure about the list of characters you have selected as splitters, this should be your correct split with the correct Java string literal as @Andreas suggested:

.split("[.,:;()\\[\\]'\\\\\\/!\\?\\s\"]+")
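As a rough check, the sketch below runs a slightly simplified variant of this pattern, dropping the escapes that are optional inside a character class. The class name SplitDemo, the constant DELIMITERS, and the sample sentence are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Delimiters from the question as a character class, with \s instead of a
    // plain space. Inside [...] only [, ], and the backslash need regex escaping;
    // the double quote is escaped for the Java string literal, not for the regex.
    static final String DELIMITERS = "[.,:;()\\[\\]'\\\\/!?\\s\"]+";

    // Hypothetical helper: returns the words found between delimiter runs.
    public static List<String> splitWords(String input) {
        return Arrays.asList(input.split(DELIMITERS));
    }

    public static void main(String[] args) {
        // '#' and '+' are not in the class, so "C#" and "C++" stay whole.
        System.out.println(splitWords("I like C#, Java (and C++)!"));
        // prints [I, like, C#, Java, and, C++]
    }
}
```

Note that split() drops trailing empty strings but not a leading one, so input that starts with a delimiter still yields an empty first element that has to be skipped separately.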

BTW: I've found a particularly useful Eclipse editor option that escapes text when you paste it into a string literal. Go to Window/Preferences and, under Java/Editor/Typing, check the box next to "Escape text when pasting into a string literal".

freedev
  • 1
    No need to escape `.`, `(`, `)`, and `?` inside a character class. – Andreas Apr 13 '17 at 09:54
  • 1
    Why do you have `"` and `)` twice? And you should show how to do it as a Java string literal, since the escaping gets worse there. – Andreas Apr 13 '17 at 09:55