I need to split this String into tokens using split. I am having trouble understanding java.util.regex.

String str = "He is a very very good boy, isn't he?";

The output in an array should be:

[He, is, a, very, very, good, boy, isn, t, he]

How do I achieve that?

  • String [] array = str.split(" "); – Dren Mar 02 '21 at 09:52
  • @Dren: That wouldn't give the desired output. (It would include the punctuation, and wouldn't split "isn't" into two tokens.) – Jon Skeet Mar 02 '21 at 09:52
  • Try `String[] array = str.split("[\\p{Punct}\\s]+");` – nehacharya Mar 02 '21 at 09:55
  • I used str.split("\s*(=>|,|'|\s)\s*"), which I saw somewhere online. It did the job to some extent, but it didn't remove the '?'. Could you explain the regex I just mentioned? @JonSkeet – Aakash Thakur Mar 02 '21 at 09:56
  • `str.split("\\W+")` – Ivar Mar 02 '21 at 09:57
  • @Ivar Works like magic! Could you explain how that works? Also, I checked online and found another regex that helped: "[^a-zA-Z]+". Are "\\W+" and "[^a-zA-Z]+" the same? – Aakash Thakur Mar 02 '21 at 10:03
  • @AakashThakur They are not entirely the same. `[^a-zA-Z]+` means "match everything except A-Z (both upper- and lowercase), one or more in a row". `\W+` is equivalent to `[^a-zA-Z0-9_]+`, which means "everything except A-Z (both upper- and lowercase), the digits 0-9, or underscores, one or more in a row". (Note that the regex is `\W`, but because you need to escape the backslash in string literals in Java, the string literal is `"\\W"`.) You can use a tool [like this one](https://regexper.com/) to visualize the regular expression. – Ivar Mar 02 '21 at 10:12
  • @Ivar The tool you mentioned is very interesting. This will definitely help me understand regex better! Thank you so much. – Aakash Thakur Mar 02 '21 at 10:41
  • @AakashThakur: Stack Overflow comment threads are not meant to be for this sort of thing. If you want to ask how to understand a particular regex, you should ask that *as a question* (not as a comment) and go into how much of it you *do* understand, and which bits are confusing to you. – Jon Skeet Mar 02 '21 at 10:54
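
To make the suggestions from the comments concrete, here is a minimal sketch that runs the three patterns mentioned above against the string from the question. The class name `SplitDemo`, its helper method names, and the second sample string `"room_42 is free"` are made up for illustration; only `String.split` and the patterns themselves come from the thread.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on runs of non-word characters, i.e. anything outside [a-zA-Z0-9_].
    static String[] splitOnNonWord(String s) {
        return s.split("\\W+");
    }

    // Splits on runs of non-letters, so digits and underscores act as separators too.
    static String[] splitOnNonLetters(String s) {
        return s.split("[^a-zA-Z]+");
    }

    // Splits on runs of ASCII punctuation and whitespace.
    static String[] splitOnPunctOrSpace(String s) {
        return s.split("[\\p{Punct}\\s]+");
    }

    public static void main(String[] args) {
        String str = "He is a very very good boy, isn't he?";

        // All three patterns give the same tokens for this input:
        // [He, is, a, very, very, good, boy, isn, t, he]
        System.out.println(Arrays.toString(splitOnNonWord(str)));
        System.out.println(Arrays.toString(splitOnNonLetters(str)));
        System.out.println(Arrays.toString(splitOnPunctOrSpace(str)));

        // Hypothetical input showing where \W+ and [^a-zA-Z]+ differ.
        String other = "room_42 is free";
        System.out.println(Arrays.toString(splitOnNonWord(other)));    // [room_42, is, free]
        System.out.println(Arrays.toString(splitOnNonLetters(other))); // [room, is, free]
    }
}
```

The trailing `?` does not produce an extra token because `split` drops trailing empty strings. A separator at the very start of the input, however, would leave an empty string as the first element, so input that may begin with punctuation or whitespace might need that element filtered out.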
