
I'm trying to split this sentence: use will effect the Project's reputation, your right to copy and

into these words: use, will, effect, the, Project's, reputation, your, right, to, copy, and

I'm using line.split("[\\p{Punct}\\s\\p{Digit}]+"), but it splits the word Project's into Project and s.

I tried to exclude ' from the delimiters like this: line.split("[\\p{Punct}\\s\\p{Digit}]+&&[^']"), but it doesn't work.
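A minimal Java sketch of why the attempt above fails, assuming the sentence from the question as input (the class `SplitDemo` and its constant names are hypothetical). `\p{Punct}` includes the apostrophe, so the first pattern splits `Project's`. Outside a character class, `&&[^']` is not an intersection; it matches two literal `&` characters, so the second pattern never matches this sentence. Moving the intersection inside the brackets, as the first comment below suggests, keeps `Project's` whole.

```java
import java.util.Arrays;

// Hypothetical demo class comparing the pattern from the question with two variants.
public class SplitDemo {
    // Pattern from the question: \p{Punct} includes the apostrophe, so "Project's" is split.
    static final String ORIGINAL = "[\\p{Punct}\\s\\p{Digit}]+";
    // The attempted fix: outside a character class "&&[^']" is not an intersection;
    // it matches two literal '&' characters followed by any non-apostrophe character.
    static final String MISPLACED = "[\\p{Punct}\\s\\p{Digit}]+&&[^']";
    // Intersection written inside the brackets: punctuation, whitespace or digit, but not '.
    static final String FIXED = "[\\p{Punct}\\s\\p{Digit}&&[^']]+";

    public static void main(String[] args) {
        String line = "use will effect the Project's reputation, your right to copy and";

        // [use, will, effect, the, Project, s, reputation, your, right, to, copy, and]
        System.out.println(Arrays.toString(line.split(ORIGINAL)));
        // The input contains no "&&", so nothing matches and the array holds the whole line.
        System.out.println(line.split(MISPLACED).length); // 1
        // [use, will, effect, the, Project's, reputation, your, right, to, copy, and]
        System.out.println(Arrays.toString(line.split(FIXED)));
    }
}
```

This only fixes apostrophes inside words; a `'` at the start or end of a word is still kept as part of the word.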

Also, a ' at the start or end of a word should still be treated as a delimiter. Is there a way to do this?

Thanks.

Asaf
  • The `&&[^']` needs to be *inside* the character class: `"[\\p{Punct}\\s\\p{Digit}&&[^']]+"` – Andreas Jun 22 '17 at 20:35
  • Thanks, it's working, but I want `'` to be kept only when it is not the first or last character of a word. It fixed words like `Project's`, but words that start with `'` are still a problem. Is there a solution using only split? – Asaf Jun 22 '17 at 20:44
  • `"(?:[\\p{Punct}\\s\\p{Digit}&&[^']]|(?<!\\p{Alpha})'|'(?!\\p{Alpha}))+"` – Andreas Jun 22 '17 at 21:54
  • Thanks, it works! Can you explain your regex? I understand `[\\p{Punct}\\s\\p{Digit}&&[^']]`, but the special constructs aren't clear to me. Thanks. – Asaf Jun 23 '17 at 17:23
  • [Reference - What does this regex mean?](https://stackoverflow.com/q/22937618/5221149): `|` means OR. `(?<!\\p{Alpha})'` means a `'` not preceded by a letter. `'(?!\\p{Alpha})` means `'` not followed by a letter. `(?:xxx)+` means one or more of `xxx`. Or see [regex101.com](https://regex101.com/r/ie8CLq/1). – Andreas Jun 23 '17 at 18:30
  • Thanks!! I see that the split method keeps the word `""` if I do this: `String check = ""; for (String word : check.split("(?:[\\p{Punct}\\s\\p{Digit}&&[^']]|(?<!\\p{Alpha})'|'(?!\\p{Alpha}))+")) { if (word.equals("")) System.out.println("null"); }` It prints `null`. What do I need to add to the regex to avoid the `""`? Thanks. – Asaf Jun 23 '17 at 20:39
  • It's how [`split()`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-int-) works: *If the expression does not match any part of the input then the resulting array has just one element, namely this string.* An empty string doesn't contain any special characters, so result is `{ "" }`. Also note that if input is `"!x!"`, result is `{ "", "x" }`, but input `"!"` gives `{}`. In short, it's up to you to ignore the first value in the result if it is empty. – Andreas Jun 23 '17 at 21:57
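Putting the comments together, a minimal Java sketch that uses Andreas's lookaround pattern and, as his last comment suggests, drops empty strings in code rather than in the regex. The class `WordSplitter` and the helper `words` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class built around the pattern from the comments.
public class WordSplitter {
    // One or more of:
    //   - a punctuation, whitespace or digit character other than '
    //   - a ' not preceded by a letter (e.g. at the start of a word)
    //   - a ' not followed by a letter (e.g. at the end of a word)
    static final String DELIMITERS =
            "(?:[\\p{Punct}\\s\\p{Digit}&&[^']]|(?<!\\p{Alpha})'|'(?!\\p{Alpha}))+";

    // Hypothetical helper: String.split keeps a leading "" when the input starts
    // with a delimiter and returns {""} for empty input, so empty strings are skipped here.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String word : line.split(DELIMITERS)) {
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // [use, will, effect, the, Project's, reputation, your, right, to, copy, and]
        System.out.println(words("use will effect the Project's reputation, your right to copy and"));
        // [Hello, said, the, Project's, owner]
        System.out.println(words("'Hello', said the Project's owner."));
        // []
        System.out.println(words(""));
    }
}
```

Note that `\p{Alpha}` matches only ASCII letters unless the pattern is compiled with `Pattern.UNICODE_CHARACTER_CLASS`, so an apostrophe next to an accented letter would be treated as a delimiter.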
