
I'm trying to split text into sentences on . ! and ?, as was done in this question, while also accounting for possible double quotes at the beginning and end of a sentence. I'm using this:

let str = '" Non. Es-tu sûr ? "';
let result = str.match(/[^\.!\?]+[\.!\?]+/g);

console.log(result)

But when I run it, the two characters after the ? (the space and the closing double quote) are not captured. So instead of getting:

['" Non.', 'Es-tu sûr ? "']

I'm getting:

['" Non.', 'Es-tu sûr ?']

Is there any way to split these sentences using a regex?

Artur Carvalho
  • What is the requirement here? If you mean to match any non-word chars, try `/[^.!?]+[.!?]+\W*/g`. It may grab a bit too much if the next sentence first letter is not an ASCII letter though. If you plan to match punctuation with spaces try `/[^.!?]+[\s.!?!-#%-*,-\/\\:;?@[-\]_{}]+/g`. You may add all Unicode punctuation here, if need be. – Wiktor Stribiżew Nov 09 '18 at 11:30
  • *I'm trying to split a sentence* - on the basis of which character? Is it period (`.`) character? – vrintle Nov 09 '18 at 11:40
  • Thanks @WiktorStribiżew. I don't control the input, so I'm still testing the possible sentence splitters. For now I'm just considering the typical ones: .!? and the double quotes. – Artur Carvalho Nov 09 '18 at 12:53
  • Then use `/[^.!?]+[.!?]+[\s"]*/g` – Wiktor Stribiżew Nov 09 '18 at 12:55
  • See https://stackoverflow.com/a/53226132/3832970 – Wiktor Stribiżew Nov 09 '18 at 12:57

2 Answers


Looks like all you need to do is optionally match a double quote at the beginning and at the end of each sentence:

let str = '" Non. Es-tu sûr ? "';
console.log(
  str.match( /"?[^.!?]+[.!?]+(?: *")?/g )
);
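
As a minimal sketch of how this pattern might be used in practice: the second match starts with the space that follows the first period, so trimming each match is one way to get clean strings. The variable names `input`, `pattern`, and `sentences` are illustrative and not part of the original answer.

```javascript
// Sketch: apply the pattern above, then trim each match.
const input = '" Non. Es-tu sûr ? "';

// Optional opening quote, sentence body, terminator(s),
// then an optional closing quote preceded by spaces.
const pattern = /"?[^.!?]+[.!?]+(?: *")?/g;

// match() returns null when nothing matches, so fall back to an empty array.
const sentences = (input.match(pattern) || []).map((s) => s.trim());

console.log(sentences); // [ '" Non.', 'Es-tu sûr ? "' ]
```

Without the `trim()` call, the second element would keep its leading space.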
CertainPerformance
  • This is it, thanks! I was trying to consider the quotes as another sentence finisher. So I was thinking that it should end in '!' OR '.' OR '?'+ OR ' "'. But the thing was that you considered the quotes as an extra to the finisher and not a finisher. – Artur Carvalho Nov 09 '18 at 12:57

If you just want to match any additional trailing whitespace and quote characters (" or ') after the final punctuation, you may use

let str = '" Non. Es-tu sûr ? "';
let result = str.match(/[^.!?]+[.!?]+[\s"']*/g);
console.log(result)

See the regex demo. The [\s"']* pattern matches 0 or more whitespace, " or ' chars.
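
One possible way to package this pattern is a small helper. This is only a sketch: `splitSentences` is a hypothetical name, and the trimming and the empty-array fallback are assumptions added here, not part of the answer.

```javascript
// Hypothetical helper (not from the answer) wrapping the pattern above.
function splitSentences(text) {
  // Sentence body, one or more terminators, then any trailing whitespace or quotes.
  const matches = text.match(/[^.!?]+[.!?]+[\s"']*/g);
  // match() returns null when nothing matches; normalise that to [].
  // The trailing [\s"']* can capture whitespace, so trim each match.
  return (matches || []).map((s) => s.trim());
}

console.log(splitSentences('" Non. Es-tu sûr ? "')); // [ '" Non.', 'Es-tu sûr ? "' ]
console.log(splitSentences('no terminator here'));   // []
```

Note that text after the last terminator is dropped, because the pattern requires at least one of . ! ? in every match. If the input may end without punctuation, that tail would need separate handling.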

Note you do not need to escape . and ? inside character classes.
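
A quick check of that point, using the regex from the question next to its unescaped form (the names `sample`, `escaped`, and `unescaped` are illustrative):

```javascript
const sample = '" Non. Es-tu sûr ? "';

// Inside [...] the characters . and ? are literals, so the backslashes are redundant.
const escaped = sample.match(/[^\.!\?]+[\.!\?]+/g);
const unescaped = sample.match(/[^.!?]+[.!?]+/g);

// Both patterns produce the same list of matches.
console.log(JSON.stringify(escaped) === JSON.stringify(unescaped)); // true
```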

Wiktor Stribiżew