
Note for moderators: this question is about keeping full sentences intact; the linked questions only consider words.

Take the following example:

```typescript
// I have a long string that I split into sentences
const input =
  "This is the first sentence... This is the second, much longer sentence, with some additional puntuations?! Third sentence with a different length! Just a sentence ending with a number 980. Last but not least, the fourth sentence.";

// I used the following code to split the long string into sentences:
const arr = input.replace(/([.?!])\s*(?=[a-zA-Z0-9])/g, "$1|").split("|");

// we can assume that a sentence from the input array does not exceed this limit
const maxLength = 105;

// TODO: magic happens

/* I'm trying to get an array with the sentences re-joined
  by a space, split so that one string does not exceed the limit
[
  "This is the first sentence... This is the second, much longer sentence, with some additional puntuations?!",
  "Third sentence with a different length! Just a sentence ending with a number 980.",
  "Last but not least, the fourth sentence."
]*/
```
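A minimal sketch of the TODO step, assuming the input is the `arr` of sentences produced above and that sentences are joined with a single space. A plain loop is enough here; `packSentences` is a hypothetical name. One thing to watch: the first two sentences plus the joining space come to 106 characters, one more than `maxLength = 105`, so a strict packer returns four chunks for the sample input. Any limit from 106 up to 121 produces the three chunks listed in the expected output.

```typescript
// Greedy packing: append sentences to the current chunk (joined by one space)
// until the next one would push it past maxLength, then start a new chunk.
// Assumes, as in the question, that no single sentence is longer than maxLength.
function packSentences(sentences: string[], maxLength: number): string[] {
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // What the current chunk would look like with this sentence appended.
    const candidate = current === "" ? sentence : `${current} ${sentence}`;

    if (candidate.length <= maxLength) {
      current = candidate;
    } else {
      // Adding the sentence would exceed the limit: close the current chunk.
      if (current !== "") chunks.push(current);
      current = sentence;
    }
  }

  // Flush the last, partially filled chunk.
  if (current !== "") chunks.push(current);
  return chunks;
}

const sentences = [
  "This is the first sentence...",
  "This is the second, much longer sentence, with some additional puntuations?!",
  "Third sentence with a different length!",
  "Just a sentence ending with a number 980.",
  "Last but not least, the fourth sentence.",
];

console.log(packSentences(sentences, 105)); // four chunks (first two sentences kept apart)
console.log(packSentences(sentences, 110)); // the three chunks from the expected output
```

In the question's code this would be `const result = packSentences(arr, maxLength);`. Each sentence is visited once, and the only state needed is the chunk currently being built.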

codesandbox: https://codesandbox.io/s/string-array-split-max-chunk-length-bfteuh?file=/index.ts

I'm trying to get an array that joins the strings of the initial array with a space, such that no resulting string exceeds a maximum length. Each resulting string must also contain only full sentences (you can assume a single sentence won't exceed the limit). Also, "...", "?!", "???", etc. should count as possible sentence endings in the original input string.
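One way to split so that "...", "?!" and "???" count as endings is to treat a `.`, `?` or `!` as a boundary only when it is followed by whitespace and then an uppercase letter or a digit, keeping every earlier mark of the run inside the sentence. This is a minimal sketch under that assumption (sentences start with an uppercase letter or digit); `splitSentences` is a hypothetical name. Unlike the `\s*` in the question's regex, it requires whitespace after the mark, so "3.14" is not split, and "(!)" or "..." followed by a lowercase word is not treated as an ending. Abbreviations such as "e.g. Paris" would still be split.

```typescript
// Split text into sentences. A boundary is a ".", "?" or "!" followed by
// whitespace and then an uppercase letter or digit; earlier marks in a run
// such as "..." or "?!" stay part of the sentence.
function splitSentences(text: string): string[] {
  const boundary = /[.?!]\s+(?=[A-Z0-9])/g;
  const sentences: string[] = [];
  let start = 0;
  let match: RegExpExecArray | null;

  while ((match = boundary.exec(text)) !== null) {
    // Keep the punctuation mark, drop the whitespace after it.
    sentences.push(text.slice(start, match.index + 1));
    start = match.index + match[0].length;
  }
  sentences.push(text.slice(start));

  // Remove stray whitespace and empty pieces (e.g. for an empty input).
  return sentences.map((s) => s.trim()).filter((s) => s.length > 0);
}
```

Called on the question's `input`, this sketch returns the same five sentences as the original `replace`/`split` line.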

How would you approach this? Do I need recursion to get concise code, and is recursion the most elegant solution?
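Recursion isn't needed here, because deciding where a sentence goes only requires the chunk currently being built, not the rest of the array. For comparison, a hypothetical recursive sketch (`packRecursive` is an illustrative name) that carries the current chunk as an extra parameter and produces the same chunks as a greedy left-to-right pass:

```typescript
// Recursive variant (for comparison): carry the chunk being built as an
// extra parameter and recurse on the remaining sentences.
function packRecursive(
  sentences: string[],
  maxLength: number,
  current = ""
): string[] {
  // Base case: no sentences left, emit whatever is still being built.
  if (sentences.length === 0) return current === "" ? [] : [current];

  const first = sentences[0];
  const rest = sentences.slice(1);
  const candidate = current === "" ? first : `${current} ${first}`;

  // The sentence fits (or starts a fresh chunk): keep building the same chunk.
  if (current === "" || candidate.length <= maxLength) {
    return packRecursive(rest, maxLength, candidate);
  }
  // The sentence does not fit: close the current chunk and start a new one.
  return [current, ...packRecursive(rest, maxLength, first)];
}
```

This version adds one stack frame per sentence and copies the remaining array on every call, and most JavaScript engines do not eliminate tail calls, so for very long inputs a loop is the safer choice.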

Note: So far I have tried a reducer, but I thought I would have to pass the rest of the array to a recursive function.
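A reducer works without recursion as long as the accumulator is the array of chunks built so far: each step only needs to look at the last chunk, never at the rest of the input. A minimal sketch (the name `packWithReduce` is hypothetical), assuming sentences are joined with a single space:

```typescript
// Reducer-based packing: the accumulator holds the chunks built so far, and
// each sentence either extends the last chunk or starts a new one.
function packWithReduce(sentences: string[], maxLength: number): string[] {
  return sentences.reduce<string[]>((chunks, sentence) => {
    const last = chunks[chunks.length - 1];
    if (last !== undefined && last.length + 1 + sentence.length <= maxLength) {
      // Sentence fits (including the joining space): extend the last chunk.
      chunks[chunks.length - 1] = `${last} ${sentence}`;
    } else {
      // First sentence, or the last chunk is full: start a new chunk.
      chunks.push(sentence);
    }
    return chunks;
  }, []);
}
```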

  • you can post the actual code here instead of sending a link plz – Chris G Sep 27 '22 at 12:05
  • @evolutionxbox thanks for your reply. I cannot use this approach because it does not keep sentences intact. I have updated my question accordingly. – Sander Schnydrig Sep 27 '22 at 12:09
  • Join them back together and then use the other linked question to split them up again? – evolutionxbox Sep 27 '22 at 12:10
  • `result = arr.join('').match(/.{1,102}(\.|$)/g)` – Nick Sep 27 '22 at 12:11
  • https://codesandbox.io/s/string-array-split-max-chunk-length-forked-yjt0zd?file=/index.ts – HTMHell Sep 27 '22 at 12:12
  • @evolutionxbox unfortunately, the second linked question only keeps words intact; I need whole sentences in the end. – Sander Schnydrig Sep 27 '22 at 12:14
  • How do you expect something like regex to understand what a sentence is? – evolutionxbox Sep 27 '22 at 12:15
  • @evolutionxbox of course it's not that easy, but this regex seems to work well for me for splitting the sentences: /([.?!])\s*(?=[A-Z])/g – Sander Schnydrig Sep 27 '22 at 12:20
  • @Nick your code seems to be working for me, thanks a bunch! Maybe I'm missing something, but if you want, you can post this as a solution: `const resultNew = arr.join(" ").match(/.{1,102}(\.|$)/g)?.map((str) => str.trim());` – Sander Schnydrig Sep 27 '22 at 12:21
  • @SanderSchnydrig that's ok, it is really a dupe, just substituting a `.` instead of a space in the regex. – Nick Sep 27 '22 at 12:23
  • Note that if you want to deal with sentences ending in `?` or `!` you should change the group to (e.g.) `([.?!]|$)` – Nick Sep 27 '22 at 12:24
  • @Nick many thanks! That solved many of my issues. The only caveat left is if somebody is using "(!)" or "(?)" or "..." in the input. I'm not very experienced with regex, do you have an idea how I could handle those cases as well? – Sander Schnydrig Sep 27 '22 at 12:28
  • Then it gets tricky... you could add a lookbehind to assert the character before the `.` is a letter e.g. `((?<=[a-zA-Z])[.!?]|$)` – Nick Sep 27 '22 at 12:30
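The regex idea from the comments can be kept with two changes, shown in this sketch (`packWithRegex` is a hypothetical name). Each match starts at a non-whitespace character, so the joining space does not count towards the limit, and a `.`, `?` or `!` only ends a match when it is followed by whitespace or the end of the string, so a match cannot stop between the `?` and `!` of `?!`. It assumes every sentence ends with one of those marks and none is longer than `maxLength`; a "..." followed by a space inside a sentence would still be treated as an ending. Also note that `String.prototype.match` with the `g` flag skips any text it cannot match, so with a pattern that only accepts `.` as an ending, text between matches can be silently dropped.

```typescript
// Regex-based packing: each match is at most maxLength characters, starts at
// a non-space character and ends at ".", "?" or "!" followed by whitespace or
// the end of the string. The greedy quantifier picks the farthest such ending.
// Assumes maxLength >= 2 and every sentence ends with ".", "?" or "!".
function packWithRegex(sentences: string[], maxLength: number): string[] {
  const text = sentences.join(" ");
  const pattern = new RegExp(`\\S.{0,${maxLength - 2}}[.?!](?=\\s|$)`, "g");
  const matches = text.match(pattern);
  return matches ? matches.slice() : [];
}
```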
