
My program receives strings of arbitrary length and sends them off to be translated. If a string exceeds a certain character length I get an error, so I want to check each string beforehand and split it if necessary. But I can't just split a string in the middle of a word; the words themselves need to stay intact.

So for example:

let str = "this is an input example of one sentence that contains a bit of words and must be split"
let splitStringArr = [];

// If string is larger than X (for testing make it 20) characters
if(str.length > 20) {
    // Split string sentence into smaller strings, keep words intact
    //...
    // example of result would be
    // splitStringArr = ['this is an input', 'example of one sentence', 'that contains...', '...']
    // instead of ['this is an input exa', 'mple of one senten', 'ce that contains...']
}

But I'm not sure how to split a sentence while still taking the maximum length into account.

Would a solution be to iterate over the words, append each one to the current chunk, check every time whether it goes over the maximum length, and start a new array entry when it does? Or are there better/existing methods for this?

Sven0567
  • A quick question: why is the first split string `this is an input` and not `this is an input example`? If you only want chunks shorter than `maxLength`, then why does `'example of one sentence'` have a length greater than 20? – Code Maniac Oct 02 '19 at 15:31

4 Answers


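You can use `match` with a lookahead and a word boundary. The `|.+$` alternative picks up the remainder at the end of the string, which is shorter than the max length. Note that `{20,}?` makes each chunk at least 20 characters long, extended to the next whitespace, so chunks may be longer than 20.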

let str = "this is an input example of one sentence that contains a bit of words and must be split"

console.log(str.match(/\b[\w\s]{20,}?(?=\s)|.+$/g))
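The pattern above matches at least 20 characters before breaking, so a chunk can exceed the limit. If chunks must stay at or below the limit instead, a variant along these lines might work. This is only a sketch: the function name `chunkByRegex` is made up for illustration, it assumes `maxLength` is at least 2 and that the text contains no newlines, and a single word longer than the limit is kept whole rather than cut.

```javascript
// Hypothetical helper: split `text` into chunks of at most `maxLength`
// characters without breaking words. Assumes maxLength >= 2.
function chunkByRegex(text, maxLength) {
  // First alternative: a non-space, up to maxLength - 2 arbitrary characters,
  // and a closing non-space, followed by whitespace or the end of the string.
  // Second alternative: a single word too long to fit, kept intact.
  const pattern = new RegExp(
    `\\S(?:.{0,${maxLength - 2}}\\S)?(?=\\s|$)|\\S+`,
    'g'
  );
  // match() returns null when nothing matches, so fall back to an empty array
  return text.match(pattern) || [];
}

const input = "this is an input example of one sentence that contains a bit of words and must be split";
console.log(chunkByRegex(input, 20));
// [ 'this is an input', 'example of one', 'sentence that',
//   'contains a bit of', 'words and must be', 'split' ]
```

Because the pattern uses `.` rather than `[\w\s]`, punctuation inside the text does not stop a match, and the chunks come back without leading spaces.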
Code Maniac
  • If you don't want the trailing string that is shorter than maxLength, you can simply use `/\b[\w\s]{20,}?(?=\s)/g` – Code Maniac Oct 02 '19 at 15:18
  • That's amazing! Is there a way to update this expression to return without the leading space? So in your example it would return something like [ "this is an input example", "of one sentence that", "contains a bit of words", "and must be split" ] – dasis May 03 '22 at 08:38
  • @dasis, [here you go](https://stackoverflow.com/questions/74061458) – OfirD Oct 15 '22 at 21:56
  • What if it's a paragraph, and each sentence contains punctuation marks? For example, `this is an input, example of one sentence. that contains! a bit of words and; must be split.` – Awolad Hossain Dec 06 '22 at 05:52

Here's an example using reduce.

const str = "this is an input example of one sentence that contains a bit of words and must be split";

// Split up the string and use `reduce`
// to iterate over it
const temp = str.split(' ').reduce((acc, c) => {

  // Get the number of nested arrays
  const currIndex = acc.length - 1;

  // Join up the last array and get its length
  const currLen = acc[currIndex].join(' ').length;

  // If the joined content plus a space and the new word
  // would exceed 20 chars, push the new word to a new array
  // (the `currLen > 0` check avoids leaving an empty first chunk)
  if (currLen > 0 && currLen + 1 + c.length > 20) {
    acc.push([c]);

  // otherwise add it to the existing array
  } else {
    acc[currIndex].push(c);
  }

  return acc;

}, [[]]);

// Join up all the nested arrays
const out = temp.map(arr => arr.join(' '));

console.log(out);
Andy

What you are looking for is lastIndexOf

In this example, maxOkayStringLength is the max length the string can be before causing an error.

myString.lastIndexOf(/\s/,maxOkayStringLength);

-- edit --

lastIndexOf doesn't take a regex argument, but there's another post on SO that has code to do this:

Is there a version of JavaScript's String.indexOf() that allows for regular expressions?
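When the separator is a plain space rather than arbitrary whitespace, `lastIndexOf` can be used directly with a string argument and a start position. The following is a rough sketch under that assumption; `splitAtLastSpace` is a made-up name, and a word longer than the limit is left whole instead of being cut.

```javascript
// Hypothetical helper: repeatedly cut at the last space that keeps the
// chunk within `maxLength` characters. Assumes words are separated by spaces.
function splitAtLastSpace(text, maxLength) {
  const chunks = [];
  let rest = text.trim();

  while (rest.length > maxLength) {
    // Search backwards starting at index maxLength, so the chunk
    // before the space is at most maxLength characters long
    let cut = rest.lastIndexOf(' ', maxLength);

    if (cut <= 0) {
      // The first word alone is too long: cut after it instead
      cut = rest.indexOf(' ', maxLength);
      if (cut === -1) break; // no more spaces, keep the remainder whole
    }

    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut + 1).trimStart();
  }

  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

const sample = "this is an input example of one sentence that contains a bit of words and must be split";
console.log(splitAtLastSpace(sample, 20));
// [ 'this is an input', 'example of one', 'sentence that',
//   'contains a bit of', 'words and must be', 'split' ]
```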

ControlAltDel

I would suggest:

1) Split the string on spaces to get an array of words.

2) Rebuild the output strings by appending the words one by one.

3) If the next word would make the current string exceed the maximum length, start a new string with that word.

Something like this:

const splitString = (str, lineLength) => {
  const arr = ['']

  str.split(' ').forEach(word => {
    if (arr[arr.length - 1].length + word.length > lineLength) arr.push('')
    arr[arr.length - 1] += (word + ' ')
  })

  return arr.map(v => v.trim())
}
const str = "this is an input example of one sentence that contains a bit of words and must be split"
console.log(splitString(str, 20))
Dmitry Reutov