12

I am attempting to create a tool that takes an input text and splits it into chunks of text after a certain number of characters. However, I need to make sure it does not split the text in the middle of a word.

In my case, I am splitting the string after 155 characters.

I've done quite a lot of searching to try and find a solution, but I fear it may be more complicated than my knowledge of JavaScript allows me to figure out. I believe I just need some logic that makes the splitter backtrack to the previous space when the split point falls in the middle of a word, but I am not sure how to write such a thing.

Here is my JavaScript code at the moment:

function splitText() {
    "use strict";
    var str = document.getElementById("user_input").value;
    var size = 195;
    var chunks = new Array(Math.ceil(str.length / size)),
        nChunks = chunks.length;

    var newo = 0;
    for (var i = 0, o = 0; i < nChunks; ++i, o = newo) {
          newo += size;
          chunks[i] = str.substr(o, size);
    }

    var newq = 0;
    for (var x = 0, q = 0; x < nChunks; ++x, q = newq) {
        $("#display").append("<textarea readonly>" + chunks[x] + "</textarea><br/>");
    }
}

And here is my HTML:

<body>
    <content>
        <h1>Text Splitter</h1>
        <form>
            <label>Enter a Message</label>
            <input type="text" name="message" id="user_input">
        </form>
        <form>
            <input type="button" onclick="splitText();" id="submit" value="Submit!"> <br/>
        </form>
        <label>Your split message: </label>
        <p><span id='display'></span></p>
    </content>
</body>

Here is the code in its current working form, if you'd like to take a look: https://jsfiddle.net/add4s7rs/7/

Thank you! I appreciate any assistance!

Leminnes

5 Answers

31

A short and simple way to split a string into chunks up to a certain length using a regexp:

const chunks = str.match(/.{1,154}(?:\s|$)/g);

Some examples:

const str = 'the quick brown fox jumps over the lazy dog';

console.log(str.match(/.{1,10}(?:\s|$)/g))
console.log(str.match(/.{1,15}(?:\s|$)/g))

This works because quantifiers (in this case {1,154}) are greedy by default and will attempt to match as many characters as they can. Putting the (?:\s|$) after the .{1,154} forces the match to terminate on a whitespace character or at the end of the string. So .{1,154}(?:\s|$) will match up to 154 characters followed by a whitespace character or the end of the string, giving chunks of at most 155 characters. The /g modifier then makes it continue to match through the entire string. Note that . does not match line break characters, so a chunk never continues across a line break.

To put this in the context of your function:

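function splitText() {
    "use strict";
    var str = document.getElementById("user_input").value;
    // match() returns null when nothing matches (e.g. empty input), so fall back to []
    var chunks = str.match(/.{1,154}(?:\s|$)/g) || [];

    // i is the chunk itself and x is its index, so i and chunks[x] are the same value
    chunks.forEach(function (i,x) {
        $("#display").append("<textarea readonly>" + chunks[x] + "</textarea><br/>");
    });
}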

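Note (as has been pointed out in the comments) that this code will fail if one of the words in the string is longer than the chunk length: the regexp cannot fit the whole word into one chunk, so the start of that word is silently dropped from the output.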

Note also that this code will leave a trailing space on the end of the split strings; if you want to avoid that, change the last group to a lookahead and insist that the first character not be whitespace:

const str = 'the quick brown fox jumps over the lazy dog';

console.log(str.match(/\S.{1,9}(?=\s|$)/g))
console.log(str.match(/\S.{1,14}(?=\s|$)/g))
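
Both caveats in this answer (trailing whitespace, and words longer than the chunk length) could be handled together by adding a fallback alternative to the lookahead version of the regexp. The following is only a sketch: the helper name `chunkText` and its `maxLength` parameter are made up for illustration, and it assumes `maxLength` is at least 1 and that hard-splitting an overlong word is acceptable.

```javascript
// Hypothetical helper (not part of the original answer).
// First alternative: a chunk that starts on a non-space character and ends
// right before whitespace or the end of the string.
// Second alternative: used only when the first one cannot match, i.e. a word
// longer than maxLength; it takes exactly maxLength non-space characters.
function chunkText(str, maxLength) {
  var pattern = new RegExp(
    "\\S.{0," + (maxLength - 1) + "}(?=\\s|$)|\\S{" + maxLength + "}",
    "g"
  );
  // match() returns null when there is no match at all
  return str.match(pattern) || [];
}

console.log(chunkText('the quick brown fox somersaulted over the lazy dog', 10));
// ["the quick", "brown fox", "somersault", "ed over", "the lazy", "dog"]
```

The overlong word is cut at the chunk length instead of losing its first letters; whether that is the right behaviour depends on what the chunks are used for.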
Nick
  • This worked quite well except that for the last string it cuts off the last word. Is there a way to fix that? – Leminnes Apr 14 '18 at 23:54
  • Oops! Fixed the regexp now. Sorry about that - I should have noticed in my examples. – Nick Apr 14 '18 at 23:56
  • Ah, thanks much! One more related but unrelated question. If I wanted to add a character, say a + to the end of every line but the very last one, would that be possible? – Leminnes Apr 15 '18 at 01:13
  • You could change `chunks[x]` to `chunks[x].replace(/(\s)$/, '$1+')` or (if you want to get rid of the trailing whitespace on the first strings) `chunks[x].replace(/\s+$/, '+')` – Nick Apr 15 '18 at 01:55
  • That worked perfectly. I should really learn more regex. It seems so powerful, just so immensely confusing haha – Leminnes Apr 15 '18 at 03:28
  • Yes and Yes! There's a good site to play with them at [regex101.com](https://regex101.com) – Nick Apr 15 '18 at 03:50
  • Final question for you, since I've been trying to figure it out without resorting to asking, but so far I have failed. Similar to above, how would I add a character, like + again, to the beginning of the strings in all but the *first* string? – Leminnes Apr 15 '18 at 05:48
  • Replace `chunks[x]` with `(x == 0 ? chunks[x] : '+' + chunks[x])`. – Nick Apr 15 '18 at 05:57
  • Note that in my solution (and in all these comments), you should use `i` instead of `chunks[x]` (since that is what `i` is set to in each iteration of `forEach()`). – Nick Apr 15 '18 at 05:58
  • note: this will fail for words longer than (154) characters `var str = 'the quick brown fox somersaulted over the lazy dog'; console.log(str.match(/.{1,10}(\s|$)/g))` `(5) ["the quick ", "brown fox ", "mersaulted ", "over the ", "lazy dog"]` – Cody Moniz Sep 09 '19 at 21:49
  • @CodyMoniz you are right - but I don't know of any words that are that long. The longest word in an English dictionary is only 45 letters. – Nick Sep 09 '19 at 21:54
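
The two suggestions from the comments above (a `+` at the end of every chunk but the last, and a `+` at the start of every chunk but the first) can be combined in one pass. This is a rough sketch; `decorateChunks` and its `marker` parameter are hypothetical names, and it assumes trailing whitespace should be stripped from each chunk first.

```javascript
// Hypothetical helper (illustration only): adds a continuation marker
// between chunks without touching the outer ends of the message.
function decorateChunks(chunks, marker) {
  return chunks.map(function (chunk, index) {
    var text = chunk.replace(/\s+$/, "");       // drop trailing whitespace
    if (index > 0) {
      text = marker + text;                     // all but the first chunk
    }
    if (index < chunks.length - 1) {
      text = text + marker;                     // all but the last chunk
    }
    return text;
  });
}

var str = 'the quick brown fox jumps over the lazy dog';
var chunks = str.match(/.{1,10}(?:\s|$)/g) || [];
console.log(decorateChunks(chunks, "+"));
// ["the quick+", "+brown fox+", "+jumps over+", "+the lazy+", "+dog"]
```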
1

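You could use a simple function like this:

function split(string) {
  // Count down from index 154 until a space is found
  for (var i = 154; i >= 0; i--) {
    if (string.charAt(i) == " ") {
      var newString1 = string.slice(0, i); // everything before the space
      var newString2 = string.slice(i);    // the rest, starting at the space
      return [newString1, newString2];     // stop at the first space found
    }
  }
  return [string, ""]; // no space in the first 155 characters
}

Here the two pieces are returned as a two-element array; you could just as well push them onto a larger array of chunks if you'd like.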

Cory Kleiser
  • You don't want to create a loop that goes 154 times through the string just to count to 155. You know that you want to manipulate the string after the 154th character; just do it! ;) – Adriano Apr 14 '18 at 23:01
  • The loop is counting down from 154 until it gets to a space. – Cory Kleiser Apr 14 '18 at 23:07
0

A simpler approach would be to split the entered text into an array of individual words, then loop through the array and rebuild the string, checking each time whether adding the next word would put you over your maximum size.

Also, note that you should keep all of your form elements inside a single form.

Lastly, you should not use inline HTML event attributes (onclick, etc.). That was a technique we used 20+ years ago, before we had standards and best practices; unfortunately, it is so widespread that it just will not die the abrupt death it deserves. There are many reasons not to code this way and to instead use the modern approach of doing all event handling with .addEventListener() in separate JavaScript.

// Don't set variables to properties of DOM elements, set them to the element
// itself so that if you ever want a different property value, you don't have to
// re-scan the DOM for the same element again.
var str = document.getElementById("user_input");
var output = document.getElementById("display");

document.getElementById("go").addEventListener("click",function(){
    "use strict";
    const size = 155; // Constant is more appropriate here
    var words = str.value.split(/\s+/g); // Create array of all words in field
    
    var finalString = "";
    
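    // Loop through array, adding words until the next one would exceed size
    for(var i = 0; i < words.length; i++){
      // Only put a space before a word when it is not the first one
      var candidate = finalString ? finalString + " " + words[i] : words[i];
      if(candidate.length <= size){
        finalString = candidate;
      } else {
        break; // Max size exceeded - quit loop!
      }
    }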
    
    // Update field with correct value
    output.textContent = finalString;
    console.log(finalString.length);
});
textarea {
  width:500px;
  height:100px;
}
<h1>Text Splitter</h1>
        <form>
            <label>Enter a Message
            <textarea name="message" id="user_input">This is a test of the emergency broadcast system. If this had been an actual emergency, you would have been informed as to what instructions you should follow in order to remain safe at all times.</textarea></label>
            <input type="button" id="go" value="Submit!"><br>
        </form>
        <label>Your split message: </label>
        <p><span id='display'></span></p>
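
The snippet above stops once the first chunk is full. As a sketch of how the same word-by-word idea might be extended to return every chunk, the function below keeps going and starts a new chunk whenever the next word would not fit. The name `chunkByWords` and its parameters are hypothetical, and it assumes a word longer than `size` should be kept whole in a chunk of its own rather than cut.

```javascript
// Hypothetical helper (illustration only): rebuilds the text word by word
// and starts a new chunk whenever the next word would exceed size.
function chunkByWords(text, size) {
  var words = text.split(/\s+/).filter(Boolean); // drop empty entries
  var chunks = [];
  var current = "";

  words.forEach(function (word) {
    var candidate = current ? current + " " + word : word;
    if (candidate.length <= size || current === "") {
      // The word fits, or it is the first word of a chunk
      // (a single overlong word is kept whole).
      current = candidate;
    } else {
      chunks.push(current); // chunk is full, start the next one
      current = word;
    }
  });

  if (current) {
    chunks.push(current);   // whatever is left over
  }
  return chunks;
}

console.log(chunkByWords('the quick brown fox jumps over the lazy dog', 10));
// ["the quick", "brown fox", "jumps over", "the lazy", "dog"]
```

Because the text is rebuilt from words, runs of whitespace in the input collapse to single spaces, which may or may not be desirable.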
Scott Marcus
  • I think you misinterpreted the question. He needs to split the text into chunks, not just cut it and get the first chunk :-) _(sorry if I misspelled your name in my answer)_ – blex Apr 14 '18 at 22:29
  • @blex I don't know about that. I asked him a couple of clarifying questions in his original question and this is what I believe he's asking. We'll see I guess. – Scott Marcus Apr 14 '18 at 22:33
  • Try his JS Fiddle with a really long text, and you'll see multiple `textarea`s as a result, while your answer only outputs one `textContent` – blex Apr 14 '18 at 22:33
  • @blex I did. It seems to be doing what my code does (except his doesn't get the count right). – Scott Marcus Apr 14 '18 at 22:34
  • @blex I had a couple of UI corrections to make to better emulate what the OP's fiddle does if that's what you mean, but the algorithm hasn't changed. – Scott Marcus Apr 14 '18 at 22:38
  • For whatever reason, when I try to use this on my local version, it does not work. I click the submit button and nothing happens. I'll keep messing around and see if I can get it to work. – Leminnes Apr 14 '18 at 23:07
  • I figured it out, I'm just an idiot. The others are right, I want to be able to display every string of that length. – Leminnes Apr 14 '18 at 23:26
0

This solution starts from a maximum chunk size and reduces that size where necessary so that the chunk ends on a space rather than in the middle of a word. It uses a while loop and a little bit of C-style logic.

function splitText() {
    "use strict";

    var str = document.getElementById("user_input").value;

    // Maximum allowed chunk size
    const MAX_CHUNK_SIZE = 155;
    let chunks = [];
    let current_chunk_position = 0;

    while (current_chunk_position < str.length) {

        let current_substring = str.substr(current_chunk_position, MAX_CHUNK_SIZE);

        // Default: take the whole window (this also covers the final piece
        // and a single word that is longer than the window)
        let last_index = current_substring.length;

        // If more text follows the window, back up to the last space so that
        // no word is cut in half
        if (current_chunk_position + MAX_CHUNK_SIZE < str.length &&
                current_substring.lastIndexOf(" ") > 0) {
            last_index = current_substring.lastIndexOf(" ");
        }

        chunks.push(str.substr(current_chunk_position, last_index));

        current_chunk_position += last_index;
    }

    // Display each chunk in its own read-only textarea
    for (var x = 0; x < chunks.length; ++x) {
        $("#display").append("<textarea readonly>" + chunks[x] + "</textarea><br/>");
    }
}
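
For testing outside the page, the same backtracking idea can be separated from the DOM. The following is a minimal sketch, not the answer's own code: `splitAtSpaces` and `maxSize` are hypothetical names, and it assumes that leading and trailing whitespace can be trimmed and that a word longer than `maxSize` may be hard-split.

```javascript
// Hypothetical helper (illustration only): cuts at the last space at or
// before maxSize, and falls back to a hard cut when there is no space.
function splitAtSpaces(str, maxSize) {
  var chunks = [];
  var rest = str.trim();

  while (rest.length > maxSize) {
    // Search backwards for a space, starting at index maxSize
    var cut = rest.lastIndexOf(" ", maxSize);
    if (cut <= 0) {
      cut = maxSize;             // no usable space: split mid-word
    }
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).trim(); // drop the space we cut at
  }

  if (rest.length > 0) {
    chunks.push(rest);           // final piece fits as-is
  }
  return chunks;
}

console.log(splitAtSpaces('the quick brown fox jumps over the lazy dog', 10));
// ["the quick", "brown fox", "jumps over", "the lazy", "dog"]
```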
0

With a regex and pure JavaScript. This takes an array of strings and returns an array containing chunks of those strings of at most the given length. Note that it splits at a fixed width and does not keep words intact.

const chunkString = (str, length) => {
  return str.match(new RegExp(".{1," + length + "}", "g"));
};

const chunkStringArray = (arr) => {
  return arr.map((el) => chunkString(el, 200)).flat();
};
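
To make the difference from the word-aware pattern in the first answer concrete, here is a small side-by-side sketch. The function names `fixedWidthChunks` and `wordAwareChunks` are hypothetical and exist only for this comparison.

```javascript
// Hypothetical comparison helpers (illustration only).

// Fixed-width split: every chunk is exactly `length` characters,
// except possibly the last one. Words may be cut.
const fixedWidthChunks = (str, length) =>
  str.match(new RegExp(".{1," + length + "}", "g")) || [];

// Word-aware split: each chunk must end on whitespace or end of string.
const wordAwareChunks = (str, length) =>
  str.match(new RegExp(".{1," + length + "}(?:\\s|$)", "g")) || [];

console.log(fixedWidthChunks("the quick brown fox", 8));
// ["the quic", "k brown ", "fox"]

console.log(wordAwareChunks("the quick brown fox", 8));
// ["the ", "quick ", "brown ", "fox"]
```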
Shubham Ambastha