14

Here is my question:

Given a string, which is made up of space separated words, how can I split that into N strings of (roughly) even length, only breaking on spaces?

Here is what I've gathered from research:

I started by researching word-wrapping algorithms, because it seems to me that this is basically a word-wrapping problem. However, the majority of what I've found so far (and there is A LOT out there about word wrapping) assumes that the width of the line is a known input, and the number of lines is an output. I want the opposite.

I have found a (very) few questions, such as this that seem to be helpful. However, they are all focused on the problem as one of optimization - e.g. how can I split a sentence into a given number of lines, while minimizing the raggedness of the lines, or the wasted whitespace, or whatever, and do it in linear (or NlogN, or whatever) time. These questions seem mostly to be unanswered, as the optimization part of the problem is relatively "hard".

However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.

Here is what I've come up with: I think I have a workable method for the case when N=3. I start by putting the first word on the first line, the last word on the last line, and then iteratively putting another word on the first and last lines, until my total width (measured by the length of the longest line) stops getting shorter. This usually works, but it gets tripped up if your longest words are in the middle of the line, and it doesn't seem very generalizable to more than 3 lines.

var getLongestHeaderLine = function(headerText) {
  //Utility function definitions
  var getLongest = function(arrayOfArrays) {
    return arrayOfArrays.reduce(function(a, b) {
      return a.length > b.length ? a : b;
    });
  };

  var sumOfLengths = function(arrayOfArrays) {
    return arrayOfArrays.reduce(function(a, b) {
      return a + b.length + 1;
    }, 0);
  };

  var getLongestLine = function(lines) {
    return lines.reduce(function(a, b) {
      return sumOfLengths(a) > sumOfLengths(b) ? a : b;
    });
  };

  var getHeaderLength = function(lines) {
    return sumOfLengths(getLongestLine(lines));
  }

  //first, deal with the degenerate cases
  if (!headerText)
    return headerText;

  headerText = headerText.trim();

  var headerWords = headerText.split(" ");

  if (headerWords.length === 1)
    return headerText;

  if (headerWords.length === 2)
    return getLongest(headerWords);

  //If we have more than 2 words in the header,
  //we need to split them into 3 lines
  var firstLine = headerWords.splice(0, 1);
  var lastLine = headerWords.splice(-1, 1);
  var lines = [firstLine, headerWords, lastLine];

  //The header length is the length of the longest
  //line in the header. We will keep iterating
  //until the header length stops getting shorter.
  var headerLength = getHeaderLength(lines);
  var lastHeaderLength = headerLength;
  while (true) {
    //Take the first word from the middle line,
    //and add it to the first line
    firstLine.push(headerWords.shift());
    headerLength = getHeaderLength(lines);
    if (headerLength > lastHeaderLength || headerWords.length === 0) {
      //If we stopped getting shorter, undo
      headerWords.unshift(firstLine.pop());
      break;
    }
    //Take the last word from the middle line,
    //and add it to the last line
    lastHeaderLength = headerLength;
    lastLine.unshift(headerWords.pop());
    headerLength = getHeaderLength(lines);
    if (headerLength > lastHeaderLength || headerWords.length === 0) {
      //If we stopped getting shorter, undo
      headerWords.push(lastLine.shift());
      break;
    }
    lastHeaderLength = headerLength;
  }

  return getLongestLine(lines).join(" ");
};

debugger;
var header = "an apple a day keeps the doctor away";

var longestHeaderLine = getLongestHeaderLine(header);
debugger;

EDIT: I tagged javascript, because ultimately I would like a solution I can implement in that language. It's not super critical to the problem though, and I would take any solution that works.

EDIT#2: While performance is not what I'm most concerned about here, I do need to be able to perform whatever solution I come up with ~100-200 times, on strings that can be up to ~250 characters long. This would be done during a page load, so it needs to not take forever. For example, I've found that trying to offload this problem to the rendering engine by putting each string into a DIV and playing with the dimensions doesn't work, since it (seems to be) incredibly expensive to measure rendered elements.

Community
  • 1
  • 1
odin243
  • 141
  • 8
  • Could you post the code you have so far? –  Aug 22 '16 at 15:53
  • Not exactly what I'm using currently, but real close: https://jsfiddle.net/a546ova0/5/ – odin243 Aug 22 '16 at 16:00
  • @Mojtaba Not looking to have it done for me, just looking for some new ideas. Several paid programmers have already come up empty, so looking to crowdsource a little innovation here. – odin243 Aug 22 '16 at 16:01
  • @tt.net, I updated your question with the code you provided – Mojtaba Aug 22 '16 at 16:04
  • Confusion numero uno: You say the input is the text, and the desired number of lines - yet your example only shows you passing the text as an input. – Jamiec Aug 22 '16 at 16:11
  • @Jamiec Yeah, the code is what is referenced by this: "I think I have a workable method for the case when N=3". The code has a hard-coded first, last, and middle line, so it doesn't work in the general case. I've yet to figure out any way of generalizing the process of taking from the beginning and end of a "middle" line, if I don't know the actual number of lines. – odin243 Aug 22 '16 at 16:19
  • @tt.net just a clarification, do you want the words in the broken up text to be in the same order as it was pre-broken up? – Ted Aug 22 '16 at 16:27
  • @Ted Yes, they should be in the same order when read left to right, up to down (normal English reading order) – odin243 Aug 22 '16 at 16:35
  • 3
    Are you looking for visual length (when rendered) or string length (number of characters)? – Bergi Aug 22 '16 at 16:55
  • @Bergi Honestly, I would be fine with either. I've been working under the assumption that number of characters would be the easier solution. – odin243 Aug 22 '16 at 16:56
  • @tt.net A rather simple solution would be to try different box widths, let the browser render the text into each, and measure how many lines it takes. – Bergi Aug 22 '16 at 17:01
  • @bergi Yeah, I've thought of that. When I said that optimization wasn't super important to me, I was talking about the problem from the # of chars angle. I could have up to 100 or so of this strings to process at a time, which in pure javascript should be pretty negligible even for an inefficient solution. However, when you add in the DOM access for measuring rendered elements, it seems to be incredibly time consuming. I'll update the question to better specify my performance constraints. – odin243 Aug 22 '16 at 17:12
  • this js measures width of rendered text http://stackoverflow.com/questions/30940499/text-size-auto-resizing-on-div-width/30960832#30960832 – m69's been on strike for years Aug 22 '16 at 18:34
  • @m69 Yeah, that's bascially what I've seen before. I don't have benchmarks handy right now, but the issue seems to be that the offsetWidth and offsetHeight calls are really (unexpectedly to me at least) expensive, so this doesn't work for even a hundred elements very well. I've seen those calls take 50-100ms each at times. – odin243 Aug 22 '16 at 18:38
  • You could limit the number of calls, but even 10 x 100ms is too long, I guess. – m69's been on strike for years Aug 22 '16 at 18:43
  • @tt.net Added code demonstration for the binary search solution to my answer. You may need to adjust it a little to account for the spaces at the end of each line having no length as they are converted to line breaks. (Also the output is unpolished, adding a space after the last word on each line.) – גלעד ברקן Aug 24 '16 at 15:09
  • @tt.net also, can you give us some data to test, maybe a pastebin? – גלעד ברקן Aug 24 '16 at 16:38

6 Answers6

2

Try this. For any reasonable N, it should do the job:

function format(srcString, lines) {
  var target = "";
  var  arr =  srcString.split(" ");
  var c = 0;
  var MAX = Math.ceil(srcString.length / lines);
  for (var i = 0, len = arr.length; i < len; i++) {
     var cur = arr[i];
     if(c + cur.length > MAX) {
        target += '\n' + cur;
     c = cur.length;
     }
     else {
       if(target.length > 0)
         target += " ";
       target += cur;
       c += cur.length;
     }       
   }
  return target;
}

alert(format("this is a very very very very " +
             "long and convoluted way of creating " +
             "a very very very long string",7));
Legionar
  • 7,472
  • 2
  • 41
  • 70
  • Thanks for the suggestion! This is essentially what was suggested on the question I linked to. I have tried adapting traditional word-wrapping algorithms that take in a line length in this way before - unfortunately for me, I know very little about the input text, so, for example, any single word could be longer than length / lines. The most important thing is that the algorithm never use more lines than I specify. Trying your solution with something like this: "this should be three lines, but it has a loooooooooooooooooooooooooooooooong word" shows what I'm talking about. – odin243 Aug 22 '16 at 17:17
  • What do you want to happen when the size of a word exceeds the allocated size for the line? – Milton Hernandez Aug 22 '16 at 18:08
  • That's basically the problem with allocating a set size for each line in the first place. I don't have any specific width that the lines need to be, so algorithms that rely on figuring a line width, and then performing the word-wrapping don't seem to work well for my use-case. – odin243 Aug 22 '16 at 18:39
1

You may want to give this solution a try, using canvas. It will need optimization and is only a quick shot, but I think canvas might be a good idea as you can calculate real widths. You can also adjust the font to the really used one, and so on. Important to note: This won't be the most performant way of doing things. It will create a lot of canvases.

DEMO

var t = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`;


function getTextTotalWidth(text) {
    var canvas = document.createElement("canvas");
    var ctx = canvas.getContext("2d");
  ctx.font = "12px Arial";
    ctx.fillText(text,0,12);
  return ctx.measureText(text).width;
}

function getLineWidth(lines, totalWidth) {
    return totalWidth / lines ;
}

function getAverageLetterSize(text) {
    var t = text.replace(/\s/g, "").split("");
  var sum = t.map(function(d) { 
    return getTextTotalWidth(d); 
  }).reduce(function(a, b) { return a + b; });
    return  sum / t.length;
}

function getLines(text, numberOfLines) {
    var lineWidth = getLineWidth(numberOfLines, getTextTotalWidth(text));
  var letterWidth = getAverageLetterSize(text);
  var t = text.split("");
  return createLines(t, letterWidth, lineWidth);
}

function createLines(t, letterWidth, lineWidth) {
    var i = 0;
  var res = t.map(function(d) {
    if (i < lineWidth || d != " ") {
        i+=letterWidth;
        return d;
    }
    i = 0;
    return "<br />";
  })
  return res.join("");
}

var div = document.createElement("div");
div.innerHTML = getLines(t, 7);
document.body.appendChild(div);
baao
  • 71,625
  • 17
  • 143
  • 203
  • Thanks, I will play with this! I edited the question to be clear that I while I don't care too much about the time efficiency of the algorithm, I do need to be able to do this one - two hundred times on page load, which is why I've stayed away from rendering-based solutions so far. Haven't tried canvas yet though, so maybe this will work better. – odin243 Aug 22 '16 at 17:48
  • Also, looks like this doesn't handle long words very well either (look at my comment on Milton's answer), which is something I've noticed about every solution I've tried of the type - "start on the first line, accumulate until you've passed some calculated size, go to a new line, repeat." (This is just an observation, not necessarily a deal breaker. Looks like this solution WILL respect the input number of lines, which is the most important part) – odin243 Aug 22 '16 at 17:56
  • Yes, that's true. I'll play with it tomorrow again, it would have to check for the longest word too, to get better results. @tt.net – baao Aug 22 '16 at 17:57
  • @tt.net I think something that makes the line length equal to the longest word if it's longer to the avg line width would work good – baao Aug 22 '16 at 17:58
0

(Adapted from here, How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?)

If we consider the word lengths as a list of numbers, we can binary search the partition.

Our max length ranges from 0 to sum (word-length list) + (num words - 1), meaning the spaces. mid = (range / 2). We check if mid can be achieved by partitioning into N sets in O(m) time: traverse the list, adding (word_length + 1) to the current part while the current sum is less than or equal to mid. When the sum passes mid, start a new part. If the result includes N or less parts, mid is achievable.

If mid can be achieved, try a lower range; otherwise, a higher range. The time complexity is O(m log num_chars). (You'll also have to consider how deleting a space per part, meaning where the line break would go, features into the calculation.)

JavaScript code (adapted from http://articles.leetcode.com/the-painters-partition-problem-part-ii):

function getK(arr,maxLength) {
  var total = 0,
      k = 1;

  for (var i=0; i<arr.length; i++) {
    total += arr[i] + 1;

    if (total > maxLength) {
      total = arr[i];
      k++;
    }
  }

  return k;
}
 

function partition(arr,n) {
  var lo = Math.max(...arr),
      hi = arr.reduce((a,b) => a + b); 

  while (lo < hi) {
    var mid = lo + ((hi - lo) >> 1);

    var k = getK(arr,mid);

    if (k <= n){
      hi = mid;

    } else{
      lo = mid + 1;
    }
  }

  return lo;
}

var s = "this is a very very very very "
      + "long and convoluted way of creating "
      + "a very very very long string",
    n = 7;

var words = s.split(/\s+/),
    maxLength = partition(words.map(x => x.length),7);

console.log('max sentence length: ' + maxLength);
console.log(words.length + ' words');
console.log(n + ' lines')
console.log('')

var i = 0;

for (var j=0; j<n; j++){
  var str = '';
  
  while (true){
    if (!words[i] || str.length + words[i].length > maxLength){
      break
    }
    str += words[i++] + ' ';
  }
  console.log(str);
}
Community
  • 1
  • 1
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
0

I'm sorry this is C#. I had created my project already when you updated your post with the Javascript tag.

Since you said all you care about is roughly the same line length... I came up with this. Sorry for the simplistic approach.

    private void DoIt() {

        List<string> listofwords = txtbx_Input.Text.Split(' ').ToList();
        int totalcharcount = 0;
        int neededLineCount = int.Parse(txtbx_LineCount.Text);

        foreach (string word in listofwords)
        {
            totalcharcount = totalcharcount + word.Count(char.IsLetter);
        }

        int averagecharcountneededperline = totalcharcount / neededLineCount;
        List<string> output = new List<string>();
        int positionsneeded = 0;

        while (output.Count < neededLineCount)
        {
            string tempstr = string.Empty;
            while (positionsneeded < listofwords.Count)
            {
                tempstr += " " + listofwords[positionsneeded];
                if ((positionsneeded != listofwords.Count - 1) && (tempstr.Count(char.IsLetter) + listofwords[positionsneeded + 1].Count(char.IsLetter) > averagecharcountneededperline))//if (this is not the last word) and (we are going to bust the average)
                {
                    if (output.Count + 1 == neededLineCount)//if we are writting the last line
                    {
                        //who cares about exceeding.
                    }
                    else
                    {
                        //we're going to exceed the allowed average, gotta force this loop to stop
                        positionsneeded++;//dont forget!
                        break;
                    }
                }
                positionsneeded++;//increment the needed position by one
            }

            output.Add(tempstr);//store the string in our list of string to output
        }

        //display the line on the screen
        foreach (string lineoftext in output)
        {
            txtbx_Output.AppendText(lineoftext + Environment.NewLine);
        }

    }
blaze_125
  • 2,262
  • 1
  • 9
  • 19
0

Using the Java String Split() Method to split a string we will discover How and Where to Apply This String Manipulation Technique:

We'll examine the Java Split() method's explanation and discover how to apply it. The principles are explained simply and with enough programming examples, either as a separate explanation or in the comment part of the programs.

The Java String Split() method is used to divide or split the calling Java String into pieces and return the Array, as the name implies. The delimiters("", " ", ) or regular expressions that we have supplied separately for each component or item of an array.

Syntax

String[ ] split(String regExp)

First Case: It involves initializing a Java String variable with a variety of words separated by spaces, using the Java String Split() method, and evaluating the results. We can effectively print each word without the space using the Java Split() function.

Second Case: In this case, we initialize a Java String variable and attempt to split or deconstruct the main String variable to use the String Split() method utilizing a substring of the initialized String variable.

Third Case: In this case, we will attempt to split a String using its character by taking a String variable (a single word).

You can check out other approaches to this problem on YouTube and even coding websites on google such as Coding Ninjas

  • Welcome to [so]! Please take the [tour], visit the [help] and read up on [answering questions well](https://stackoverflow.com/help/answering). – Scott Sauyet Aug 19 '22 at 15:15
0

This old question was revived by a recent answer, and I think I have a simpler technique than the answers so far:

const evenSplit = (text = '', lines = 1) => {
  if (lines < 2) {return [text]}
  const baseIndex = Math .round (text .length / lines)
  const before = text .slice (0, baseIndex) .lastIndexOf (' ')
  const after = text .slice (baseIndex) .indexOf (' ') + baseIndex
  const index = after - baseIndex < baseIndex - before ? after : before
  return [
    text .slice (0, index), 
    ... evenSplit (text .slice (index + (before > -1 ? 1 : 0)), lines - 1)
  ]
}

const text = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`

const display = (lines) => console .log (lines .join ('\n'))

display (evenSplit (text, 7))
display (evenSplit (text, 5))
display (evenSplit (text, 12))
display (evenSplit (`this should be three lines, but it has a loooooooooooooooooooooooooooooooong word`, 3))
.as-console-wrapper {max-height: 100% !important; top: 0}

It works by finding the first line then recurring on the remaining text with one fewer lines. The recursion bottoms out when we have a single line. To calculate the first line, we take an initial target index which is just an equal share of the string based on its length and the number of lines. We then check to find the closest space to that index, and split the string there.

It does no optimization, and could certainly be occasionally misled by long words, but mostly it just seems to work.

Scott Sauyet
  • 49,207
  • 4
  • 49
  • 103