Truncate Text with Pattern

Question

I want to truncate text around matches. The function below highlights text, given an array of matched index pairs and the text itself, but I also want to truncate the parts of the text that are not near a match. See the code below.

const highlight = (matchData, text) => {
  var result = [];
  var matches = [].concat(matchData);
  var pair = matches.shift();

  for (var i = 0; i < text.length; i++) {
    var char = text.charAt(i);
    if (pair && i == pair[0]) {
      result.push("<u>");
    }

    result.push(char);
    if (pair && i == pair[1]) {
      result.push("</u>");
      truncatedIndex = i;
      pair = matches.shift();
    }
  }
  return result.join("");
};

console.log(
  highlight(
    [[23, 29], [69, 74]],
    "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
  )
);

// This returns the highlighted HTML - Result will be => "Some text that doesn't <u>include</u> the main thing, the main thing is the <u>result</u>, you may know I meant that"

But this returns the whole text. I want to truncate everything else, keeping only about 20 characters before and after each match, so that the output is clean and still understandable. For example:

"... text that doesn't <u>include</u> the main thing ... the <u>result</u> you may know I ..."

I can't figure out how to do that. Any help is appreciated.

In my understanding you want range text as truncate text right? or please update what is output of your program and what you expected. — mariappan k, Oct 09 '19 at 16:14
Not an answer, but I suggest using unique variable names; you've got `matches` as an argument name and a variable name. It doesn't matter in this case, since you don't use the `matches` argument after shadowing it with the `matches` variable, but in other situations you may experience issues. — Heretic Monkey, Oct 09 '19 at 16:15
I've added a [Stack Snippet](https://meta.stackoverflow.com/q/358992/215552) so that others can more easily reproduce your issue. However, it currently throws an error. Perhaps you can look into that? I'm guessing it has to do with `match.indices`, where `indices` doesn't exist on `[23, 30]`. — Heretic Monkey, Oct 09 '19 at 16:46
What is the rule that makes `" the main thing is the "` break into `" the main thing ... the "` instead of `" the main things is ... the "`, or `" the ... main thing is the "` or even `" the main thing is ... main thing is the "`? The requirements are not clear to me. — Scott Sauyet, Oct 09 '19 at 18:34

score 1 · Accepted Answer · answered Oct 09 '19 at 18:00

I've modified your function considerably, to make it easier to understand, and so that it works...

Instead of using an array of arrays, which I find cumbersome to deal with, I modified it to use an array of objects. The objects are simple:

{
  start: 23,
  end: 30
}

Basically, it just adds names to the indices you had previously.
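If your existing data is still in the pair format, a small adapter can produce these objects. The sketch below assumes, as in the question's sample call, that the second number of each pair is the index of the last matched character, so it adds 1 to get an exclusive end that works directly with substring. The name toRanges is a hypothetical helper, not part of the original code.

```javascript
// Hypothetical helper: converts [first, last] pairs (last index inclusive)
// into { start, end } objects where `end` is exclusive.
const toRanges = (pairs) =>
  pairs.map(([first, last]) => ({ start: first, end: last + 1 }));

const ranges = toRanges([[23, 29], [69, 74]]);
console.log(ranges);
// [ { start: 23, end: 30 }, { start: 69, end: 75 } ]
```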

The code should be relatively easy to follow. Here's a line-by-line explanation:

Armed with the new structure, you can use a simple substring command to snip the appropriate piece of text.
Since we're in a loop, and I don't want two sets of ellipses between matches, I check whether we're on the first pass through and only add an ellipsis before the match on that pass.
The text before the piece we've snipped is the 20 characters before the start of the match, or however many characters there are back to the beginning of the string. Math.max() provides an easy way of keeping the start index from going below 0.
The text after the piece we've snipped is the 20 characters after the end of the match, or however many characters there are up to the end of the string. Math.min() provides an easy way of keeping the end index within the string.
Concatenating them together, we get the match's new text. I'm using template literals to make that easier to read than a bunch of + " " + and whatnot.
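The clamping described in the points above can be tried in isolation. This is a minimal sketch rather than the answer's function: contextAround is a hypothetical name, and it assumes the end of the range is exclusive, as substring expects. The upper bound is text.length because the end index passed to substring is itself exclusive.

```javascript
// Hypothetical helper: returns up to `count` characters on each side of
// the half-open range [start, end), clamped to the bounds of `text`.
const contextAround = (text, start, end, count = 20) => ({
  before: text.substring(Math.max(0, start - count), start),
  after: text.substring(end, Math.min(text.length, end + count)),
});

const sample = "abcdefghijklmnopqrstuvwxyz";

// Match "cde" (indices 2..4): only two characters exist before it.
console.log(contextAround(sample, 2, 5, 5));   // { before: 'ab', after: 'fghij' }

// Match "xy" (indices 23..24): only one character exists after it.
console.log(contextAround(sample, 23, 25, 5)); // { before: 'stuvw', after: 'z' }
```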

const highlight = (matches, text) => {
  let newText = '';
  matches.forEach((match) => {
    const piece = text.substring(match.start, match.end);
    const preEllipses = newText.length === 0 ? '... ' : '';
    const textBefore = text.substring(Math.max(0, match.start - 20), match.start);
    const textAfter = text.substring(match.end, Math.min(text.length, match.end + 20));
    newText += `${preEllipses}${textBefore}<u>${piece}</u>${textAfter} ... `;
  });
  return newText.trim();
}

// Sample Usage
const result = highlight([{ start: 23,  end: 30 }, { start: 69, end: 75 }], "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that");
console.log(result);
document.getElementById("output").innerHTML = result;
// Result will be => "... e text that doesn't <u>include</u> the main thing, the ... e main thing is the <u>result</u>, you may know I mea ..."

<div id="output"></div>

Note that I am using simple string concatenation here, rather than putting parts into an array and using join. Modern JavaScript engines optimize string concatenation extremely well, to the point where it makes the most sense to just use it. See e.g., Most efficient way to concatenate strings in JavaScript?, and Dr. Axel Rauschmayer's post on 2ality.

Scott Sauyet · Answer 2 · 2019-10-10T14:57:53.160

Note

There's an update below that I think shows a better version of this same idea. But this is where it started.

Original Version

Here's another attempt, building a more flexible solution out of reusable parts.

const intoPairs = (xs) =>
  xs .slice (1) .map ((x, i) => [xs[i], x])

const splitAtIndices = (indices, str) => 
  intoPairs (indices) .map (([a, b]) => str .slice (a, b))

const alternate = Object.assign((f, g) => (xs, {START, MIDDLE, END} = alternate) => 
  xs .map (
    (x, i, a, pos = i == 0 ? START : i == a.length - 1 ? END : MIDDLE) => 
      i % 2 == 0 ? f (x, pos) : g (x, pos)
  ),
  {START: {}, MIDDLE: {}, END: {}}
)


const wrap = (before, after) => (s) => `${before}${s}${after}`

const truncate = (count) => (s, pos) =>
  pos == alternate.START
    ? s .length <= count ? s : '... ' + s .slice (-count)
  : pos == alternate.END
    ? s .length <= count ? s : s .slice (0, count) + ' ...'
  : // alternate.MIDDLE
    s .length <= (2 * count) ? s : s .slice (0, count) + ' ... ' + s .slice (-count)



const highlighter = (f, g) => (ranges, str, flip = ranges[0][0] == 0) => 
  alternate (flip ? g : f, flip ? f : g) (
    splitAtIndices ([...(flip ? [] : [0]), ...ranges .flat() .sort((a, b) => a - b), str.length], str)
  ) .join ('')

const highlight = highlighter (truncate (20), wrap('<u>', '</u>'))

#output {padding: 0 1em;}
#input {padding: .5em 1em 0;}
textarea {width: 50%; height: 3em;}
button, input {vertical-align: top; margin-left: 1em;}

<div id="input">
  <textarea id="string">Some text that doesn't include the main thing, the main thing is the result, you may know I meant that</textarea>
  <input type="text" id="indices" value="[23, 30], [69, 75]"/>
  <button id="run">Highlight</button>
</div>
<h4>Output</h4>
<div id="output"></div>

<script>
document.getElementById('run').onclick = (evt) => {
  const str = document.getElementById('string').value;
  const idxString = document.getElementById('indices').value;
  const idxs = JSON.parse(`[${idxString}]`);
  const result = highlight(idxs, str);
  console.clear();
  document.getElementById('output').innerHTML = '';
  setTimeout(() => {
    console.log(result);
    document.getElementById('output').innerHTML = result;
  }, 300)
}
</script>

This involves the helper functions intoPairs, splitAtIndices, alternate, wrap, and truncate. I think they are best shown by examples:

intoPairs (['a', 'b', 'c', 'd']) //=> [['a', 'b'], ['b', 'c'], ['c', 'd']]

splitAtIndices ([0, 3, 7, 15], 'abcdefghijklmno') //=> ["abc", "defg", "hijklmno"]
                           //   ^  ^   ^       ^        `---'  `----'  `--------'
                           //   |  |   |       |          |       |         |
                           //   0  3   7      15        0 - 3   3 - 7     7 - 15

alternate (f, g) ([a, b, c, d, e, ...]) //=> [f(a), g(b), f(c), g(d), f(e), ...]

wrap ('<div>', '</div>') ('foo bar baz') //=> '<div>foo bar baz</div>'

//chars---+   input---+             position---+           output--+
//        |           |                        |                   |
//        V           V                        V                   V
truncate (10) ('abcdefghijklmnop',           ~START~)  //=> '... ghijklmnop'
truncate (10) ('abcdefghijklmnop',           ~END~)    //=> 'abcdefghij ...'
truncate (10) ('abcdefghijklmnop',           ~MIDDLE~) //=> 'abcdefghijklmnop'
truncate (10) ('abcdefghijklmnopqrstuvwxyz', ~MIDDLE~) //=> 'abcdefghij ... qrstuvwxyz'

All of these are potentially reusable, and I personally have intoPairs and wrap in my general utility library.

truncate is the only complex one, and that is mostly because it does triple duty, handling the first string, the last string, and all the others in three distinct manners. You first supply a count, and then you give a string as well as the position (START, MIDDLE, or END, stored as properties of alternate). For the first string, it returns an ellipsis (...) followed by the last count characters. For the last one, it returns the first count characters followed by an ellipsis. For the middle ones, if the length is no more than double count, it returns the whole thing; otherwise it returns the first count characters, an ellipsis, and the last count characters. In the first and last cases, a string no longer than count is returned unchanged. This behavior might be different from what you want; if so, it is easy to change.

The main function is highlighter. It accepts two functions. The first one is how you want to handle the non-highlighted sections. The second is for the highlighted ones. It returns the style function you were looking for, one that accepts an array of two-element arrays of numbers (the ranges) and your input string, returning a string with the highlighted ranges and the non-highlighted ranges.

We use it to generate the highlight function by passing it truncate (20) and wrap('<u>', '</u>').

The intermediate forms might make it clearer what's going on.

We start with these indices:

[[23, 30], [69, 75]]

and our 102-character string,

"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"

First we flatten the ranges, prepending a zero if the first range doesn't start there and appending the length of the string, to get this:

[0, 23, 30, 69, 75, 102]

We pass that to splitAtIndices, along with our string, to get

[
    "Some text that doesn't ",
    "include",
    " the main thing, the main thing is the ",
    "result",
    ", you may know I meant that"
]

Then we map the appropriate functions over each of these strings to get

[
    "... e text that doesn't ",
    "<u>include</u>",
    " the main thing, the main thing is the ",
    "<u>result</u>",
    ", you may know I mea ..."
]

and join those together to get our final result:

"... e text that doesn't <u>include</u> the main thing, the main thing is the <u>result</u>, you may know I mea ..."

I like the flexibility this offers. It's easy to alter the highlight strategy as well as how you handle the unhighlighted parts -- just pass a different function to highlighter. It's also a useful breakdown of the work into reusable parts.
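As a small illustration of that reuse, the intermediate forms shown earlier can be reproduced by calling the helpers directly. This sketch assumes the first range does not start at index 0, so a 0 is always prepended; the variable names boundaries and pieces are only for illustration.

```javascript
const intoPairs = (xs) => xs.slice(1).map((x, i) => [xs[i], x]);

const splitAtIndices = (indices, str) =>
  intoPairs(indices).map(([a, b]) => str.slice(a, b));

const str =
  "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that";
const ranges = [[23, 30], [69, 75]];

// Step 1: flatten the ranges, prepend 0, and append the string's length.
const boundaries = [0, ...ranges.flat(), str.length];
console.log(boundaries); // [ 0, 23, 30, 69, 75, 102 ]

// Step 2: cut the string at every boundary.
const pieces = splitAtIndices(boundaries, str);
console.log(pieces);
// Odd-numbered entries ("include", "result") are the highlighted pieces.
```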

But there are two things I don't like.

First, I'm not thrilled with the handling of middle unhighlighted sections. Of course it's easy to change; but I don't know what would be appropriate. You might, for instance, want to change the doubling applied to the count there. Or you might have an entirely different idea.

Second, truncate is dependent upon alternate. We have to somehow pass signals from alternate to the two functions supplied to it to let them know where we are. My first pass involved passing the index and the entire array (the Array.prototype.map signature) to those functions. But that felt too coupled. We could make START, MIDDLE, and END into module-local properties, but then alternate and truncate would not be reusable. I'm not going to go back and try it now, but I think a better solution might be to pass four functions to highlighter: the function for the highlighted sections, and one each for start, middle, and end positions of the non-highlighted ones.
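Returning to the first concern, here is one possible way to make the middle rule adjustable. It is only a sketch: truncateMiddleBy and its factor parameter are hypothetical names, with factor standing in for the hard-coded doubling.

```javascript
// Hypothetical variant of the MIDDLE rule: `factor` replaces the fixed 2,
// so a string is kept whole while its length is at most factor * count.
const truncateMiddleBy = (count, factor = 2) => (s) =>
  s.length <= factor * count
    ? s
    : s.slice(0, count) + ' ... ' + s.slice(-count);

const middle = " the main thing, the main thing is the "; // 39 characters

// 39 > 2 * 10, so the middle is cut down to 10 characters on each side.
console.log(JSON.stringify(truncateMiddleBy(10)(middle)));
// " the main  ... ng is the "

// 39 <= 4 * 10, so the same string is now kept whole.
console.log(JSON.stringify(truncateMiddleBy(10, 4)(middle)));
```

A factor below 2 would let the two slices overlap and repeat characters, so this sketch is only sensible for factors of 2 or more.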

Update

I did go ahead and try that alternative I mentioned, and I think this version is cleaner, with all the complexity located in the single function highlighter:

const intoPairs = (xs) =>
  xs .slice (1) .map ((x, i) => [xs[i], x])

const splitAtIndices = (indices, str) => 
  intoPairs (indices) .map (([a, b]) => str .slice (a, b))

const wrap = (before, after) => (s) => `${before}${s}${after}`

const truncateStart = (count) => (s) =>
  s .length <= count ? s : '... ' + s .slice (-count)

const truncateMiddle  = (count) => (s) =>
  s .length <= (2 * count) ? s : s .slice (0, count) + ' ... ' + s .slice (-count)

const truncateEnd  = (count) => (s) =>
  s .length <= count ? s : s .slice (0, count) + ' ...'

const highlighter = (highlight, start, middle, end) => 
  (ranges, str, flip = ranges[0][0] == 0) => 
    splitAtIndices ([...(flip ? [] : [0]), ...ranges .flat() .sort((a, b) => a - b), str.length], str)
      .map (
        (s, i, a) =>
          (flip 
             ? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
             : (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
          ) (s)
      ) .join ('')                  

const highlight = highlighter (
  wrap('<u>', '</u>'),
  truncateStart(20),
  truncateMiddle(20),
  truncateEnd(20)
)

console .log (
  highlight (
    [[23, 30], [69, 75]], 
    "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
  )
)
console .log (
  highlight (
    [[23, 30], [86, 92]], 
    "Some text that doesn't include the main thing, because you see, the main thing is the result, you may know I meant that"
  )
)

There is some real complexity built into highlighter, but I think it's fairly intrinsic to the problem. On each iteration, we have to choose one of our four functions based on the index, the length of the array, and whether the first range started at zero. This bit here simply chooses the function based on all that:

(flip 
   ? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
   : (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
)

where the flip boolean simply reports whether the first range starts at 0, a is the array of substrings to handle, and i is the current index in the array. If you see a cleaner way of choosing the function, I'd love to know.
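One possible alternative, offered only as a sketch, is to decide first whether a piece is highlighted and then fall back to its position. The names chooseHandler, handlers, and label are hypothetical; the stand-in handlers just tag each piece so the choice is visible.

```javascript
// Hypothetical helper: chooses which of the four functions handles piece `i`
// out of `n` pieces. `flip` is true when the first range starts at index 0.
const chooseHandler = (flip, i, n, { highlight, start, middle, end }) => {
  // Highlighted pieces sit at even indices when flipped, odd ones otherwise.
  const isHighlighted = (i % 2 === 0) === flip;
  if (isHighlighted) return highlight;
  if (i === 0) return start;
  if (i === n - 1) return end;
  return middle;
};

// Stand-in handlers that just label the piece they receive.
const handlers = {
  highlight: (s) => `H(${s})`,
  start: (s) => `S(${s})`,
  middle: (s) => `M(${s})`,
  end: (s) => `E(${s})`,
};

const label = (flip, pieces) =>
  pieces
    .map((s, i, a) => chooseHandler(flip, i, a.length, handlers)(s))
    .join(' ');

console.log(label(false, ['a', 'b', 'c', 'd', 'e'])); // S(a) H(b) M(c) H(d) E(e)
console.log(label(true, ['a', 'b', 'c', 'd']));       // H(a) M(b) H(c) E(d)
```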

If we wanted to write a gloss for this sort of highlighting, we could easily write

const truncatingHighlighter = (count, start, end) => 
  highlighter (
    wrap(start, end),
    truncateStart(count),
    truncateMiddle(count),
    truncateEnd(count)
  )

const highlight = truncatingHighlighter (20, '<u>', '</u>')

I definitely think this is a superior solution.
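For anyone who wants to run the whole thing outside the snippet, here is a self-contained sketch that repeats the definitions from the update and tries the gloss with different settings. The `<mark>` tags, the count of 10, and the name markMatches are illustrative choices, not part of the original.

```javascript
const intoPairs = (xs) => xs.slice(1).map((x, i) => [xs[i], x]);
const splitAtIndices = (indices, str) =>
  intoPairs(indices).map(([a, b]) => str.slice(a, b));
const wrap = (before, after) => (s) => `${before}${s}${after}`;
const truncateStart = (count) => (s) =>
  s.length <= count ? s : '... ' + s.slice(-count);
const truncateMiddle = (count) => (s) =>
  s.length <= 2 * count ? s : s.slice(0, count) + ' ... ' + s.slice(-count);
const truncateEnd = (count) => (s) =>
  s.length <= count ? s : s.slice(0, count) + ' ...';

const highlighter = (highlight, start, middle, end) =>
  (ranges, str, flip = ranges[0][0] == 0) =>
    splitAtIndices(
      [...(flip ? [] : [0]), ...ranges.flat().sort((a, b) => a - b), str.length],
      str
    )
      .map((s, i, a) =>
        (flip
          ? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
          : (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
        )(s)
      )
      .join('');

const truncatingHighlighter = (count, start, end) =>
  highlighter(
    wrap(start, end),
    truncateStart(count),
    truncateMiddle(count),
    truncateEnd(count)
  );

// Illustrative settings: <mark> tags and 10 characters of context.
const markMatches = truncatingHighlighter(10, '<mark>', '</mark>');

const text =
  "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that";

console.log(markMatches([[23, 30], [69, 75]], text));
// ... t doesn't <mark>include</mark> the main  ... ng is the <mark>result</mark>, you may  ...

// A range starting at index 0 exercises the `flip` branch.
console.log(markMatches([[0, 4]], "Some text"));
// <mark>Some</mark> text
```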
