I am searching the titles and descriptions in my application, and the task is to shorten the search results. For example, if a description is too long (more than two lines), the result should be shortened to one or two lines of text with the found word highlighted.

Here's an example from Algolia: [image]

Here's what I've tried so far, but it's not working as expected:

  const truncateHighlightedText = (
    sentence,
    searchExpression,
    truncateLength
  ) => {
    const pattern = new RegExp(
      '\\b.{1,' +
        truncateLength +
        '}\\b' +
        searchExpression +
        '\\b.{1,' +
        truncateLength +
        '}\\b',
      'i'
    );

    return sentence.match(pattern);
  };


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 20;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);

https://jsfiddle.net/dwr3qgs0/1/

What would be the best approach for this task?

lecham
  • What sort of match were you hoping for given the sentence there? The only standalone `eg` is at the very end of the string, which doesn't match because the `.{1` requires at least one character afterwards – CertainPerformance Apr 17 '20 at 10:47
  • @CertainPerformance I was hoping for searchExpression and a few words from both sides of this word. e.g. Testsdfgbs`egseg`srewgserwfgvsrevfse ewrwer – lecham Apr 17 '20 at 10:54
  • Even though the `eg` isn't a standalone word there? – CertainPerformance Apr 17 '20 at 11:00
  • @CertainPerformance That's fine, because I wanted to add '...' on both sides of the line, but it would also be great to show standalone words – lecham Apr 17 '20 at 11:47

1 Answer

Your code currently doesn't match anything, for two reasons:

  • You're using word boundaries (\b) around the search expression, which means it only matches when it appears as a standalone word. In the code in your question, egseg is not a standalone word anywhere. In the code in the fiddle, eg is a standalone word, but only at the very end of the string.
  • You're requiring at least one character before and after the matched word with your {1,' + truncateLength + '}. This is why, in the fiddle, the eg at the end of the string isn't matched: nothing follows it.
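
Both points can be checked in isolation. Here is a minimal sketch, using shortened sample strings modelled on the question's sentence; the variable names are chosen here purely for illustration:

```javascript
// Shortened samples modelled on the sentence in the question
const embedded = 'Testsdfgbsegsegsrew'; // "egseg" sits inside a longer word
const atEnd = 'rg e eg';                // "eg" is a standalone word at the very end

// 1. \b needs a word/non-word transition, so an embedded term never matches
const withBoundaries = /\begseg\b/i.test(embedded);
const withoutBoundaries = /egseg/i.test(embedded);

// 2. .{1,n} demands at least one character after the term; .{0,n} does not
const atLeastOne = /\beg\b.{1,20}/i.test(atEnd);
const zeroOrMore = /\beg\b.{0,20}/i.test(atEnd);

console.log(withBoundaries, withoutBoundaries, atLeastOne, zeroOrMore);
// false true false true
```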

If you want to match the searchExpression anywhere, remove the word boundaries around it, and use {0, instead of {1, in case the match is at the beginning or end of the string:

const truncateHighlightedText = (
  sentence,
  searchExpression,
  truncateLength
) => {
  const pattern = new RegExp(
    '\\b.{0,' +
    truncateLength +
    '}' +
    searchExpression +
    '.{0,' +
    truncateLength +
    '}\\b',
    'i'
  );
  console.log(pattern)

  return sentence.match(pattern);
};


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 30;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);

To add "..." to whichever ends have been cut off, optionally capture one character before and after the match inside lookarounds, and add "..." on a side only if its group captured something:

const truncateHighlightedText = (
  sentence,
  searchExpression,
  truncateLength
) => {
  const pattern = new RegExp(
    '(?<=(.)?)\\b.{0,' +
    truncateLength +
    '}' +
    searchExpression +
    '.{0,' +
    truncateLength +
    '}\\b(?=(.)?)',
    'i'
  );
  const match = sentence.match(pattern);
  // No match at all: return null rather than throwing on match[1]
  if (!match) return null;
  return (match[1] ? '...' : '') + match[0] + (match[2] ? '...' : '');
};


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 30;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);

If lookbehind isn't available in the browsers you need to support, use capturing groups for all three parts instead of relying on match[0]:

const truncateHighlightedText = (
  sentence,
  searchExpression,
  truncateLength
) => {
  const pattern = new RegExp(
    '(.)?(\\b.{0,' +
    truncateLength +
    '}' +
    searchExpression +
    '.{0,' +
    truncateLength +
    '})\\b(.)?',
    'i'
  );
  const match = sentence.match(pattern);
  // No match at all: return null rather than throwing on match[1]
  if (!match) return null;
  return (match[1] ? '...' : '') + match[2] + (match[3] ? '...' : '');
};


const sentence = "Testsdfgbsegsegsrewgserwfgvsrevfse  ewrwer wergwregew    erwgrewgwerg   erwgwr eerg rg g er egr ew  erger  rtggrt tr ert tr tr tg tgr gtr  gtr egrt rtg trg rg e eg";
const searchExpression = "egseg";
const truncateLength = 30;


const result = truncateHighlightedText(sentence, searchExpression, truncateLength);
console.log(result);
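
The question also asks for the found word to be highlighted, and the patterns above insert searchExpression into the regex unescaped, so a term such as c++ would be read as regex syntax. The following is only a sketch of one way to handle both; escapeRegExp and highlightMatch are hypothetical helper names, and wrapping the match in a mark tag is an assumption about the desired markup:

```javascript
// Hypothetical helper: escape regex metacharacters so the term is matched literally
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: wrap every occurrence of the term in <mark> tags
// (assumes the snippet is already safe to render as HTML)
const highlightMatch = (snippet, searchExpression) =>
  snippet.replace(
    new RegExp(escapeRegExp(searchExpression), 'gi'),
    (found) => '<mark>' + found + '</mark>'
  );

console.log(highlightMatch('Testsdfgbsegsegsrew...', 'egseg'));
// Testsdfgbs<mark>egseg</mark>srew...
console.log(escapeRegExp('c++ (beta)'));
// c\+\+ \(beta\)
```

The same escapeRegExp call could be applied to searchExpression before building the truncation pattern.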
CertainPerformance
  • Also, what's the right way to add `...`? In your example, it should be added at the end of the string. If the sentence is cut at both ends, then the `...` should be at both ends of the string. – lecham Apr 17 '20 at 13:13
  • See edit, capture in a lookaround, then examine the match array – CertainPerformance Apr 17 '20 at 13:21
  • Just want to clarify: is it possible to make this regexp cross-browser? As far as I can see, lookbehind won't work in older browsers and Safari https://stackoverflow.com/questions/51568821/works-in-chrome-but-breaks-in-safari-invalid-regular-expression-invalid-group?noredirect=1&lq=1 – lecham Apr 17 '20 at 16:24
  • Without lookbehind, use capturing groups everywhere to extract the substrings – CertainPerformance Apr 17 '20 at 20:21