I have the following function, which runs several replace operations in sequence on a single string, inputHtml. It works well but takes too long. Is it possible to speed it up by combining the replacements into one?

/* Receives HTML code and returns the plain text contained in the HTML code */
function decodeHtml(inputHtml) {
  const commentsRemoved = inputHtml.replace(/<!--[\s\S]*?-->/gm, '');
  const linebreaksAdded = commentsRemoved.replace(/<br>/gm, '\n');
  const tagsRemoved = linebreaksAdded.replace(/<(?:.|\n)*?>/gm, '');
  const linebreaksRemoved = tagsRemoved.replace(/^\s*[\r\n]/gm, '');
  const plainText = entities.decode(linebreaksRemoved);

  return plainText;
}

1 Answer

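Some of your replacements insert newlines and a later one strips blank lines,
so a single-pass regex has to combine that functionality: one pattern with an
alternative per case, and a callback that picks the replacement based on which
group matched. Note that this only covers the four replace calls;
entities.decode still has to be applied to the result.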

Regex explained

    ( <!-- [\s\S]*? --> )         # (1), return ''
 |  
    (?:                           # Blank lines, simulate ^ multiline
         ( \r? \n )                    # (2), return $2
      |  (                             # (3 start)
              ( \r? )                       # (4), return $4 + '\n'
              <br> 
         )                             # (3 end)
    )
    (?: \s | <br> | <!-- [\s\S]*? --> )*
    \r? 
    (?: \n | <br> )
 |  
    ( <br> )                      # (5), return '\n'
 |  
    ( < [\s\S]*? > )              # (6), return ''
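
JavaScript regex literals have no free-spacing mode, so the commented layout above can't be pasted in as-is. One way to keep it readable is to assemble the pattern from named pieces. This is only a sketch; the names COMMENT, BR, parts and combined are made up for illustration and are not part of the original code.

```javascript
// Building blocks, written with String.raw so backslashes reach RegExp untouched.
const COMMENT = String.raw`<!--[\s\S]*?-->`;
const BR = '<br>';

// One entry per top-level alternative of the commented pattern.
const parts = [
  `(${COMMENT})`,                                                                      // (1)
  String.raw`(?:(\r?\n)|((\r?)${BR}))(?:\s|${BR}|${COMMENT})*\r?(?:\n|${BR})`,         // (2), (3), (4)
  `(${BR})`,                                                                           // (5)
  String.raw`(<[\s\S]*?>)`,                                                            // (6)
];

// Join the alternatives with | and add the global flag.
const combined = new RegExp(parts.join('|'), 'g');

console.log(combined.source);
```

The group numbering is unchanged, so a replace callback written against groups 1 to 6 works with either form.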

JS code

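var input = 'here<br>   <br> <br> <br><!-- <br> --> <br><br><br><br>and here<br>and there ';

var output = input.replace(/(<!--[\s\S]*?-->)|(?:(\r?\n)|((\r?)<br>))(?:\s|<br>|<!--[\s\S]*?-->)*\r?(?:\n|<br>)|(<br>)|(<[\s\S]*?>)/g,
  function (m, p1, p2, p3, p4, p5, p6) {
    // (1) a comment or (6) any other tag: remove it
    if (p1 || p6)
      return "";
    // Blank-line run that starts with a real newline (2): keep that newline
    if (p2)
      return p2;
    // Blank-line run that starts with <br> (3): keep the optional \r (4) plus one newline
    if (p3)
      return p4 + "\n";
    // (5) a lone <br>: convert to a newline
    if (p5)
      return "\n";
    // Every alternative sets a group, so this should not be reached;
    // return the match unchanged rather than undefined, just in case
    return m;
  });

console.log(output);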

Input

here<br>   <br> <br> <br><!-- <br> --> <br><br><br><br>and here<br>and there 

Output

here
and here
and there
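
As a quick sanity check, the sketch below runs the question's four replace calls (leaving out entities.decode) and the single-pass replace on the same sample and compares the results. The function names multiPass and singlePass are invented for this example. Agreement on one sample is not proof of equivalence; other inputs, for example one that begins with blank lines, are worth checking separately.

```javascript
// The four replace() calls from the question, without entities.decode.
function multiPass(html) {
  return html
    .replace(/<!--[\s\S]*?-->/gm, '')
    .replace(/<br>/gm, '\n')
    .replace(/<(?:.|\n)*?>/gm, '')
    .replace(/^\s*[\r\n]/gm, '');
}

// The single-pass replace from the answer, wrapped in a function.
function singlePass(html) {
  const re = /(<!--[\s\S]*?-->)|(?:(\r?\n)|((\r?)<br>))(?:\s|<br>|<!--[\s\S]*?-->)*\r?(?:\n|<br>)|(<br>)|(<[\s\S]*?>)/g;
  return html.replace(re, (m, p1, p2, p3, p4, p5, p6) => {
    if (p1 || p6) return '';      // comment or other tag
    if (p2) return p2;            // blank-line run starting with a newline
    if (p3) return p4 + '\n';     // blank-line run starting with <br>
    if (p5) return '\n';          // lone <br>
    return m;                     // defensive fallback
  });
}

const sample = 'here<br>   <br> <br> <br><!-- <br> --> <br><br><br><br>and here<br>and there ';
const agree = multiPass(sample) === singlePass(sample);
console.log(agree); // true for this sample
```

Wrapping both versions as functions also makes it easy to time them against each other on real input before committing to the combined pattern.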