
What I'm looking for

I have code like this:

<div this-html="text goes here"></div>

After applying a regex, I want the value of the "this-html" attribute to become the text between the opening and closing tags, like so:

<div>text goes here</div>

The element can have other attributes and doesn't have to be a div; it can be any element that uses a closing tag (which doesn't have to be on the same line). The input may also have text between the tags, like <div this-html="text goes here">dummy text</div>; that text can be ignored and should be overwritten with the value of the "this-html" attribute.

What I have

I can't use jQuery or turn the string into a DOM object, as it may contain PHP (which gets crippled when you turn the object back into a string). This script is used during a 'publish to html' process of an application, hence it can contain PHP. And so, I'm trying to solve it using regular expressions.

So basically, all I have is JavaScript, and the HTML I need to work with is just a string; there's no DOM to work with.

Now, I have a regular expression that does this for me, but it doesn't work when you have multiple matches on the same line or when I have another attribute after "this-html".

This is the regex I'm using: /(<\s*[^<]+?)this-html=['"]{1}(.+)['"]{1}([^>]*>)[\w\W]*?(<\/.+>)/gmi

And I group it back together with $1$3$2$4.

Now, let's say I have the following input: <div this-html="text goes here!" class="something">test</div><div this-html="another test">Option is visible on preview/publish</div>

Then my regex pattern will mess this up and I end up with something like this: <div >text goes here!" class="something">test</div><div this-html="another test</div>
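
The result above can be reproduced in a few lines. This is only a sketch with illustrative variable names: the greedy `(.+)` runs to the end of the line and backtracks just far enough to find the last quote, so a single match swallows both elements.

```javascript
// Reproduces the failure described above with the original pattern.
var input = '<div this-html="text goes here!" class="something">test</div>' +
            '<div this-html="another test">Option is visible on preview/publish</div>';

// (.+) is greedy, so group 2 extends to the last quote on the line,
// not to the quote that closes the first this-html value.
var original = /(<\s*[^<]+?)this-html=['"]{1}(.+)['"]{1}([^>]*>)[\w\W]*?(<\/.+>)/gmi;

var broken = input.replace(original, '$1$3$2$4');
console.log(broken);
// <div >text goes here!" class="something">test</div><div this-html="another test</div>
```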

I'm not a regex guru. I get the feeling this regex could be a whole lot simpler, but I'm stuck here.

Any ideas?

witsec
  • 1. [regex is not suitable to parse HTML](https://stackoverflow.com/a/1732454/5734311) 2. you don't have to do string composing for tasks like this; an HTML document is parsed into the DOM 3. use `element.innerHTML = element.getAttribute('this-html')` 4. please clarify what you mean by "may contain PHP" –  Mar 06 '21 at 12:47
  • @ChrisG, I edited my post a little. I'm using this in a 'publish-to-html' process in an application. Javascript is available there, but the HTML I need to work with is just a string there, there's no DOM. – witsec Mar 06 '21 at 12:56
  • See https://regex101.com/r/fQtnmo/2, `text.replace(/(<\s*(\w+)[^<]*?)\s+this-html=['"]([^"']*)['"]([^>]*?)\s*>[\w\W]*?(<\/\2>)/gi, '$1$4>$3$5')` – Wiktor Stribiżew Mar 06 '21 at 13:07
  • You can create a virtual DOM based on the string, run the code on the elements, then turn the result back to HTML. –  Mar 06 '21 at 13:10
  • You might want to look at templating systems, e.g. Handlebars, Pug or Twig. – geoidesic Mar 06 '21 at 13:13
  • @WiktorStribiżew, yes! This seems to work really well for me. Thank you so much! – witsec Mar 06 '21 at 13:18
  • Don't use that, seriously. Use this: https://jsfiddle.net/jymr20gp/ –  Mar 06 '21 at 13:20
  • @ChrisG I have tried that (and I'd prefer that!), but the string of HTML can contain PHP. If you put that entire string into a DOM object and (once done) back into a string, the PHP tags are crippled and I'd have to fix that up again. It's a world of pain. – witsec Mar 06 '21 at 13:23
  • That's true, who designed this system...!? :) –  Mar 06 '21 at 13:24

2 Answers


This is the correct way of doing what you want:

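function convert(html) {
  // Parse the string into a detached container element
  var div = document.createElement('div');
  div.innerHTML = html;
  div.querySelectorAll('*').forEach(el => {
    var h = el.getAttribute('this-html');
    // getAttribute returns null when the attribute is absent;
    // comparing against null also handles an empty this-html=""
    if (h !== null) {
      el.innerHTML = h;
      el.removeAttribute('this-html');
    }
  });
  // Serialize the modified tree back to a string
  return div.innerHTML;
}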


var html = '<div this-html="text goes here!" class="something">test</div><div this-html="another test">Option is visible on preview/publish</div>';

console.log( convert(html) );

However, as your environment does not allow you to use DOM, you might resort to regex like

text.replace(/(<\s*(\w+)[^<]*?)\s+this-html=['"]([^"']*)['"]([^>]*?)\s*>[\w\W]*?(<\/\2>)/gi, '$1$4>$3$5')

See the regex demo. NOTE: as soon as a DOM is available, please switch to the DOM-based solution above instead of this workaround.

Details

  • (<\s*(\w+)[^<]*?) - Group 1 ($1 value): <, zero or more whitespaces, Group 2 ($2 value): any one or more word chars, then any zero or more chars other than < as few as possible
  • \s+ - one or more whitespace
  • this-html= - literal text
  • ['"] - a " or '
  • ([^"']*) - Group 3 ($3 value): zero or more chars other than " and '
  • ['"] - a " or '
  • ([^>]*?) - Group 4 ($4 value): any zero or more chars other than > as few as possible
  • \s* - zero or more whitespace
  • > - a > char
  • [\w\W]*? - any zero or more chars, as few as possible
  • (<\/\2>) - Group 5 ($5 value): </, same value as in Group 2, >.
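
One consequence of `([^"']*)` excluding both quote characters is that a value such as "it's here" is cut at the apostrophe. A possible variant, offered as a sketch rather than a tested drop-in, captures the opening quote and requires the same character to close the value. The function name `replaceThisHtml` is made up for this example, and like the pattern above it assumes the opening tag contains no `<` or `>` other than its own delimiters.

```javascript
// Hypothetical helper: same structure as the pattern explained above, but the
// value is delimited by a backreference to whichever quote opened it.
function replaceThisHtml(text) {
  // Groups: 1 = tag start, 2 = tag name, 3 = opening quote, 4 = value,
  //         5 = remaining attributes, 6 = closing tag
  var pattern = /(<\s*(\w+)[^<]*?)\s+this-html=(["'])([\s\S]*?)\3([^>]*?)\s*>[\s\S]*?(<\/\2>)/gi;
  return text.replace(pattern, '$1$5>$4$6');
}

console.log(replaceThisHtml('<p this-html="it\'s here">x</p>'));
// <p>it's here</p>

// The closing tag may be on another line, as in the question.
console.log(replaceThisHtml('<div this-html="text goes here" class="a">\n  dummy text\n</div>'));
// <div class="a">text goes here</div>
```

Because the group numbers shift by one compared with the original pattern, the replacement string becomes `'$1$5>$4$6'`.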

See the JavaScript demo:

var text = '<div this-html="text goes here!" class="something">test</div><div this-html="another test">Option is visible on preview/publish</div>';
console.log( text.replace(/(<\s*(\w+)[^<]*?)\s+this-html=['"]([^"']*)['"]([^>]*?)\s*>[\w\W]*?(<\/\2>)/gi, '$1$4>$3$5') );
Wiktor Stribiżew
  • Thank you for your comprehensive answer, it's much appreciated! Until I get the chance to implement a proper solution, this will have to do. – witsec Mar 09 '21 at 12:03

I don't know what the PHP code contained in the string will look like, but can something like this be fine? :)

var regex = /<\s*(\w+)([^>]*)\s*this-html=\"([^"]*)\"([^>]*)>[^<]*<\s*\/\s*\w+\s*>/gi;
var stringTest = '<div this-html="text goes here!" class="something">test</div><div this-html="another test">Option is visible on preview/publish</div>';
var result = stringTest.replace(regex,'<$1$2$4>$3</$1>');
alert(result);
Pinguto
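
A quick check of the pattern above against element content that itself contains markup. This is only a sketch to illustrate a limitation, with made-up test strings: because the content is matched with `[^<]*`, an element with a child tag is left untouched. The closing tag is also matched with `\w+` rather than being tied to the opening tag name.

```javascript
var regex = /<\s*(\w+)([^>]*)\s*this-html=\"([^"]*)\"([^>]*)>[^<]*<\s*\/\s*\w+\s*>/gi;

// Plain text content: replaced (the space before this-html stays in group 2).
var plain = '<div this-html="another test">Option is visible</div>';
console.log(plain.replace(regex, '<$1$2$4>$3</$1>'));
// <div >another test</div>

// Content with a child element: no match, so the string comes back unchanged.
var nested = '<div this-html="x"><b>y</b></div>';
console.log(nested.replace(regex, '<$1$2$4>$3</$1>') === nested);
// true
```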