-1

EXAMPLE

REGEX:

.replace(/((<)(\/|)([a-zA-Z-Z0-9]+))/gi,'\n$1')

What does this do?

INPUT:

<div id="page"><div id="header"><h1><a href="#">Burger Pointer</a></h1><ul class="left"><li><a href="#">Menu</a></li><li><a href="#">Location</a></li><li><a href="#">About Us</a></li><li><a href="#">BP Gear</a></li></ul></div></div>

OUTPUT:

<div id="page">
<div id="header">
<h1>
<a href="#">Burger Pointer
</a>
</h1>
<ul class="left">
<li>
<a href="#">Menu
</a>
</li>
...

QUESTION

Is there a way to check that the 4th capturing group (the tag name) is NOT a, h1, etc. using a regex, so that the output would be:

<div id="page">
<div id="header">
<h1><a href="#">Burger Pointer</a></h1>
<ul class="left">
<li>
<a href="#">Menu</a>
</li>
...

PROGRESS

Not currently working, see example here

.replace(/(<|<\/)([a-zA-Z-Z0-9]+)/gi,function($0, $1, $2) {
   if (["h1","a"].indexOf($2)) {
      return "$0"
    } else {
        return "/n$1$2"
    }
})
Tyler
  • [Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) – Biffen Feb 13 '17 at 10:33
  • I've seen this before @Biffen, not an answer though I'm afraid. What I am doing so far has worked amazingly well. – Tyler Feb 13 '17 at 10:35
  • It's not supposed to be an answer (which is why I didn't post it as an answer). And while parsing HTML with regex might work for a while, it'll stop working once the input becomes more complex. It having worked in the past is absolutely no guarantee it will in the future. – Biffen Feb 13 '17 at 10:37
  • A hint: you may use a callback as the second argument to `replace` and apply custom replacement behavior depending on the group values. – Wiktor Stribiżew Feb 13 '17 at 10:42
  • @WiktorStribiżew something like this? `.replace(/(<|<\/)([a-zA-Z-Z0-9]+)/gi, function($1, $2) { if ($2 != "this|that") { ... } } );` – Tyler Feb 13 '17 at 17:38
  • Yeah, `function ($0, $1, $2) { if ($2 == "this" || $2 == "that") { return ...; } }` or whatever. – Wiktor Stribiżew Feb 13 '17 at 17:47
  • @WiktorStribiżew [here](http://codepen.io/MrTIMarshall2512/pen/egxmMK) is my attempt so far... Not going great! – Tyler Feb 13 '17 at 18:44
  • Why use something that will end up being very complex and not even guaranteed to work when you could build something simple and straightforward that would always work? – Jan Feb 13 '17 at 18:57
  • This is a learning curve and I have advanced so far via using regexes, you could possibly submit an answer with a better way around doing this @Jan? – Tyler Feb 13 '17 at 19:20
  • Okay, so from the first comment, that leads to suggesting using an XML parser. Do I parse **ALL** languages this way or just HTML? – Tyler Feb 13 '17 at 20:21
  • When you use DOM parsing, it will be more stable and safer. Now, your code is wrong as `$0` and `$1` are variables, but you use them as string literals, and `indexOf` returns `-1` (which is truthy) when nothing is found. Use `if (["h1","a"].indexOf($2) !== -1) { return $0; } else { return "\n"+$1+$2; }` – Wiktor Stribiżew Feb 13 '17 at 20:57
  • @TimMarshall I've posted a possible solution – Jan Feb 13 '17 at 23:27
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Jan Feb 17 '17 at 08:12
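
The callback approach suggested in the comments above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `breakBeforeTags` and the `inlineTags` list are hypothetical, and the sketch skips the newline for every `a` and `h1` tag (opening or closing), so an `<a>` inside an `<li>` also stays on the same line as the `<li>`.

```javascript
// Tag names that should not get a newline in front of them
// (assumption: taken from the question's example).
var inlineTags = ["h1", "a"];

// Hypothetical helper: puts a newline before every tag except the inline ones.
function breakBeforeTags(html) {
    return html.replace(/(<\/?)([a-zA-Z0-9]+)/g, function (match, open, name) {
        // indexOf returns -1 when the name is absent, so compare explicitly.
        if (inlineTags.indexOf(name.toLowerCase()) !== -1) {
            return match; // leave <a, </a, <h1 and </h1 untouched
        }
        return "\n" + open + name;
    });
}

var sample = '<div id="page"><h1><a href="#">Burger Pointer</a></h1>' +
    '<ul><li><a href="#">Menu</a></li></ul></div>';
console.log(breakBeforeTags(sample));
```

The replacer receives the whole match first and the capture groups after it, and whatever it returns is used verbatim, which is why the values are concatenated rather than written as `"$0"` or `"$1$2"` inside a string.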

2 Answers

1

If I've understood your problem correctly, you want to remove linebreaks inside elements of certain tags. One way to do this reliably is to parse the string into DOM elements and then manipulate those elements. To do that, you can create a temporary HTML element and inject your HTML string into it.

You'll notice that apart from removing the linebreaks, this method will also close your div tags, since the HTML you provided is invalid.

This isn't a complete solution or a neat architecture, just a proof of concept of how this type of problem could be solved.

I'm supplying a pure JavaScript version and a jQuery version (since you tagged jQuery even though you have no jQuery code). To find out what the individual commands do, read up on them in the jQuery documentation or the MDN reference.

jQuery

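// Parse the string into a detached element so it can be queried as DOM.
var temporaryElement = $("<body />").html(inputString);

// Strip linebreaks from the markup inside every <h1> and <a>.
temporaryElement.find("h1, a").each(function() {
    $(this).html($(this).html().replace(/\n/g, ""));
});

console.log(temporaryElement.html());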

Pure Javascript

var inputString = `<div id="page">
<div id="header">
<h1>
<a href="#">Burger Pointer
</a>
</h1>
<ul class="left">
<li>
<a href="#">Menu
</a>
</li>`;

function removeLinebreaksInTag(parent, tagName) {
    var elements = parent.getElementsByTagName(tagName);
    for (var i = 0 ; i < elements.length ; i++) {
        elements[i].innerHTML = elements[i].innerHTML.replace(/\n/g, "");
    }
}

function cleanUpHtml(html) {
    var temporaryElement = document.createElement("body");
    temporaryElement.innerHTML = html;

    removeLinebreaksInTag(temporaryElement, "h1");
    removeLinebreaksInTag(temporaryElement, "a");

    return temporaryElement.innerHTML;
}

console.log(cleanUpHtml(inputString));
Jan
  • Both answers are somewhat sufficient with a little playing around to get the desired outcome. If you see my [updated example on CodePen](http://codepen.io/MrTIMarshall2512/pen/egxmMK?editors=0010#0) I've woken up today and have been working on a work around. All seems to be good apart from mid-way there is a `` which doesn't decrease my indent. jQuery is much preferred to be used as I don't know Pure JS, could you maybe with your knowledge compare our dis/advantages? – Tyler Feb 14 '17 at 10:37
  • Also please see my [live testing page](http://admin.bananza.org.uk/Assets/Plugins/syntax-highlight/demo.html) to see why I've been somewhat reluctant to switch my methods around creating my plugin at this point. All works perfectly, I am just trying to create a minified version view and also tidy inputted code. – Tyler Feb 14 '17 at 10:42
  • Not sure if when I save my CodePen it modifies the URL, however I am now at the desired goal if you have a look at my [updated example](http://codepen.io/MrTIMarshall2512/pen/egxmMK?editors=0010). – Tyler Feb 14 '17 at 11:24
  • "All seems to work" yes with the examples you've used so far. If I were you I'd rather build something stable that's designed to always work. And yes, the large complexity that you've created is why you're reluctant to change it, because it's highly complex and you yourself probably barely understand how it works at this point. If you haven't used jquery maybe you'd want to build your first test applications in pure js, just to learn the language a bit. I mean jquery is a tool and has its place, but most things can be done easily without it. – Jan Feb 15 '17 at 07:47
  • Oh I do understand what I'm doing very much so and I love using jQuery, I do not know Pure JS, well, very limitedly. This is mainly just for fun at the moment whereas everything has been working, I've been testing on somewhat large files and there has only been minor bugs which I've patched up within minutes and my results have been the same as [http://jsbeautifier.org/](http://jsbeautifier.org/). Maybe there'll come a point when I cannot progress but I'll cross that bridge when I come to it. Thank you and sorry if you feel I am stubborn! – Tyler Feb 15 '17 at 21:38
  • I mean, if you really knew what you were doing we wouldn't be having this conversation. Because a) you wouldn't be trying to do something you shouldn't. b) your solution wouldn't be so complex that you need outside help to be able to develop it c) using the proper way you'd have access to documentation and examples to answer whatever questions may arise. – Jan Feb 16 '17 at 11:51
  • Say what you want, I am now far beyond this stage. I thought I needed aid with this stage however like I commented above, I already solved this myself. [Here](http://codepen.io/MrTIMarshall2512/pen/EZrGNE) is where I am currently at, slow progress however I've not been placing much time into this right now so that's expected. My code is nice and simple, easy for me to understand and tested frequently to patch up if I have not included for specific HTML I wish to display differently. – Tyler Feb 16 '17 at 12:03
  • Can you please answer one question? - Do I parse **ALL** languages or just HTML? And is [this Fiddle](http://jsfiddle.net/dyzda1nx/) a good start? – Tyler Feb 16 '17 at 12:25
  • If you have a new question you should ask a new question. There's also http://softwareengineering.stackexchange.com for questions that are outside the scope of SO – Jan Feb 16 '17 at 13:06
  • Your code breaks the logic of the inline script http://codepen.io/anon/pen/bgJqMd – Jan Feb 16 '17 at 13:18
  • If you're ever interested in doing it the proper way, this is a proof of concept of how your logic could be done in 30 lines of readable, maintainable and consistently working code https://jsbin.com/mutuxumavi/1/edit?js,console – Jan Feb 17 '17 at 01:42
0

From your examples, you need to

  • capture the <a> and <h1> opening tags but not the </a> and </h1> closing tags (since in your output there is a newline before the <h1> and <a> tags).

You can achieve this with a negative lookahead.

The regex is (?!<\/a|<\/h1)((<)(\/|)([a-zA-Z0-9]+))

You can find a demo here

Input is

<!-- Comments Testing -->
<div id="page"><div id="header"><h1><a href="#">Burger Pointer</a></h1><ul class="left"><li><a href="#">Menu</a></li><li><a href="#">Location</a></li><li><a href="#">About Us</a></li><li><a href="#">BP Gear</a></li></ul></div></div>

Output is

<!-- Comments Testing -->

<div id="page">
<div id="header">
<h1>
<a href="#">Burger Pointer</a></h1>
<ul class="left">
<li>
<a href="#">Menu</a>
</li>
<li>
<a href="#">Location</a>
</li>
<li>
<a href="#">About Us</a>
</li>
<li>
<a href="#">BP Gear</a>
</li>
</ul>
</div>
</div>

The issue is that it also captures the <a> inside the <h1> tag. Since JavaScript doesn't support lookbehinds (prior to ES2018), I can't find a way to eliminate these matches.
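
In engines that do implement ES2018 lookbehind assertions, the `<a>` that directly follows an `<h1>` can be excluded as well. The following is a minimal sketch under that assumption, not a general HTML formatter; the function name `breakLikeQuestion` is hypothetical, and the pattern only looks at the tag immediately before the match.

```javascript
// Hypothetical helper. Requires an engine with lookbehind support (ES2018+).
function breakLikeQuestion(html) {
    // (?<!<h1[^>]*>)     : do not match a tag that directly follows an <h1 ...> tag
    // (?!\/(?:a|h1)\b)   : do not match the closing tags </a and </h1
    var pattern = /(?<!<h1[^>]*>)(<(?!\/(?:a|h1)\b)\/?[a-zA-Z0-9]+)/gi;
    return html.replace(pattern, "\n$1");
}

var sample = '<div id="header"><h1><a href="#">Burger Pointer</a></h1>' +
    '<ul class="left"><li><a href="#">Menu</a></li></ul></div>';
console.log(breakLikeQuestion(sample));
```

For this sample, the `<h1>…</h1>` element stays on one line while the `<a>` inside the `<li>` still starts on its own line, which is the layout the question asks for. Input with other nesting (for example text or another element between `<h1>` and `<a>`) is not covered by this sketch.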

If you want to exclude all <a> and <h1> tags, as you asked in your question, then you can try this regex: ((<)(\/|)(?!a|h1)([a-zA-Z0-9]+))

The output for this would be

<!-- Comments Testing -->

<div id="page">
<div id="header"><h1><a href="#">Burger Pointer</a></h1>
<ul class="left">
<li><a href="#">Menu</a>
</li>
<li><a href="#">Location</a>
</li>
<li><a href="#">About Us</a>
</li>
<li><a href="#">BP Gear</a>
</li>
</ul>
</div>
</div>

You can find the demo here
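
One caveat with `(?!a|h1)`: it rejects any tag name that merely starts with `a` or `h1`, such as `article` or `abbr`. A minimal sketch, assuming only the exact names should be exempt, adds a word boundary after the alternation. The function name `breakExceptInline` is hypothetical.

```javascript
// Hypothetical helper: the second regex above, with \b so that only the
// exact tag names "a" and "h1" are skipped.
function breakExceptInline(html) {
    return html.replace(/(<\/?)(?!(?:a|h1)\b)([a-zA-Z0-9]+)/gi, "\n$1$2");
}

// <article> still gets its newline, while <h1> and <a> are left inline.
console.log(breakExceptInline('<article><h1><a href="#">Title</a></h1></article>'));
```

Without the `\b`, the `<article>` and `</article>` tags in this example would be skipped too, because their names begin with `a`.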

Abdul Hameed