Showing source of one <div> in another with a regular expression

Question

OK, I give up and would really appreciate it if you guys could cast your eye over this for me. I'll try not to ramble.

Goal is to have a 'rendered-view' showing 'rendered' HTML (clickable links) and 'source-view' showing the actual HTML of whatever is in the rendered view. When a link is clicked it is made not a link any more and simply becomes the anchor text. The source then needs to update to reflect this. That's the idea anyway.

The rendered links have id, href, title and class attributes plus a <strong> tag. I don't want the id, title or class to show in the source view.

I have it so upon clicking, the 'href' and <strong> are removed, then I remove the class and title from ALL links. I need to keep the id in source view so the undo function I've created still works.

So, the problem is basically:

<div> with id of 'rendered-view' contains the rendered version of:

<a id="link1">blah blah</a>
<a id="link2" href="http://www.somesite.com"><strong>Visit this site</strong></a>

i.e:

blah blah

Visit this site

Source view should result in:

blah blah
<a href="http://www.somesite.com"><strong>Visit this site</strong></a>

I know a regular expression will be needed which is where I fail badly at the moment. I'm a PHP guy really and brand new to jQuery.

You should try to avoid regular expressions for parsing HTML and instead use an HTML parser. HTML is not a regular language. — Mark Byers, Mar 10 '10 at 21:10
Thanks mark. I've heard that tip before but the HTML is supplied by a separate process and is guaranteed to be in that format. It's not user supplied so does that make a difference? — Jon, Mar 10 '10 at 21:13
@Mark: It is not about parsing HTML, it's about HTML escaping it so that it shows as in "view source", and that IS a job for regexp. — Marko Dumic, Mar 10 '10 at 21:31
@Marko: what about this part: "I don't want the id, title or class to show in the source view." — Josh, Mar 10 '10 at 21:52

Marko Dumic · Accepted Answer · 2010-03-10T22:24:46.837

3

Suppose you have two divs with the ids rendered and source:

Rendered:
<div id="rendered" style="border: 1px solid #000">
    <a id="link1">blah blah</a>
    <a id="link2" href="http://www.somesite.com"><strong>Visit this site</strong></a>
</div>
Source:
<div id="source" style="border: 1px solid #000">
</div>

Then this populates the source div with the markup of the rendered div, removing some of the attributes (id, title and class) along the way:

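$('#source').html(
    $('#rendered')
        .clone()                    // work on a copy so the rendered links keep their attributes
        .find('*')                  // every descendant element of the copy
            .removeAttr('id')
            .removeAttr('title')
            .removeAttr('class')
        .end()                      // step back from the descendants to the cloned div
        .html()                     // serialized inner markup of the copy
        .replace(/&/g, '&amp;')     // escape ampersands first
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
);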

Of course, you need to run this after the DOM is ready.

(I'm not sure about the necessity of &amp; escaping.)
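On the ampersand question, here is a small sketch that can be run outside the browser. It assumes a hypothetical helper named escapeHtml (not part of jQuery) and a made-up link whose href contains an ampersand, written the way innerHTML would serialize it. It suggests the ampersand step is needed, and that it has to come before the other two replacements:

```javascript
// Hypothetical helper (not a jQuery function): escape the three characters
// that the browser would otherwise interpret as markup.
function escapeHtml(markup) {
    return markup
        .replace(/&/g, '&amp;')   // first, so the entities added below are not escaped again
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Made-up example: innerHTML serializes an ampersand inside an attribute as &amp;
var serialized = '<a href="page?a=1&amp;b=2">x</a>';
var escaped = escapeHtml(serialized);

console.log(escaped);
// &lt;a href="page?a=1&amp;amp;b=2"&gt;x&lt;/a&gt;
```

When that string is set as HTML, the browser displays the original markup including the literal &amp;. Without the first replacement it would display a bare & in the href, which is not what the source actually contains.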

edited Mar 10 '10 at 22:24

answered Mar 10 '10 at 21:40

Marko Dumic


Wow! Thanks Marko and Josh, really appreciate it. So it's basically walking through the whole rendered clone() DOM removing the attributes as it goes. I'm not entirely sure what end() does so going to give that a bit of research just so I understand everything. I think the & does need to be escaped, but I've decided to go with a for source now instead so that's not required. I just need to remove the <a>'s from the word now if it's not a link but not sure of best way. Thanks again! – Jon Mar 11 '10 at 01:00
@Jon: You should accept Marko's answer by clicking the check mark next to it, this will show others that the question has been answered (and give him +15 rep) – Josh Mar 11 '10 at 01:57

score 0 · Answer 2 · edited May 23 '17 at 10:27

See the question/answer: You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML...

But seriously, what you want to do can probably be done much better with JavaScript and DOM manipulation. I know a lot more about Prototype than jQuery, and I could code this in Prototype fairly easily, but I would highly recommend using the HTML parser built into the browser. My basic approach would be something like:

1. Insert the encoded HTML into both the source-view and the rendered-view
2. Loop through all elements of the source-view:
   1. If the element is a link, remove the id, title and class
3. Encode all HTML entities in the source-view innerHTML; Prototype does this with a regex: String.prototype.escapeHTML = function() { return this.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;"); }
4. Loop through all child a elements of the rendered-view:
   1. Observe their click event and attach your custom handler, canceling the default handler
   2. Perform any other processing on these you might need. (I'm still a bit foggy on what it is you're trying to accomplish)
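For what it's worth, the source-view half of the steps above can be sketched with plain DOM calls, no Prototype or jQuery. The function names (stripLinkAttributes, escapeHTML, buildSource) are made up for illustration, and the element ids in the usage comment are assumed from the question. Treat it as a rough outline rather than tested page code:

```javascript
// Hypothetical helpers; the names are illustrative, only standard DOM methods are used.

// Remove the listed attributes from every <a> element under root.
function stripLinkAttributes(root, names) {
    var links = root.querySelectorAll('a');
    for (var i = 0; i < links.length; i++) {
        for (var j = 0; j < names.length; j++) {
            links[i].removeAttribute(names[j]);
        }
    }
    return root;
}

// Same idea as Prototype's escapeHTML, written as a plain function.
function escapeHTML(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Build the escaped source for a rendered element without touching the original.
function buildSource(rendered) {
    var copy = rendered.cloneNode(true);   // deep copy, so the ids stay on the rendered links
    stripLinkAttributes(copy, ['id', 'title', 'class']);
    return escapeHTML(copy.innerHTML);
}

// In a page (ids assumed from the question):
// document.getElementById('source-view').innerHTML =
//     buildSource(document.getElementById('rendered-view'));
```

Because the attributes are stripped from a copy, the ids the undo function relies on are still present in the rendered view.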

I know I'm providing a bit of a vague answer to your question... if I were better with jQuery I could provide some actual code, which I'm sure is what you want. Any jQuery experts want to help Jon out? :-)

Why everyone quotes THAT discussion every time someone mentions HTML and regex in same sentence? This one is **not** about parsing! It is about replacing chars that would generate HTML rendition with their escaped counterparts; perfect for regexp. Am I missing something? — Marko Dumic, Mar 10 '10 at 21:44
You may be missing that he wants to remove HTML attributes, CSS classes, IDs, etc. It seems like he really should be walking the DOM tree. Note I did suggest using a regex to encode HTML entities. — Josh, Mar 10 '10 at 21:48
