I'm playing around with some contenteditable divs, and after a while I end up with the following markup:

<p>
some text
<ul>
  <li>A</li>
  <li>2</li>
</ul>
another text
</p>

Now I'd like to get valid HTML. So I have to extract the ul from the p or, in other words, close the p tag before the ul and open it again after the ul. Afterwards my code should look like this:

<p>some text</p>
<ul>
  <li>A</li>
  <li>2</li>
</ul>
<p>another text</p>

How can I achieve this (with jQuery)?

I already tried this (where elem is the jQuery element containing p):

var reg1 = new RegExp('<ul>', 'g');
var reg2 = new RegExp('</ul>', 'g');
elem.html(elem.html().replace(reg1, '</p><ul>').replace(reg2, '</ul><p>'));

This seems to work. But I'm not sure whether this is the best solution. Do you have a better idea?

Scott
vso
  • Woooh, handling HTML with regex. People will like that here. – John Weisz Aug 07 '15 at 14:20
  • If you use jQuery to do more of it then there's a good chance that some browser might do the fix for you. Best option would likely be to convert it to xml and work with it as that. – Kevin B Aug 07 '15 at 14:21
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – j08691 Aug 07 '15 at 14:26
  • @KevinB I don't think xml is even close to best option. Why would it be?? Would you be willing to post an answer to explain? – Cayce K Aug 07 '15 at 14:37
  • @CayceK No, but, the idea is if it's invalid html, it's certainly plausible that some browser might try to correct it for you, thus resulting in unexpected results. If you parsed it as xml instead that wouldn't happen. note that as xml the jquery methods you used would still work. – Kevin B Aug 07 '15 at 14:50
  • I guess since in this case the source is an html element, it's a moot point. If the browser would have made such a correction, it would have already done so. – Kevin B Aug 07 '15 at 14:53
  • That is interesting! I didn't honestly know that so it would be interesting to see how that would work one day. – Cayce K Aug 07 '15 at 14:53
  • (Duplicate comment of @KevinB) If you're extracting the "invalid HTML" using `elem.html()`, it has already been parsed into the DOM. To enter the DOM, the browser makes its best guess at what you mean and fixes up the HTML. Take a look at the DOM (ctrl + shift + I, or just `console.log` the `elem.html()`). Whatever browser you're using should have automatically closed `<p>` tags where it can. – Shadowen Aug 07 '15 at 14:56

3 Answers

You're looking for jQuery's .before() / .after().

DO NOT USE REGEX FOR THIS.

Regex is for specialized cases where you need to target a specific piece of text. Most of the time it can be avoided in jQuery / JavaScript.

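If you just do

var ul = $('p').find('ul');
ul.closest('p').before(ul);

the list is moved in front of its parent element. A runnable demo:

// Run once the DOM is ready.
$(function () {
    var ul = $('div').find('ul');

    // Mark the parent div and move the list in front of it.
    ul.closest('div').addClass('red').before(ul);
});
.red {
  color: red;
}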
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<div>
  <span>some text</span>
  <ul>
    <li>List</li>
  </ul>
  <span>and more text</span>
</div>

It should be noted that a ul or an ol tag is not allowed inside a p tag. Because of this I changed the p to a div for the demonstration. Most browsers should not mess up if you use a p instead of a div with this code.

see Should ol/ul be inside <p> or outside?

Cayce K
  • I know that it is illegal to use ul inside a p tag. This is the reason why I'm trying to extract it. Your solution doesn't do exactly what I need. I don't want to have the ul BEFORE the text but in place where it stands. This is why I need to close the p tag before the ul and to open it again after the ul. – vso Aug 10 '15 at 07:14
  • Do you not control the html? – Cayce K Aug 10 '15 at 09:26

What's happening

The invalid HTML is parsed into a valid DOM by the browser, possibly adding closing tags where necessary. In Chrome I see:

<p>
  before
  <ul>
    <li>item</li>
  </ul>
  after
</p>

appear in the DOM inspector as

<p>
  before
</p>
<ul>
  <li>item</li>
</ul>
after
<p></p>

The solution

So, assuming you have the unparsed HTML somewhere in a string, you can use the following (modified) regexes for a tighter match. It might be overkill though.

var beginRegex = /<p>(((?!<\/p>).|[\r\n])*?)<ul>/gi;
var endRegex = /<\/ul>(((?!<p>).|[\r\n])*?)<\/p>/gi;

var content = ...;

$('#target')
    .html(content
        .replace(beginRegex, '<p>$1</p><ul>')
        .replace(endRegex, '</ul><p>$1</p>')
    );

JSFiddle

It matches only an unclosed <p> tag before a <ul>, and likewise the trailing text between </ul> and </p>. One thing you can take away even without the complicated regex is the i modifier for case-insensitive matching, since HTML tag names are case-insensitive.
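To make the behaviour concrete, here is a minimal sketch that runs the two regexes from this answer on a plain string, with no DOM involved. The helper name `splitParagraphAroundUl` and the sample input are assumptions for illustration only. They are not part of the original answer.

```javascript
// Regexes from the answer, written as literals.
const beginRegex = /<p>(((?!<\/p>).|[\r\n])*?)<ul>/gi;
const endRegex = /<\/ul>(((?!<p>).|[\r\n])*?)<\/p>/gi;

// Hypothetical helper: closes the <p> before a <ul> and reopens it afterwards.
// It expects the unparsed HTML as a string.
function splitParagraphAroundUl(html) {
  return html
    .replace(beginRegex, '<p>$1</p><ul>')
    .replace(endRegex, '</ul><p>$1</p>');
}

const input = '<p>some text<ul><li>A</li><li>2</li></ul>another text</p>';
console.log(splitParagraphAroundUl(input));
// -> <p>some text</p><ul><li>A</li><li>2</li></ul><p>another text</p>
```

Markup whose paragraphs are already closed around the list is left unchanged, because the negative lookaheads stop at an existing `</p>` or `<p>`.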

Shadowen
  • This one seems to do what I need. I'll check it in my case. Thanks! – vso Aug 10 '15 at 07:25
  • For anyone looking in the future, I used [this regex tester](https://regex101.com/#javascript) to write the regex. – Shadowen Aug 10 '15 at 13:07
  • It's me again. I've modified your solution (regexes) a little bit so that it should work with `<ol>`'s also. But there is still a case where this doesn't work as expected. Please check my [JSFiddle](http://jsfiddle.net/2dLLp3sa/1/). Maybe you've got an idea how to improve the regexes, so that they work in my example. The problem is that after the regex replacement there is one word placed outside the `<p>` tag. – vso Oct 27 '15 at 15:47
  • Sorry, I can't tell how you've modified the regex to accept `<ol>` tags. However, this IS getting a bit complicated for regex parsing anyways. Try looking into some of the alternative solutions if you can. – Shadowen Nov 01 '15 at 04:02

Thanks to Shadowen, I finally took his suggestion and modified it a little bit to get what I need. What I forgot to mention in my original question: this should work for <ul> and also for <ol> lists. So this is the solution that works for me:

var content = '...';
var beginRegexUl = /<p>(((?!<\/p>).|[\r\n])*?)<ul>/gi;
var endRegexUl = /<\/ul>(((?!<p>).|[\r\n])*?)<\/p>/gi;
var beginRegexOl = /<p>(((?!<\/p>).|[\r\n])*?)<ol>/gi;
var endRegexOl = /<\/ol>(((?!<p>).|[\r\n])*?)<\/p>/gi;

$('#target')
  .html(content
    .replace(beginRegexUl, '<p>$1</p><ul>')
    .replace(endRegexUl, '</ul><p>$1</p>')
    .replace(beginRegexOl, '<p>$1</p><ol>')
    .replace(endRegexOl, '</ol><p>$1</p>'));

JSFiddle
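As a possible simplification, the four regexes could be folded into two by capturing the list tag name with an alternation. This is only a sketch under that assumption, not the code from the fiddle. The helper name `splitParagraphAroundLists` and the sample strings are made up for illustration.

```javascript
// One begin/end pair for both list types; the tag name is captured as (ul|ol)
// and reused in the replacement string.
const beginListRegex = /<p>((?:(?!<\/p>)[\s\S])*?)<(ul|ol)>/gi;
const endListRegex = /<\/(ul|ol)>((?:(?!<p>)[\s\S])*?)<\/p>/gi;

// Hypothetical helper operating on the unparsed HTML string.
function splitParagraphAroundLists(html) {
  return html
    .replace(beginListRegex, '<p>$1</p><$2>')
    .replace(endListRegex, '</$1><p>$2</p>');
}

console.log(splitParagraphAroundLists('<p>intro<ol><li>1</li></ol>middle</p>'));
// -> <p>intro</p><ol><li>1</li></ol><p>middle</p>
```

Like the original regexes, a single pass handles one list per paragraph. A paragraph containing several lists may need the function applied repeatedly until the string stops changing.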

vso