
I am officially declaring myself dumb! I'm quite good with regex, but JavaScript regex is getting on my nerves.

I have the following HTML string:

htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';

I need to get everything inside the UL element, selected by the text inside the div with class aa that precedes it.

I tried the following:

textItem = 'TextOne';

ulRegex = new RegExp('<div class="aa">'+textItem+'</div><ul>(.*)</ul>', "igm");
ul = ulRegex.exec(htmlString);

While writing this question I discovered an error in my regex (one tiny extra character) that kept it from matching. For anyone searching for this specific combination (JavaScript / regular expression / HTML string / HTML substring): the version above works fine.

Edit

I'm thankful for all the additions, but there is one more reason I'm using regex: the text I'm matching comes from a variable, which I first insert into the regex pattern.
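Because `textItem` is concatenated into the pattern, any regex metacharacters it contains (`.`, `(`, `?` and so on) would be read as regex syntax rather than matched literally. A minimal sketch, assuming a single-line HTML string like the one above; `escapeRegExp` and `getUlContent` are hypothetical helper names, not part of any library, and the pattern uses only the `i` flag:

```javascript
// Hypothetical helper: backslash-escape every character that has a
// special meaning in a RegExp, so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return the inner HTML of the <ul> that follows
// <div class="aa">textItem</div>, or null if there is no match.
// Caveat: the lazy .*? stops at the first </ul>, so nested lists are cut short.
function getUlContent(htmlString, textItem) {
  const ulRegex = new RegExp(
    '<div class="aa">' + escapeRegExp(textItem) + '</div>\\s*<ul>(.*?)</ul>',
    'i'
  );
  const match = ulRegex.exec(htmlString);
  return match ? match[1] : null;
}

const htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';
console.log(getUlContent(htmlString, 'TextOne')); // logs <li>one</li>
```

Without the escaping step, a `textItem` such as `a.b` would also match the text `aXb`, because `.` means "any character".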


owsata
  • Don't parse HTML with regex. You might be able to make it work, but don't go that route; use an XML/HTML parser instead. – Ibrahim Najjar Oct 28 '13 at 14:09
  • I'm a little confused: it seems you already have a "working" regex. One small detail: use `.*?` instead of `.*` to match non-greedily. There is no need for the `m` modifier. Also don't forget to add some `\s*` for the sake of completeness. – HamZa Oct 28 '13 at 14:13
  • @iambriansreed You mean `$('div.aa').next('ul').html()` – Josh Davenport-Smith Oct 28 '13 at 14:15
  • Let's listen to the smart guys here http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html and not use regex to parse HTML. Use something smarter, like your browser, which was built to parse HTML, i.e. use [DocumentFragment](http://ejohn.org/blog/dom-documentfragments/). Or hell, [jQuery](http://www.jquery.com/) would do a great job too. – Josh Davenport-Smith Oct 28 '13 at 14:29
  • In short: how do I get the next complete UL element based on the text inside a div? – owsata Oct 28 '13 at 14:38
  • @owsata does that `ul` element always come after the div you mentioned? Also, do you want a regex solution? – HamZa Oct 28 '13 at 14:55
  • Well, I guess any solution would be OK. I chose regex because I have to use textItem to find a specific UL, but the problem is that the UL might contain one or more nested ULs, which makes matching it difficult. The other solution @JoshDavenport suggested, a parser that builds an XML structure in which I might find the exact UL level, is also interesting, but I have to analyze it first. What do you think? – owsata Oct 28 '13 at 15:14
  • And yes, it comes after the div with the class name aa. – owsata Oct 28 '13 at 15:17
  • @owsata if you have nested `ul`s, then just forget about using regex. You'd need recursive patterns, which are only available in PHP, Perl, .NET and maybe a few other languages I don't know of, but there are very few. Check this [awesome answer](http://stackoverflow.com/a/14952740). – HamZa Oct 28 '13 at 15:19
  • @JoshDavenport how would jQuery help me in this regard? In short: how do I get the next complete UL (which may contain one or more nested ULs) based on the text inside a div? – owsata Oct 29 '13 at 07:34
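The comments above make two separate points about the regex: the lazy `.*?` stops a match from running past the first list, yet no lazy or greedy variant can handle nested lists. A small sketch with hypothetical inputs (not from the question) showing both. `\s*` tolerates whitespace such as a newline between tags, and the `m` flag is left out because it only changes how `^` and `$` behave:

```javascript
// Greedy vs lazy capture of the list after <div class="aa">TextOne</div>.
const greedyRe = /<div class="aa">TextOne<\/div>\s*<ul>(.*)<\/ul>/i;
const lazyRe = /<div class="aa">TextOne<\/div>\s*<ul>(.*?)<\/ul>/i;

// Hypothetical input 1: two sibling lists, with a newline before the first.
const twoLists =
  '<div class="aa">TextOne</div>\n<ul><li>one</li></ul>' +
  '<div class="aa">TextTwo</div><ul><li>two</li></ul>';

// Greedy .* runs to the LAST </ul> and swallows the second list.
console.log(greedyRe.exec(twoLists)[1]);
// <li>one</li></ul><div class="aa">TextTwo</div><ul><li>two</li>

// Lazy .*? stops at the FIRST </ul>, which is what we want here.
console.log(lazyRe.exec(twoLists)[1]);
// <li>one</li>

// Hypothetical input 2: the list itself contains a nested list.
const nested =
  '<div class="aa">TextOne</div>' +
  '<ul><li>one<ul><li>inner</li></ul></li><li>two</li></ul>';

// Now the lazy match stops at the INNER </ul> and cuts the outer list short.
console.log(lazyRe.exec(nested)[1]);
// <li>one<ul><li>inner</li>
```

Note also that `.` does not match line breaks in JavaScript, so HTML spread over several lines needs `[\s\S]*?` (or the ES2018 `s` flag) instead of `.*?`.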

2 Answers


Solution

Having received a few hints and suggestions, I came up with the following, which may help someone else as well:

var htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';

var textItem = 'TextOne';

// Let jQuery parse the string into a detached element
var tempdiv = $('<div/>');
tempdiv.html(htmlString);
// Quote the text so values with spaces or brackets don't break the selector.
// Note that :contains also matches divs whose text merely includes textItem.
var ul = tempdiv.find('div.aa:contains("' + textItem + '")').next('ul');

$('#res').append(ul);

http://jsfiddle.net/sdXpJ/

The `.next('ul')` call is the important part: it returns the complete first-level UL, including any nested ULs, which none of my regex-based attempts could match.
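To illustrate why a parser-style approach copes with nesting where a single regex cannot, here is a rough string-scanning sketch that counts `<ul>` depth. `extractNextUl` and `ulAfterDiv` are hypothetical helpers, not how jQuery works internally. The sketch assumes well-formed markup with no `<ul>` text inside comments, attributes or scripts. Unlike `.next('ul')`, it does not check that the list is the immediate next sibling, and unlike `:contains` it requires the div text to match exactly; for real pages the jQuery approach above is the safer choice.

```javascript
// Hypothetical helper: return the complete <ul>...</ul> element (nested
// lists included) that starts at or after `fromIndex`, or null.
// Assumes well-formed markup; this is a sketch, not a real HTML parser.
function extractNextUl(html, fromIndex) {
  const tagRe = /<(\/?)ul\b[^>]*>/gi; // matches <ul ...> and </ul>
  tagRe.lastIndex = fromIndex;
  let depth = 0;
  let start = -1;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    if (m[1] === '') {
      // Opening tag: remember where the outermost list begins.
      if (depth === 0) start = m.index;
      depth++;
    } else if (depth > 0) {
      // Closing tag: when depth returns to 0 the outer list is complete.
      depth--;
      if (depth === 0) return html.slice(start, tagRe.lastIndex);
    }
  }
  return null; // no balanced <ul> found
}

// Hypothetical helper: find the .aa div holding exactly textItem,
// then take the first complete list after it.
function ulAfterDiv(html, textItem) {
  const marker = '<div class="aa">' + textItem + '</div>';
  const pos = html.indexOf(marker);
  return pos === -1 ? null : extractNextUl(html, pos + marker.length);
}

const html =
  '<div class="aa">TextOne</div>' +
  '<ul><li>one<ul><li>inner</li></ul></li></ul><p>after</p>';
console.log(ulAfterDiv(html, 'TextOne'));
// <ul><li>one<ul><li>inner</li></ul></li></ul>
```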

owsata

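You can use a simple `indexOf`-based function for this (it returns the text between the first matching pair of markers, so it does not handle nested lists):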

function str_between(str, searchStart, searchEnd, caseSensitive, offset) {
    var fullString = str;

    caseSensitive = caseSensitive || false;
    offset = offset || 0;

    // For a case-insensitive search, compare lower-cased copies but
    // return the substring from the original string.
    if (!caseSensitive) {
        fullString = fullString.toLowerCase();
        searchStart = searchStart.toLowerCase();
        searchEnd = searchEnd.toLowerCase();
    }

    var startPosition = fullString.indexOf(searchStart, offset);
    if (startPosition > -1) {
        // Look for the end marker only after the whole start marker.
        var contentStart = startPosition + searchStart.length;
        var endPosition = fullString.indexOf(searchEnd, contentStart);
        if (endPosition > -1) {
            return str.slice(contentStart, endPosition);
        }
    }
    return false;
}

> var htmlString = '<div class="aa">TextOne</div><ul><li>one</li></ul>';

> str_between(htmlString, '<ul>', '</ul>');
"<li>one</li>"

> str_between(htmlString, '<UL>', '</UL>');
"<li>one</li>"

> str_between(htmlString, '<UL>', '</UL>', true);
false
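As written, the function returns the first `<ul>` in the string whatever `textItem` is. A sketch of how its `offset` parameter could anchor the search to the matching div instead; the function is repeated compactly (using `slice`) so the snippet runs on its own, and `ulAfterText` is a hypothetical wrapper name. Like the function itself, it still stops at the first `</ul>`, so nested lists are cut short.

```javascript
// Compact copy of str_between (case-insensitive unless caseSensitive is true).
function str_between(str, searchStart, searchEnd, caseSensitive, offset) {
  const haystack = caseSensitive ? str : str.toLowerCase();
  const start = caseSensitive ? searchStart : searchStart.toLowerCase();
  const end = caseSensitive ? searchEnd : searchEnd.toLowerCase();
  const startPosition = haystack.indexOf(start, offset || 0);
  if (startPosition === -1) return false;
  const endPosition = haystack.indexOf(end, startPosition + start.length);
  if (endPosition === -1) return false;
  return str.slice(startPosition + start.length, endPosition);
}

// Hypothetical wrapper: locate the .aa div holding textItem first, then
// pass the position right after it as the offset.
function ulAfterText(html, textItem) {
  const marker = '<div class="aa">' + textItem + '</div>';
  const divPosition = html.indexOf(marker);
  if (divPosition === -1) return false;
  return str_between(html, '<ul>', '</ul>', false, divPosition + marker.length);
}

const page =
  '<div class="aa">TextOne</div><ul><li>one</li></ul>' +
  '<div class="aa">TextTwo</div><ul><li>two</li></ul>';
console.log(ulAfterText(page, 'TextTwo')); // logs <li>two</li>
```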
h2ooooooo