I'm trying to find a regex that matches HTML in a string, starting at <div class="table-responsive" and ending at its closing </div> tag, ignoring any closing </div> tags that don't close the div with the class "table-responsive". I downloaded HTML from another page of a website with JavaScript, and I'm setting the innerHTML of a div to the part of the downloaded HTML I want displayed. The problem is that I only want the element that starts with <div class="table-responsive", up to and including its closing </div>, and I want to cut out everything else, but I don't know the best way to do that. Can anyone suggest how I can do that?

paul pearce
  • Don't use regex for this. Use [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser). – trincot Sep 14 '22 at 21:50
  • Regular expressions are very bad at matching balanced patterns like HTML. Use an HTML parser, not regexp. – Barmar Sep 14 '22 at 21:50
  • How is this downloaded HTML page stored? Are you wanting to load it with AJAX and then grab the information you need from it? – Andy Sep 14 '22 at 22:04
  • @Andy I'm pulling it from an iframe using this code http://jsfiddle.net/XYjuX/1/ – paul pearce Sep 14 '22 at 22:43
  • @paulpearce: Does the answer address your needs? If so, care to accept the answer? If not, edit your question to clarify what you need. – Peter Thoeny Sep 21 '22 at 23:27

1 Answer

It is possible to use regex: first annotate each opening and closing div tag with its nesting level, then extract the relevant range based on that nesting level, and finally remove the nesting-level annotations.
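The three regex steps can be sketched roughly as follows. This is only an illustration, not the code from the linked blog entry: the function name `extractDivByRegex` and the `@level` marker are made up here, and the sketch assumes well-formed HTML with lowercase tags and double-quoted class attributes.

```javascript
// Hypothetical helper (not from the original answer): returns the first
// <div> carrying `className`, up to and including its matching </div>.
// Assumes well-formed HTML, lowercase tags, double-quoted class attributes,
// no "<div" text inside comments, scripts or attribute values, and no "@"
// directly after "div" in the input.
function extractDivByRegex(html, className) {
  // Step 1: annotate each opening and closing div tag with its nesting level.
  let level = 0;
  const annotated = html.replace(/<(\/?)div(?=[\s>])/g, (tag, slash) => {
    if (slash) {
      level -= 1;
      return `</div@${level}`;
    }
    return `<div@${level++}`;
  });

  // Step 2: match the target opening tag and the first closing tag that
  // carries the same level number (the \1 backreference).
  const escapedClass = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(
    '<div@(\\d+)(?:\\s[^>]*)?\\sclass="(?:[^"]*\\s)?' + escapedClass +
    '(?:\\s[^"]*)?"[^>]*>[\\s\\S]*?</div@\\1>'
  );
  const match = annotated.match(pattern);
  if (!match) return null;

  // Step 3: strip the level annotations again.
  return match[0].replace(/<(\/?)div@\d+/g, '<$1div');
}

const sampleHtml = '<div class="top"> <div class="table-responsive"> <div class="inner1"> <div class="inner2">blah</div> </div> </div> </div>';
console.log(extractDivByRegex(sampleHtml, 'table-responsive'));
// → <div class="table-responsive"> <div class="inner1"> <div class="inner2">blah</div> </div> </div>
```

The lazy match in step 2 is safe because every div nested inside the target has a higher level number, so the first closing tag with the target's own level is the one that closes it. Real-world HTML often breaks the assumptions listed above, which is why the parser-based approach is usually the more robust choice.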

It is much easier to parse the HTML into an in-memory DOM and extract the relevant element from it. Here is an approach using jQuery:

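// JavaScript: parse the string into DOM nodes, find the descendant with the class, and serialize it including its own tag
const html = '<div class="top"> <div class="table-responsive"> <div class="inner1"> <div class="inner2">blah</div> </div> </div> </div>';
let extractedHtml = $(html).find('.table-responsive').prop('outerHTML');
console.log(extractedHtml);
<!-- HTML: jQuery must be loaded before the script above runs -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>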

Output:

<div class="table-responsive"> <div class="inner1"> <div class="inner2">blah</div> </div> </div>

If you only want the HTML inside the div tag, use this:

let extractedHtml = $(html).find('.table-responsive').html();
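
If you would rather not load jQuery, the same idea can be sketched with the `DOMParser` suggested in the comments. This is a minimal sketch for a browser environment; the function name `extractOuterHtml` and the variable `pageHtml` are made up for illustration. One difference worth noting: jQuery's `.find()` only searches descendants of the parsed top-level elements, whereas `querySelector` on the parsed document should also find the div when it is the outermost element of the string.

```javascript
// Hypothetical helper: parse an HTML string with the browser's DOMParser
// and return the outer HTML of the first element matching `selector`,
// or null if nothing matches.
function extractOuterHtml(html, selector) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  const element = doc.querySelector(selector);
  return element ? element.outerHTML : null;
}

// DOMParser is a browser API, so only run the demo where it exists.
if (typeof DOMParser !== 'undefined') {
  const pageHtml = '<div class="top"> <div class="table-responsive"> <div class="inner1">blah</div> </div> </div>';
  console.log(extractOuterHtml(pageHtml, 'div.table-responsive'));
}
```

The returned string can then be assigned to the innerHTML of the div that should display it; use `element.innerHTML` instead of `element.outerHTML` in the helper if only the contents of the div are wanted.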
Peter Thoeny
  • If you don't want to use a parser you can properly parse nested structures with regex using three steps: 1. add nesting level to opening and closing tags, 2. identify matching opening and closing tag based on annotated nesting level, 3. clean up nesting level annotation. More at twiki.org/cgi-bin/view/Blog/BlogEntry201109x3 – Peter Thoeny Sep 21 '22 at 23:28