I'm trying to figure out how to use look-ahead to capture the descriptive text in an HTML page such as:
<div class="itemBanner" style="float:left; padding:10px">
<div style="padding-right:5px; padding-bottom:5px">
<div class="itemBanner">
HTML Tags Stripper is designed to strip HTML tags from the text. It will also strip embedded JavaScript code, style information (style sheets), as well as code inside php/asp tags (<?php ?> <%php ?> <% %>). It will also replace sequence of new line characters (multiple) with only one. <b>Allow tags</b> feature is session sticky, i.e. it will remember allowed tags list, so you will have to type them only once.<p></p>You can either provide text in text area below, or enter URL of the web page. If URL provided then HTML Tags Stripper will visit web-page for its contents.<p></p>
<b>Known issues:</b><br />
I figured a regex that looks for a '>' followed by at least 150 characters before a '<' would do the trick.
The closest I've gotten so far is:
(([^.<]){1,500})<
However, this still fails on periods and on other characters before and after the string.
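For reference, the idea described above (a '>' followed by at least 150 characters before a '<') might be sketched in Python roughly as follows. The 150-character threshold comes from the description above; the names LONG_TEXT and find_long_text and the shortened sample string are made up for illustration. This is only a sketch of the look-behind/look-ahead approach, not a robust way to parse HTML.

```python
import re

# Assumption: "descriptive text" means a run of at least 150 characters
# containing no '<', sitting between a '>' and the next '<'.
# (?<=>) is a look-behind for '>', (?=<) is a look-ahead for '<';
# neither is included in the match itself.
LONG_TEXT = re.compile(r"(?<=>)[^<]{150,}(?=<)")


def find_long_text(html, pattern=LONG_TEXT):
    """Return every run matched by pattern, with surrounding whitespace stripped."""
    return [m.group(0).strip() for m in pattern.finditer(html)]


# Hypothetical, shortened stand-in for the page fragment shown above.
sample = (
    '<div class="itemBanner">\n'
    + "HTML Tags Stripper is designed to strip HTML tags from the text. " * 3
    + "<b>Allow tags</b> feature is session sticky.<p></p>"
)

for text in find_long_text(sample):
    print(len(text), text[:40])
```

Because the character class is [^<] rather than [^.<], periods and newlines are allowed inside the run. A literal '<' inside the text itself, such as the "<?php ?>" in the sample page, still ends the run early, so only the part before it would be captured.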