
I have inherited a website in which I have to update about 3,500 files (product pages) whose content is roughly 95% identical.

In order to make some changes, I am using Regex (in Dreamweaver) to do some bulk editing.

I've been able to get everything else done OK, but I am running into a problem with content within a <ul> tag.

I need to be able to grab all the content within that tag and save it for when I replace the other content on the page (this is one of the few things whose content is different from page to page).

Here is an example:

<ul>
<li style="padding-top:10px; text-align:right;"><a href="http://www.website.com/additem.wws?Sku=ABC123&sup=AAA&mfr=BBB&price=99.99&core=10.00&qty=1&description=ITEM">Single Item - $99.99 <img src="../../images/buy-now-button.gif" alt="Buy Now" width="50" height="20" border="0">&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
<li style="padding-top:10px; text-align:right;"><a href="http://www.website.com/additem.wws?Sku=ABC123-6&sup=AAA&mfr=BBB&price=299.99&core=60.00&qty=1&description=INJECTOR"><strong>Set of 6 Items - $299.99</strong> <img src="../../images/buy-now-button.gif" alt="Buy Now" width="50" height="20" border="0">&nbsp;&nbsp;&nbsp;&nbsp;</a></li>
<li style="padding-top:10px"><img src="../../images/free_shipping.jpg" alt="Free Upgrade." width="227" height="107">  </li>
</ul>

I would target the individual <li> tags instead, but some pages have only one <li> within the <ul>, while others have up to 6, depending on the number of product variations on that page.

So my overall question is this: how do I grab all the content (including new lines, other tags, etc.) within a given tag and save it for when the rest of the content needs to be replaced? I know how to use parentheses around the content and then $# in the Replace section.

The websites I've worked on thus far have been much smaller, and I've not had much need for Regex because it was typically easier to make changes manually or just using literal text in Find/Replace.

ROMANIA_engineer
SC-HELP
  • 4
    Read [the first answer to this question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) as to why you shouldn't parse HTML with regex. – Charles Sprayberry Aug 07 '11 at 20:24
  • 3
    Hilarious. Helps a lot. An actual suggestion as to what I do might've helped as well. Thanks anyway though. – SC-HELP Aug 07 '11 at 20:54
  • Use an HTML parser compatible with Dreamweaver. From some very cursory googling I discovered that Dreamweaver even comes with its own HTML parser. Use that. Don't use regex to parse HTML. – Charles Sprayberry Aug 07 '11 at 20:57
  • And an HTML parser is capable of doing Find/Replace functions as well? – SC-HELP Aug 07 '11 at 21:07
  • Dreamweaver can do search and replace on content within specific tags or a combination of specific tags with specific attributes. Depending on what exactly you are trying to do, it may be a bit easier than you think and the regex will be minimal – JCL1178 Aug 08 '11 at 00:03

1 Answer


How complex are these web pages? If <ul> elements are never nested inside other <ul> elements, and you don't have to deal with bogus tags inside (for example) SGML comments or CDATA sections, this is probably all you need:

<ul>[\s\S]*?</ul>

[\s\S] is how you match any character including newlines in JavaScript regexes (which is what Dreamweaver uses, or so I've read).
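To see why [\s\S] is needed rather than a plain dot, here is a minimal sketch in plain JavaScript. The `page` string is a made-up stand-in for one of the product pages, and it assumes Dreamweaver's Find/Replace behaves like a standard JavaScript regex engine, which I have not verified.

```javascript
// Hypothetical three-line snippet standing in for a product page.
const page = "<ul>\n<li>Single Item</li>\n</ul>";

// "." does not match newline characters by default, so this pattern
// fails as soon as the list spans more than one line.
const dotMatch = page.match(/<ul>.*?<\/ul>/);

// [\s\S] means "whitespace OR non-whitespace", i.e. any character at all,
// newlines included.
const anyMatch = page.match(/<ul>[\s\S]*?<\/ul>/);

console.log(dotMatch === null);    // true: the dot version finds nothing
console.log(anyMatch[0] === page); // true: the whole list is matched
```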

*? tells it to match zero or more characters reluctantly (lazily): it stops matching as soon as the next part of the regex (</ul>) is able to match.
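As a rough illustration of the reluctant quantifier, and of saving the list with parentheses so it can be reused as $1 in the replacement, here is a sketch in plain JavaScript. The page content and the "new header"/"new footer" markup are invented for the example, and it again assumes Dreamweaver follows normal JavaScript regex semantics.

```javascript
// Hypothetical page containing two separate lists.
const page = [
  "<div>old header</div>",
  "<ul>",
  "<li>Single Item - $99.99</li>",
  "</ul>",
  "<p>old text</p>",
  "<ul>",
  "<li>Related links</li>",
  "</ul>",
].join("\n");

// Reluctant: stops at the first </ul> it reaches.
const lazy = page.match(/<ul>[\s\S]*?<\/ul>/)[0];

// Greedy: runs on to the LAST </ul>, swallowing everything in between.
const greedy = page.match(/<ul>[\s\S]*<\/ul>/)[0];

// Capture the first list in group 1, discard the rest of the page,
// and rebuild the page around the saved list using $1.
const rebuilt = page.replace(
  /^[\s\S]*?(<ul>[\s\S]*?<\/ul>)[\s\S]*$/,
  "<div>new header</div>\n$1\n<p>new footer</p>"
);

console.log(lazy);
console.log(greedy.includes("<p>old text</p>")); // true
console.log(rebuilt);
```

In the Dreamweaver dialog the equivalent would presumably be the pattern in the Find box and the $1 text in the Replace box; try it on a copy of a few files before running it across all of them.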

Alan Moore
  • This seems to work perfectly, thanks! Incidentally, I realize that this might not be the best practice in general, but this religious freakout that so many people go into over this seems silly. Dreamweaver doesn't have anything else that is seamlessly fitted into the Find/Replace function. – SC-HELP Aug 08 '11 at 07:05
  • Oh, don't mind them! They're like the Scientific People from [The Stars My Destination](http://books.google.com/books?id=RUz9ewEACAAJ), except instead of "Quant Suff!", they like to parrot "HTML's Not Regular!" all the time. They're mostly harmless--just don't let them near your face with a needle and ink. ;) – Alan Moore Aug 08 '11 at 21:31