I've seen it said over and over on Stack Overflow that regular expressions are NOT a good fit for XHTML. What I haven't seen, however, is an alternative.

Most text editors have a built-in RegEx search and replace that is super easy to use, except that it doesn't work well with HTML. Is there some tool or language that is meant for parsing and replacing XHTML? It would be great if you could say "find all paragraph tags that have the class 'quote' that are within the DIV with the class 'monkey', and then add an H2 tag with 'Monkey Quote' inside."

Another example I'm struggling to solve is finding all words within paragraph tags and wrapping a SPAN tag around each of them (for word-by-word audio highlighting). That kind of stuff.

Is there a tool or language that is meant for this kind of thing?

Arktype
  • In your "wrapping a `<span>` around each word" example, you'd be better off doing that dynamically in JavaScript. Keep your markup clean. – Ry- Mar 10 '12 at 02:27
  • No one has ever suggested JavaScript? – Ray Toal Mar 10 '12 at 02:28
  • Yes, it's called an HTML/XML parser. There are many of those out there, depending on the language/platform you are using. Search SO for `LanguageX HTML parser`, e.g. http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php – Qtax Mar 10 '12 at 02:30
  • @minitech That is a great idea. The only problem is that the `<span>`s get a unique ID that corresponds to a label that is not always systematic. So it ends up being ``, the next _might_ be `` or _might_ be ``. But I'll look into making it systematic for that reason. – Arktype Mar 10 '12 at 02:35
  • @Qtax Thank you. There doesn't seem to be any replace functionality built in to those parsers, but I guess that is what you would use the language for. I was hoping for something pretty simple that specifically was meant for find/replace operations locally. JavaScript makes sense for my example, but most of the time I just want to tweak 50 or so XHTML files that are similar in structure. – Arktype Mar 10 '12 at 02:42

2 Answers

From your last comment, I'm assuming you'd like something useful from the command-line.

If so, this is answered pretty well here:

Grep and Sed Equivalent for XML Command Line Processing
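
For the word-wrapping case from the question, a short script run from the command line is another option. Here is a minimal sketch using only Python's standard library. It assumes the files are well-formed XHTML in the standard XHTML namespace with no undeclared named entities such as `&nbsp;`; the function name `wrap_words` is made up for illustration, and paragraphs that already contain inline markup are deliberately skipped to keep it short.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize without ns0: prefixes
P = f"{{{XHTML_NS}}}p"
SPAN = f"{{{XHTML_NS}}}span"


def wrap_words(xhtml_text):
    """Wrap every word of each plain-text <p> in its own <span> (hypothetical helper)."""
    root = ET.fromstring(xhtml_text)
    # Materialize the list first because the tree is modified inside the loop.
    for p in list(root.iter(P)):
        if len(p) or not p.text:
            continue  # sketch only: leave paragraphs with child elements alone
        # Splitting on a captured group keeps the whitespace runs:
        # even indexes are words, odd indexes are the whitespace between them.
        tokens = re.split(r"(\s+)", p.text)
        p.text = None
        last_span = None
        for index, token in enumerate(tokens):
            if not token:
                continue
            if index % 2 == 0:
                last_span = ET.SubElement(p, SPAN)
                last_span.text = token
            elif last_span is None:
                p.text = token        # whitespace before the first word
            else:
                last_span.tail = token  # whitespace after a word
    return ET.tostring(root, encoding="unicode")
```

To tweak a batch of similar files, the same function could be called in a loop over the files in a directory, writing each result back out. Adding the non-systematic IDs mentioned in the comments would need a mapping that this sketch does not attempt.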

sethcall

If you have a well-formed document, XSLT and XPath can do what you need.
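
To give a feel for the XPath half of that, here is a rough sketch of the "monkey quote" task from the question. It is not XSLT; it uses the limited XPath subset in Python's standard `xml.etree.ElementTree` instead. The assumptions: the document is well-formed XHTML in the standard namespace, `class` holds exactly one value (the predicate `[@class='quote']` will not match `class="quote big"`), and the H2 goes immediately before each matching paragraph, since the question doesn't say where it belongs. The function name `add_monkey_headings` is made up.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize without ns0: prefixes
NS = {"x": XHTML_NS}


def add_monkey_headings(xhtml_text):
    """Insert <h2>Monkey Quote</h2> before each p.quote inside div.monkey."""
    root = ET.fromstring(xhtml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # XPath-style query: paragraphs with class "quote" anywhere under div.monkey.
    matches = root.findall(".//x:div[@class='monkey']//x:p[@class='quote']", NS)
    # Nested div.monkey elements could report a paragraph twice; drop repeats.
    for p in dict.fromkeys(matches):
        parent = parent_of[p]
        h2 = ET.Element(f"{{{XHTML_NS}}}h2")
        h2.text = "Monkey Quote"
        # Insert the heading as the sibling just before the paragraph.
        parent.insert(list(parent).index(p), h2)
    return ET.tostring(root, encoding="unicode")
```

A real XSLT stylesheet would express the same thing as an identity transform plus one template matching that path; the selection logic stays the same.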

Devon_C_Miller