I hope this use case is not too specific for this site. My problem is straightforward, but I think the answers could also teach others who come across this question something more general about search and replace with command line tools.

My problem:

I have a directory structure with a few thousand html files, and what I would like to do is this:

Whenever a tag has an id attribute set, I would like to add class="anchor" to it; if the tag already has one or more classes, I would like to add the class "anchor" alongside them.

So I would like to replace any

<someTag id="some-id">

with

<someTag id="some-id" class="anchor">

and any

<someTag id="some-id" class="some-class">

with

<someTag id="some-id" class="some-class anchor">

Of course there could be all kinds of other attributes mixed in between, so I need a search and replace method that correctly recognizes what sits inside the angle brackets.

I am using Ubuntu, so I have command line tools like sed at my fingertips, but I am not very experienced with them. It would help me a lot if someone with more experience knows a quick solution.

Thank you very much for reading and thinking about it and it would be great if you have any suggestions.

1 Answer

HTML is famously difficult to parse with a regexp, so I don't know of a quick solution.

Maybe that difficulty is exaggerated?

Or triggered by special cases which aren't present in 'regular' HTML?

I'm not sure what all the problems are, but the non-joke answers to that question might explain them.
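If the edit does have to happen offline, one way to sidestep the regexp problem might be to let a real HTML tokenizer find the start tags and rewrite only those. Below is a minimal sketch, assuming Python 3 is available (it normally is on Ubuntu) and using the standard library's `html.parser`. The names `AnchorClassAdder` and `add_anchor_class` are made up for this example. Note that it rebuilds each touched tag from the parsed attributes, so attribute names come out lowercased and values double-quoted; everything outside those tags is copied through unchanged. I'd try it on a copy of the files first.

```python
import html
import sys
from html.parser import HTMLParser
from pathlib import Path


class AnchorClassAdder(HTMLParser):
    """Records a replacement for every start tag that carries an id."""

    def __init__(self, text):
        super().__init__()
        # getpos() reports (line, column); precompute where each line starts
        # so that pair can be turned into an absolute string offset.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
        self.edits = []  # list of (start, end, new_tag_text)

    def handle_starttag(self, tag, attrs):
        if not any(name == "id" for name, _ in attrs):
            return
        new_attrs, has_class = [], False
        for name, value in attrs:
            if name == "class" and not has_class:
                has_class = True
                classes = (value or "").split()
                if "anchor" not in classes:
                    classes.append("anchor")
                value = " ".join(classes)
            new_attrs.append((name, value))
        if not has_class:
            new_attrs.append(("class", "anchor"))

        raw = self.get_starttag_text()
        # Keep the tag name exactly as written in the source (the parser
        # lowercases it before calling this handler).
        parts = [raw[1:1 + len(tag)]]
        for name, value in new_attrs:
            if value is None:  # attribute without a value, e.g. "hidden"
                parts.append(name)
            else:
                parts.append(f'{name}="{html.escape(value, quote=True)}"')
        closing = " />" if raw.endswith("/>") else ">"

        line, col = self.getpos()
        start = self.line_starts[line - 1] + col
        self.edits.append((start, start + len(raw), "<" + " ".join(parts) + closing))


def add_anchor_class(text):
    """Return text with class "anchor" added to every tag that has an id."""
    parser = AnchorClassAdder(text)
    parser.feed(text)
    parser.close()
    out, last = [], 0
    for start, end, new_tag in parser.edits:
        out.append(text[last:start])
        out.append(new_tag)
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    # Assumed usage: pass one or more directories; files are rewritten in place.
    for root in sys.argv[1:]:
        for path in Path(root).rglob("*.html"):
            source = path.read_text(encoding="utf-8")
            path.write_text(add_anchor_class(source), encoding="utf-8")
```

With the two examples from the question, `add_anchor_class('<someTag id="some-id">')` gives `<someTag id="some-id" class="anchor">`, and `add_anchor_class('<someTag id="some-id" class="some-class">')` gives `<someTag id="some-id" class="some-class anchor">`. The UTF-8 encoding in the file loop is an assumption about the files.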

Does the site use JavaScript, and do the browsers that visit it have JavaScript enabled? If so, I'd guess it is easier to make these edits (adding a class to elements in the page) at run-time in the user's browser, because the browser will already have parsed the HTML and constructed a DOM.

Alternatively, I'm not sure why you're adding classes to elements with ids. If the reason is that the class name is referenced in CSS, you could instead add those ids to the CSS rules, as extra selectors separated from the class selector by commas.
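To make the CSS idea concrete, here is a rough sketch (again Python's standard `html.parser`, with made-up names `IdCollector` and `selector_list`) that gathers the ids from a page and builds the comma-separated selector list you would put in front of the existing rule. It assumes the ids are plain identifiers that need no CSS escaping, and with a few thousand files the resulting list could get long.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Records the value of every id attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                if value not in self.ids:  # skip ids already seen
                    self.ids.append(value)
                break  # only the first id attribute of a tag counts


def selector_list(html_text, class_name="anchor"):
    """Build a selector list such as '.anchor, #intro, #usage'."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    selectors = [f".{class_name}"] + [f"#{i}" for i in collector.ids]
    return ", ".join(selectors)
```

For a page containing `<h2 id="intro">` and `<h2 id="usage" class="t">`, `selector_list(page)` returns `.anchor, #intro, #usage`, which can then replace `.anchor` as the selector of the existing rule.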

ChrisW