How to delete a similar fragment on several HTML files?

Question

I'm converting a website to a PDF. The pages contain images, and next to each image there is a text link that takes you to the image itself when clicked.

I think this is the code responsible for showing that text: after I deleted it from one of the files, the text and link no longer appeared.

<div class="v1"><a target="_self" href="images/graphics/1.jpg">[View full size image]</a></div>

The problem is that about 200 more HTML documents contain this same fragment, with only the href changing.

Is there an easy way to remove all of them without editing the files one by one? Maybe a regular expression for sed?

If you want to parse HTML, use a [HTML Parser](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). — , Oct 23 '12 at 09:44
I'm not using any IDE, I have a website I wanted to turn into PDF; I actually don't know much about web programming. — James Russell, Oct 23 '12 at 10:13

score 1 · Answer 1 · answered Oct 23 '12 at 09:44

If the expression is always on one line and the only difference is in href, sed is a possible solution:

sed -e 's,<div class="v1"><a target="_self" href="[^"]*">\[View full size image\]</a></div>,,'

I used an alternative separator, `,`, so the `/` in the closing tags does not have to be escaped. The square brackets in the link's text still need to be escaped, though.
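To run the substitution over every file rather than a single input stream, it can be wrapped in a loop. The following is only a sketch: the function name `strip_view_links` is made up for illustration, and it assumes the pages end in `.html` and sit directly in one directory. `-i.bak` edits each file in place and keeps the original as `FILE.bak`, so the change can be undone.

```shell
#!/bin/sh
# Hypothetical helper: remove the "[View full size image]" div from every
# .html file directly inside the given directory (default: current directory).
# Usage: strip_view_links path/to/site
strip_view_links() {
    dir=${1:-.}
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        # Same expression as above; the original is kept as "$f.bak".
        sed -i.bak -e 's,<div class="v1"><a target="_self" href="[^"]*">\[View full size image\]</a></div>,,' "$f"
    done
}
```

The substitution removes the fragment but leaves the now-empty line behind, which is harmless in HTML. Once the results look right, the `.bak` files can be deleted.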

answered Oct 23 '12 at 09:44

choroba

Thank you for the answer, I marked as accepted the other one because it was the one I read and used; but this one is as valid as the other one. – James Russell Oct 23 '12 at 10:17

score 0 · Accepted Answer · answered Oct 23 '12 at 09:39

Yes, a regular expression is likely the easiest solution here. If it's simply a matter of removing this line from all your files, I'd open them in an editor that supports regex search and replace across files (Sublime Text 2 does this well) and replace every match with nothing. The following search pattern will likely work:

<div class=\"v1\"><a target=\"_self\" href=\"[^"]+\">\[View full size image\]</a></div>
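For anyone who would rather stay on the command line, here is one way the pattern might be adapted for `sed`. It is only a sketch, not necessarily the adaptation the asker ended up with. The function name `drop_link_lines` is hypothetical. It assumes the fragment sits alone on its line, so the whole line can be deleted, and it uses `-E` (extended regular expressions) so that `+` works as a quantifier. The backslashes before the quotes are dropped because they are not needed inside a single-quoted shell string.

```shell
#!/bin/sh
# Hypothetical helper: print FILE to standard output without the lines that
# contain the "[View full size image]" div. The input file is not modified.
# Usage: drop_link_lines page.html > page.clean.html
drop_link_lines() {
    # \,REGEX,d uses "," as the address delimiter so "/" needs no escaping.
    sed -E '\,<div class="v1"><a target="_self" href="[^"]+">\[View full size image\]</a></div>,d' "$1"
}
```

Because it writes to standard output, the result can be inspected with `diff` before any original file is overwritten.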

answered Oct 23 '12 at 09:39

Simon

Thank you for the regular expression, I changed it a bit to work with `sed` but it worked. – James Russell Oct 23 '12 at 10:16
