I currently have thousands of .html & .htm files which all have a consistent banner at the top of the page. Some of the attributes may be in different locations within the tag, but they all begin with img src=. I want to find the banner - enclosed within an img tag - in all of these files, locate the closing '>' and then append another image directly after it.

So I want to find
<img src="images/banner.jpg".*>$ and append immediately after it
<img src="images/new-banner.jpg"> so that it would look like

<img src="images/banner.jpg" width="x" height="x"> <img src="images/new-banner.jpg">

I know, "regex can't be used to parse HTML" as stated here. But really, I think it should say that regex shouldn't be used to parse HTML, because can't is a powerful word and isn't truthful in this instance.

If you have a recommendation of how I can achieve the same result without regex, I'm happy to try other alternatives. I am after the result and I am not closed off to suggested methods of getting there.

What I currently have is this:
Select-String -Pattern '<img src="images/banner.jpg".*>$' *.htm -AllMatches | % { $_.Matches } | % { $_.Value }
This gets me halfway there: it returns each match in its entirety. However, I am not sure how to proceed so that I can append my desired string immediately after the closing >.

Thank you all for your time and thoughts :).

Austin
  • This is not parsing HTML with regex. It would be the same question if you wanted to add "additionaltext" to every instance of "test". The tags are irrelevant. – Jacob Colvin Jun 14 '18 at 20:25
  • @JacobColvin What do you mean that the tags are irrelevant? I need to find a specific tag and jump to the end of it and then append text after that tag. – Austin Jun 14 '18 at 20:29
  • Yeah but I'm saying that when people typically say "regex is insufficiently sophisticated to understand the constructs employed by HTML" they are referring to the structure of HTML. Find and replace is what regex is made for, and that's what you want to do. The fact that you need to insert after a certain character is irrelevant. I'm saying that you're on the right track and definitely do not need to use an HTML parser for something like this. – Jacob Colvin Jun 14 '18 at 20:42

1 Answer

Here's one way to do it...

  1. Open your IDE of choice that supports regex Find & Replace across multiple files. I'll use the free Visual Studio Code for the following steps:

  2. File > Open... and select the root folder containing all of your files (if they're buried in subfolders, that's fine).

  3. Edit > Replace in Files and click the Use Regular Expression toggle (the icon with .* on it).

  4. Insert this in the Find box: src="images/banner\.jpg"(.*?)>

  5. Insert this in the Replace box: src="images/banner.jpg"$1>\n<img src="images/new-banner.jpg">

  6. Preview the change and click Replace All if you're ready!
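
If you would rather stay in PowerShell, as in the question, the same find-and-replace can probably be scripted. The following is only a minimal sketch, not a tested solution for your files: the function name Add-NewBanner is made up for illustration, the pattern assumes every banner tag starts with `<img src="images/banner.jpg"` as described in the question, and the default file encoding is assumed to be acceptable (add -Encoding to Get-Content and Set-Content if it is not). Set-Content -NoNewline needs PowerShell 5 or later.

```powershell
# Hypothetical helper (not from the original post): inserts the new image
# after the banner tag in a string of HTML and returns the result.
function Add-NewBanner {
    param([string]$Html)
    # [^>]* consumes the remaining attributes up to the first closing '>'.
    # The negative lookahead skips banners that are already followed by the
    # new image, so a second run does not insert a duplicate.
    $pattern = '(<img\s+src="images/banner\.jpg"[^>]*>)(?!\s*<img\s+src="images/new-banner\.jpg")'
    $Html -replace $pattern, '$1 <img src="images/new-banner.jpg">'
}

# Rewrites files in place, so try it on a copy of the site first.
Get-ChildItem -Path . -Recurse -File |
    Where-Object { $_.Extension -in '.htm', '.html' } |
    ForEach-Object {
        $text    = Get-Content -LiteralPath $_.FullName -Raw
        $updated = Add-NewBanner -Html $text
        # Only write back files whose content actually changed.
        if ($text -and $updated -cne $text) {
            Set-Content -LiteralPath $_.FullName -Value $updated -NoNewline
        }
    }
```

Using [^>]* instead of .* keeps the match from running past the end of the banner tag when other tags follow on the same line, and $1 puts the original tag back unchanged before the new image.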

Hope this helps.

Jon Uleis
  • Thanks Jon, unfortunately that is not returning any matched results. I'm not sure if the formatting is off. I am trying some different variations now. Edit: I did not have Use Regular Expression turned on. It is returning some results now, so let me play around with it. Thanks – Austin Jun 14 '18 at 20:54