
I have more than 2000 <img> tags and I want to fix the alt text of each one. The alt texts look like this:

  1. alt="pinblock"
  2. alt="Rich+Austin+shop+4"
  3. alt="hot+dry+sun+az"

I want a quick way to replace every '+' with a space (' '), so I'm using a regex. Here is what I've tried so far:

Find what: alt="(\D+)[+](\D+)[+*](\D*)[+*](\D*)[+*](\D*)[+*](\D*)[+*](\D*)[+*](\D*)"\s

Replace With: alt="\1 \2 \3 \4 \5 \6 \7 \8"

I know I'm doing something wrong; please help.

The complete string would be:

<img border="0" height="111" src="https://1.bp.blogspot.com/-WL5_jMT96p4/U8ILVU9D-mI/AAAAAAAAGeI/rP_RJccbhj8/s1600/hot+dry+sun+az.jpg" alt="hot+dry+sun+az" width="200" />

kikkaz69
  • Can you please clarify about output? – Akash KC Aug 29 '17 at 01:53
  • What **language** are you trying to do this in? You can't parse HTML with a regex. You can do *exactly* what you're trying to do in JavaScript [**rather easily**](https://stackoverflow.com/a/22822463/2341603) though. – Obsidian Age Aug 29 '17 at 01:55
  • I mentioned the platform in the topic: this is for Notepad++ (Find and Replace). – kikkaz69 Aug 29 '17 at 01:56
  • What I'd do would be find `(alt=".*)\+(.*")` and replace with a space, then just repeat the operation until Notepad++ reports no matches. And you also want to do a [non-greedy search](http://docs.notepad-plus-plus.org/index.php/Regular_Expressions) for the closing `"`. – Ken Y-N Aug 29 '17 at 01:57
  • Akash KC, it works fine when there are more than 4 or 5 (+), and doesn't work on digits. – kikkaz69 Aug 29 '17 at 01:57
  • @KenY-N That won't work. First, you would also have to restore what you have captured and consumed in the parentheses. Second, for a term like `term1+term2+term3`, if you replaced _and_ consumed the first two terms, then you'd miss the second plus (I think). – Tim Biegeleisen Aug 29 '17 at 01:59
  • @Ken Y-N, and replace with what? Also, I think it won't stop at the alt attribute: what if there is a "width" attribute after the "alt" attribute, as in the example? – kikkaz69 Aug 29 '17 at 02:00
  • My opinion is that Notepad++ is not the best place to be doing this replacement. It would be much easier using something like Java or C#. – Tim Biegeleisen Aug 29 '17 at 02:01

1 Answer


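Just to demonstrate: I have tested this, and pressing the Replace All button repeatedly works. Each pass replaces the first remaining '+' in every alt value, so keep going until Notepad++ reports no more matches. First, select the Regular expression radio button, then: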

Find what:    (alt=\"[^+"]*?)\+([^\"]*?")
Replace with: \1 \2

This is of course not fool-proof, but it should work as long as you have no pathological data.
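To see why several passes are needed, here is a minimal Python sketch that mimics pressing Replace All until nothing matches. It uses the same pattern as above; the helper name `replace_until_stable` and the shortened sample tag (with a placeholder URL) are made up for illustration, and Python's `re` engine is assumed to treat this particular pattern the same way Notepad++ does.

```python
import re

# Same pattern as in the answer, written as a Python raw string.
PATTERN = re.compile(r'(alt="[^+"]*?)\+([^"]*?")')


def replace_until_stable(text):
    """Repeat the substitution until a pass makes no replacements.

    Returns the final text and the number of passes that changed something
    (hypothetical helper, not part of Notepad++).
    """
    passes = 0
    while True:
        text, count = PATTERN.subn(r'\1 \2', text)
        if count == 0:
            return text, passes
        passes += 1


# Shortened, hypothetical version of the tag from the question.
sample = '<img src="https://example.invalid/hot+dry+sun+az.jpg" alt="hot+dry+sun+az" width="200" />'
result, passes = replace_until_stable(sample)
print(result)
print(passes)
```

Each pass removes one '+' per alt attribute, so an alt value with three '+' characters needs three passes; the '+' characters in the src URL are left alone because the match must start at `alt="`.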

NOTE: My first version had a bug in that it would change alt="hot+dry+sun+az" width="200+200" to alt="hot+dry+sun+az" width="200 200", which is a good example of why one should not use regex to process HTML. I think this task can probably be done in a few lines of JavaScript with zero danger of getting tripped up as I did above, but that's another question for another day!
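For what it's worth, the scripted approach could look roughly like the sketch below, written in Python rather than JavaScript. It finds each double-quoted alt value and replaces the '+' characters only inside that value, in a single pass. The names `ALT_ATTR` and `fix_alt_text` are invented for this example, and it is still a regex, so it assumes alt values are always double-quoted and contain no embedded quotes; a real HTML parser would be the safer tool for messy markup.

```python
import re

# Captures: the opening alt=", the attribute value, and the closing quote.
# Assumption: alt values are double-quoted and contain no embedded quotes.
ALT_ATTR = re.compile(r'(\balt=")([^"]*)(")')


def fix_alt_text(html):
    """Replace '+' with a space inside alt attribute values only."""
    def _swap(match):
        return match.group(1) + match.group(2).replace('+', ' ') + match.group(3)
    return ALT_ATTR.sub(_swap, html)


# The problem case from the note above: width must stay untouched.
sample = '<img alt="hot+dry+sun+az" width="200+200" />'
fixed = fix_alt_text(sample)
print(fixed)
```

Because the replacement function only ever sees the text captured between the alt quotes, other attributes such as width cannot be affected.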

NOTE 2: My second version also got Zalgoed.

Ken Y-N