I'm trying to do a search and replace across a lot of files and I need to format the following HTML.

<a href="http://www.XXXXXXXXX.com target=_blank">
<img alt="XXXXXXXXX" src=http://domain.org/files/image.gif" />
</a>

I need a regex for the XXXXXXXXX parts. Basically, it should find every domain used and every combination of alt words used.

Some domains contain one dash (-), others contain two, and the rest contain none. Some alt texts are two words long, while others are three. There are no digits in the domains or the alt texts.

Any help would be greatly appreciated.

amiregelz
DRK
  • Are you doing this with pencil and paper or in a specific programming language? In either case, regex and HTML are not close friends. – moonwave99 Aug 31 '12 at 16:20
  • The obligatory reference to a [previous question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Corey Ogburn Aug 31 '12 at 16:23
  • I'm going to use it in Notepad++. Will it give me issues? – DRK Aug 31 '12 at 16:23
  • And you need to replace it with what exactly? I mean, do you need to capture these 'XXXXX' or not? – raina77ow Aug 31 '12 at 16:37
  • Are the missing quotes a typo, or is that the real string you have to change? – Toto Aug 31 '12 at 16:58
  • Hmm, I'm not sure what you plan to replace. In any case, you should use an XML processor such as SAX or XSLT, not regex. – Alex Brown Aug 31 '12 at 22:55

1 Answer

Replace:

a href="http://www\..+\.com\ +target

with

a href="http://www.NEWVALUE.com target

Replace:

img alt="[^"]+"\ +src=

with

img alt="NEWVALUE" src=
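
To try the two replacements outside Notepad++, here is a minimal sketch using Python's `re` module. It is an illustration under assumptions, not part of the original answer: the sample domain `some-domain`, the alt text `Some Alt Text`, and the helper name `replace_values` are made up, and the domain pattern uses `[A-Za-z-]+` instead of `.+` because the question says domains contain only letters and dashes. The sample keeps the unbalanced quotes exactly as they appear in the question.

```python
import re

# Sample markup modeled on the question, including its unbalanced quotes.
# The domain and alt text are placeholders invented for this example.
html = (
    '<a href="http://www.some-domain.com target=_blank">\n'
    '<img alt="Some Alt Text" src=http://domain.org/files/image.gif" />\n'
    '</a>\n'
)

# Domain part: letters and dashes only (assumption taken from the question).
href_pattern = re.compile(r'a href="http://www\.([A-Za-z-]+)\.com +target')
# Alt text: everything up to the closing double quote.
alt_pattern = re.compile(r'img alt="([^"]+)" +src=')


def replace_values(text, new_domain, new_alt):
    """Swap the domain and alt text for new values (hypothetical helper)."""
    # Lambdas avoid backslash interpretation in the replacement strings.
    text = href_pattern.sub(
        lambda m: f'a href="http://www.{new_domain}.com target', text
    )
    text = alt_pattern.sub(lambda m: f'img alt="{new_alt}" src=', text)
    return text


# List the values currently in use before replacing anything.
domains_found = href_pattern.findall(html)
alts_found = alt_pattern.findall(html)

result = replace_values(html, "NEWVALUE", "NEWVALUE")
print(result)
```

The capture groups make it possible to list the distinct domains and alt texts first with `findall`, which can be a useful check before running the replacement across many files.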
vsh