
I have a variable baseURL whose value I know: baseURL = c:\whatever\mybasedir\
I have HTML source that may contain any of the following:

<IMG alt="foo" src="file://c:\whatever\mybasedir\root\foo\bla.gif">
and/or:
<IMG alt="foo" src="file://c:/whatever/mybasedir/root/foo/bla.gif">
and/or:
<IMG src="c:\whatever\mybasedir\root\foo\bla.gif">
and/or:
<IMG src="c:\whatever\mybasedir/root/foo/bla.gif">

I need to rewrite every src attribute so that the resulting path is Unix-style and relative to baseURL:

<IMG src="root/foo/bla.gif">

or, if there was an alt attribute (or any other attribute; the order of attributes may vary):

<IMG alt="foo" src="root/foo/bla.gif">

How do I match <IMG * src="*" *>? Any ideas on what regex (or other method) could help here?

(I cannot use the DOM for this job, since the IE8/9 DOM is what causes this situation in the first place: it automatically prepends the <base href> to all relative src attributes.)

ZigiZ
  • Always worth consulting @bobince at moments like this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454 – David Heffernan Jan 14 '13 at 12:05
  • @DavidHeffernan, that was funny :D How can I use an XML parser on HTML (not XHTML)? – ZigiZ Jan 14 '13 at 13:31
  • Well, you need an HTML parser for this job. Of course, regex may do what you need. As an aside, today I happen to be wearing my Stack Overflow T-shirt that contains the text of bobince's famous answer. – David Heffernan Jan 14 '13 at 20:08

2 Answers


You can use this:

Regex: (<IMG[^>]*)src="[^"]*c:.whatever.mybasedir.

Replace with: $1src="
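As a rough illustration of what this replacement does, here is a minimal Python sketch (Python is an assumption; the thread does not name a language). Each `.` in the pattern stands in for either `\` or `/`, and Python writes the first group as `\1` where the answer writes `$1`. The sketch strips the base directory but leaves any backslashes in the remaining path untouched.

```python
import re

# Pattern from the answer; each "." stands in for either "\" or "/".
PATTERN = re.compile(r'(<IMG[^>]*)src="[^"]*c:.whatever.mybasedir.')

# Two of the sample tags from the question.
samples = [
    r'<IMG alt="foo" src="file://c:\whatever\mybasedir\root\foo\bla.gif">',
    r'<IMG src="c:\whatever\mybasedir/root/foo/bla.gif">',
]

# Keep everything before src, then restart the attribute without the prefix.
results = [PATTERN.sub(r'\1src="', tag) for tag in samples]

for line in results:
    print(line)
# <IMG alt="foo" src="root\foo\bla.gif">
# <IMG src="root/foo/bla.gif">
```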

Anirudha
  • I'm always surprised what convoluted patterns you RegEx people can construct +1 – Jan Doggen Jan 14 '13 at 11:15
  • I still need to change `root\foo\bla.gif` to `root/foo/bla.gif`. How can I also match the part after `c:.whatever.mybasedir.`? – ZigiZ Jan 14 '13 at 15:40
  • I have changed your RE to `(<IMG[^>]*)src="[^"]*c:.whatever.mybasedir.(.*?")` and used a callback on `$2` to replace "\" with "/". – ZigiZ Jan 15 '13 at 12:25

Replace (<IMG.*src=")(.*[/\\])(root[/\\].*?".*>)

with $1$3

EDIT

Hopefully this will work:

Replace (<IMG.*src=")(.*[/\\]mybasedir[/\\])(root)(([/\\][^/\\]+)*)(".*>)

with $1$3$4$6

Naveed S
  • I'm always surprised what convoluted patterns you RegEx people can construct +1 – Jan Doggen Jan 14 '13 at 11:15
  • It almost works! But for the first and third IMG the result is `src="root\foo\bla.gif"` (backslashes), and I need `src="root/foo/bla.gif"` (forward slashes). – ZigiZ Jan 14 '13 at 16:06
  • This fails if the path already contains `root` before `mybasedir`, e.g. `c:\whatever\root\mybasedir\root\foo\bla.gif` – ZigiZ Jan 15 '13 at 12:28
  • So which root should be considered for beginning the path? One following `mybasedir`? – Naveed S Jan 15 '13 at 12:32
  • The "root" is the one following mybasedir, that is correct, but the path may already contain "root" (as shown in my comment), so $3 will match `root\mybasedir\root\foo\bla.gif` in that case. – ZigiZ Jan 15 '13 at 12:36
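One way around the ambiguity discussed in these comments is to anchor the pattern on the known base directory instead of the literal `root`. The sketch below is a Python illustration under stated assumptions, not anything proposed in the thread: `make_relativizer` is a hypothetical helper, the optional `file://` prefix and case-insensitive matching are guesses about the input, and the pattern would still be confused by a `>` inside a quoted attribute value.

```python
import re


def make_relativizer(base_dir):
    """Return a function that rewrites IMG src paths relative to base_dir."""
    # Split the base directory on either separator and escape each part,
    # so the pattern accepts "\" or "/" between components.
    parts = [re.escape(p) for p in re.split(r'[\\/]', base_dir) if p]
    base = r'[\\/]'.join(parts) + r'[\\/]'
    pattern = re.compile(
        r'(<IMG\b[^>]*?\bsrc=")(?:file://)?' + base + r'([^"]*)"',
        re.IGNORECASE,
    )

    def fix(match):
        # Keep the tag up to src=", then emit the remainder with "/" only.
        return match.group(1) + match.group(2).replace('\\', '/') + '"'

    return lambda html: pattern.sub(fix, html)


relativize = make_relativizer('c:\\whatever\\root\\mybasedir\\')
print(relativize('<IMG src="c:\\whatever\\root\\mybasedir\\root\\foo\\bla.gif">'))
# <IMG src="root/foo/bla.gif">
```

Because the base directory must match immediately after `src="` (or after `file://`), an earlier `root` component inside the base path cannot be mistaken for the start of the relative part.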