If you are mixing non-HTML with HTML, it's best to use a regex.
Here is a way to do the substitutions.
Links:
(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(['"])/mycms/~/link\.aspx\?_id=)([a-f0-9]{32})(&_z=z\3(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with $1$2 + key{$4} + $5, where key{$4} is the new link ID value from the dictionary.
https://regex101.com/r/xRf1xN/1
# https://regex101.com/r/ieEBj8/1
(?i) # Case insensitive modifier
( < a ) # (1), The a tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the ID num
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s href \s* = \s* # href attribute
( ['"] ) # (3), Quote
/mycms/~/link\.aspx\?_id= # Prefix link static text
) # (2 end)
( [a-f0-9]{32} ) # (4), hex link ID
( # (5 start), All past the ID num
&_z=z # Postfix link static text
\3 # End quote
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (5 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
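The key{$4} lookup usually cannot be written as a plain replacement string, so in most languages it becomes a replacement callback. Here is a minimal Python sketch of that idea. The names LINK_RE, remap_links and id_map are made up for illustration, the dictionary is assumed to be a plain dict keyed by the lowercase old ID, and leaving unknown IDs untouched is my assumption, not something stated above.

```python
import re

# The one-line link regex from above, unchanged.
LINK_RE = re.compile(
    r"""(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(['"])/mycms/~/link\.aspx\?_id=)([a-f0-9]{32})(&_z=z\3(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def remap_links(html, id_map):
    """Swap old link IDs for new ones; id_map is a hypothetical {old_id: new_id} dict."""

    def repl(m):
        old_id = m.group(4)
        # Assumption: IDs missing from the dictionary are kept as they are.
        new_id = id_map.get(old_id.lower(), old_id)
        # Equivalent of: $1$2 + key{$4} + $5
        return m.group(1) + m.group(2) + new_id + m.group(5)

    return LINK_RE.sub(repl, html)


if __name__ == "__main__":
    old = "0123456789abcdef0123456789abcdef"
    new = "fedcba9876543210fedcba9876543210"
    page = '<a class="x" href="/mycms/~/link.aspx?_id=' + old + '&_z=z" title="t">here</a>'
    print(remap_links(page, {old: new}))
```

Because groups 2, 4 and 5 are captured inside the lookahead, the callback can rebuild the whole tag from them, while the part after the lookahead only consumes the tag.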
Media:
(?i)(<img)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(['"])/mycms/~/media/)([a-f0-9]{32})(\.ashx\3(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with $1$2 + key{$4} + $5, where key{$4} is the new media ID value from the dictionary.
https://regex101.com/r/pwyjoK/1
# https://regex101.com/r/ieEBj8/1
(?i) # Case insensitive modifier
( < img ) # (1), The img tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the ID num
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s src \s* = \s* # src attribute
( ['"] ) # (3), Quote
/mycms/~/media/ # Prefix media static text
) # (2 end)
( [a-f0-9]{32} ) # (4), hex media ID
( # (5 start), All past the ID num
\.ashx # Postfix media static text
\3 # End quote
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (5 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
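The expanded, commented form can be used directly in engines that have a free-spacing mode. As a sketch, here is the media regex compiled in Python with re.VERBOSE, with the (?i) modifier passed as a flag instead. MEDIA_RE, remap_media and id_map are hypothetical names, and keeping unknown IDs unchanged is again an assumption.

```python
import re

# Expanded media regex from above; whitespace and # comments are ignored in verbose mode.
MEDIA_RE = re.compile(
    r"""
    ( < img )                        # (1), The img tag
    (?=                              # Assertion (a pseudo atomic group)
       (                             # (2 start), Up to the ID num
          (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
          \s src \s* = \s*           # src attribute
          ( ['"] )                   # (3), Quote
          /mycms/~/media/            # Prefix media static text
       )                             # (2 end)
       ( [a-f0-9]{32} )              # (4), hex media ID
       (                             # (5 start), All past the ID num
          \.ashx                     # Postfix media static text
          \3                         # End quote
          (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
          >
       )                             # (5 end)
    )
    \s+                              # Consume the tag to advance the position
    (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
    >
    """,
    re.IGNORECASE | re.VERBOSE,
)


def remap_media(html, id_map):
    """Swap old media IDs for new ones; id_map is a hypothetical {old_id: new_id} dict."""

    def repl(m):
        old_id = m.group(4)
        # Assumption: IDs missing from the dictionary are kept as they are.
        new_id = id_map.get(old_id.lower(), old_id)
        return m.group(1) + m.group(2) + new_id + m.group(5)

    return MEDIA_RE.sub(repl, html)
```

Calling remap_media(page, {old_id: new_id}) on a string should rewrite only img tags whose src has the /mycms/~/media/ prefix and the .ashx suffix; everything else passes through unchanged.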
If I wanted to a) extract the ID within the link/src tag and b) replace the entire href=".." or src=".." value (and not just the ID part), how would that look in regex?
To do this, just rearrange the capture groups.
Links:
(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(href\s*=\s*(['"])/mycms/~/link\.aspx\?_id=([a-f0-9]{32})&_z=z\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with $1$2href='NEWID:key{$5}'$6, where key{$5} is the new link ID value from the dictionary.
https://regex101.com/r/FxpJVl/1
(?i) # Case insensitive modifier
( < a ) # (1), The a tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the href attribute
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
) # (2 end)
( # (3 start), href attribute
href \s* = \s*
( ['"] ) # (4), Quote
/mycms/~/link\.aspx\?_id= # Prefix link static text
( [a-f0-9]{32} ) # (5), hex link ID
&_z=z # Postfix link static text
\4 # End quote
) # (3 end)
( # (6 start), remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (6 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
Media:
(?i)(<img)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(src\s*=\s*(['"])/mycms/~/media/([a-f0-9]{32})\.ashx\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
Replace with $1$2src='NEWID:key{$5}'$6, where key{$5} is the new media ID value from the dictionary.
https://regex101.com/r/EqKYjM/1
(?i) # Case insensitive modifier
( < img ) # (1), The img tag
(?= # Assertion (a pseudo atomic group)
( # (2 start), Up to the src attribute
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
) # (2 end)
( # (3 start), src attribute
src \s* = \s*
( ['"] ) # (4), Quote
/mycms/~/media/ # Prefix media static text
( [a-f0-9]{32} ) # (5), hex media ID
\.ashx # Postfix media static text
\4 # End quote
) # (3 end)
( # (6 start), remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (6 end)
)
# All the parts have already been found via assertion
# Just match a normal tag closure to advance the position
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
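With the groups rearranged like this, one callback can do both jobs: record the ID from group 5 and rebuild the whole attribute. A rough Python sketch follows. The helper replace_attr, the pattern names and url_map are invented for illustration; url_map is assumed to map each lowercase old ID to the complete new attribute value (standing in for the NEWID:key{$5} placeholder), and the sketch reuses the original quote character from group 4 rather than forcing single quotes.

```python
import re

# The two rearranged one-line regexes from above, unchanged.
LINK_ATTR_RE = re.compile(
    r"""(?i)(<a)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(href\s*=\s*(['"])/mycms/~/link\.aspx\?_id=([a-f0-9]{32})&_z=z\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)
MEDIA_ATTR_RE = re.compile(
    r"""(?i)(<img)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)(src\s*=\s*(['"])/mycms/~/media/([a-f0-9]{32})\.ashx\4)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def replace_attr(pattern, attr, html, url_map):
    """Return (new_html, found_ids); url_map is a hypothetical {old_id: new_value} dict."""
    found = []

    def repl(m):
        old_id = m.group(5).lower()
        found.append(old_id)  # a) extract the ID
        if old_id not in url_map:
            return m.group(0)  # assumption: unknown IDs leave the tag untouched
        quote = m.group(4)  # keep whichever quote the original attribute used
        # b) replace the whole attribute: $1$2 + attr=quote value quote + $6
        return f"{m.group(1)}{m.group(2)}{attr}={quote}{url_map[old_id]}{quote}{m.group(6)}"

    return pattern.sub(repl, html), found
```

A typical call would be replace_attr(LINK_ATTR_RE, "href", html, links) followed by replace_attr(MEDIA_ATTR_RE, "src", result, media). Group 3 (the old attribute) is simply dropped from the output, which is what makes the whole value replaceable.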