You have to use a regex that knows how to match tags.
Procedure:
Make two replace-all passes over the source, each with a callback that replaces the spaces in the attribute value with underscores.
The first pass, for the ID attribute, is explained below; the second pass, for the NAME attribute, follows the same procedure.
<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\sid\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
The regex above is the replace-all regex for the ID attribute.
Explained:
 # Begin Anchor tag
 < a
 (?= \s )
 (?=                           # Assertion (a pseudo atomic group)
      (                             # (1 start), Up to ID attribute
           (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
           \s id \s* = \s*
      )                             # (1 end)
      (?:
           ( ['"] )                      # (2), Quote
           ( [\S\s]*? )                  # (3), ID Value
           \2
      )
      (                             # (4 start), After ID attribute
           (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )*?
           >
      )                             # (4 end)
 )
 # Have the ID, just match the rest of tag
 \s+
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
 >                             # End Anchor tag
Inside the callback, the captured groups are joined together to form the replacement, like so:
// store the captured groups
$g1 = match.groups[1];
$g2 = match.groups[2];
$g3 = match.groups[3];
$g4 = match.groups[4];
// construct the return string from the stored capture groups
return "<a" + $g1 + $g2 +
       // replaceAll stands for a regex global replace function
       replaceAll($g3, " ", "_") +
       $g2 + $g4;
Legend:
group 1 = Up to ID attribute
group 2 = Value Delimiter
group 3 = ID Value
group 4 = After ID attribute
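The ID pass can be sketched in Python, picked here only for illustration since the pseudocode above is not tied to a language. The regex is copied verbatim from above; the names `ID_TAG`, `underscore_id` and `fix_ids` are hypothetical. Treat it as a minimal sketch for well-formed tags, not a hardened implementation.

```python
import re

# The ID regex from the answer, copied verbatim (case-sensitive, as written).
ID_TAG = re.compile(
    r"""<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\sid\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def underscore_id(match):
    """Callback (hypothetical name): rebuild the tag from the four groups."""
    # g1 = up to the ID attribute, g2 = quote, g3 = ID value, g4 = rest of tag
    g1, g2, g3, g4 = match.group(1, 2, 3, 4)
    return "<a" + g1 + g2 + g3.replace(" ", "_") + g2 + g4


def fix_ids(html):
    """One replace-all pass over the source for the ID attribute."""
    return ID_TAG.sub(underscore_id, html)


sample = '<a class="x" id="my first id" href="#top">Top</a> <a href="#a b">No id</a>'
print(fix_ids(sample))
```

Group 4 already ends with the closing `>`, so the callback returns the complete opening tag. Anchors without an ID attribute fail the lookahead and are left alone.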
The callback for the NAME attribute is the same; use this regex for its replace-all pass:
<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
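Since the two regexes differ only in the attribute name, both passes can share one pattern builder and one callback. The following Python sketch assumes that approach; `attr_pattern`, `underscore_value` and `underscore_anchor_attrs` are hypothetical names, and the pattern pieces are the regex above split around the attribute name.

```python
import re

# The answer's regex, split on either side of the attribute name.
_BEFORE = r"""<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s"""
_AFTER = r"""\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""


def attr_pattern(attr):
    """Build the replace-all regex for one attribute name (e.g. "id" or "name")."""
    return re.compile(_BEFORE + re.escape(attr) + _AFTER)


def underscore_value(match):
    """Shared callback: same group layout for both attributes."""
    g1, g2, g3, g4 = match.group(1, 2, 3, 4)
    return "<a" + g1 + g2 + g3.replace(" ", "_") + g2 + g4


def underscore_anchor_attrs(html):
    """Run the two passes in sequence: first ID, then NAME."""
    for attr in ("id", "name"):
        html = attr_pattern(attr).sub(underscore_value, html)
    return html


src = "<a name='sec one' id=\"sec one\" href='#'>x</a>"
print(underscore_anchor_attrs(src))
```

Each pass rewrites only its own attribute value, so the order of the two passes should not matter for the result.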