
I am trying to come up with a regex to find spaces that exist within anchor id and name values.

For instance, in the tag

<a id="Subsection Two Test One Two Three" name="Subsection Two Test One Two Three">

the regex would find the spaces between the quotation marks, but ignore the space between a and id and between " and name, and ignore anything outside of the tag.

The goal is to use the regex in Sublime Text to find the spaces in the attribute values and replace them with underscores.

2 Answers

0

You have to use a regex that knows how to match tags.

Procedure:

Make two replace-all passes over the source, one per attribute. Each pass needs a callback that replaces the spaces in the captured value with underscores.


The first pass, for the ID attribute, is explained below; the second pass, for NAME, follows the same procedure.

<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\sid\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

is the replace-all regex for the ID pass.

Explained

 # Begin Anchor tag

 < a                 
 (?= \s )
 (?=                           # Assertion (a pseudo atomic group)
      (                             # (1 start), Up to ID attribute
           (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
           \s id \s* = \s* 
      )                             # (1 end)
      (?:
           ( ['"] )                      # (2), Quote
           ( [\S\s]*? )                  # (3), ID Value
           \2 
      )
      (                             # (4 start), After ID attribute
           (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )*?
           >
      )                             # (4 end)
 )

 # Have the ID, just match the rest of tag

 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+

 >                             # End Anchor tag
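
As a rough check of what each group captures, the expanded form above can be pasted almost verbatim into an engine that has a free-spacing mode. The sketch below assumes Python's re module with re.VERBOSE (the answer does not name an engine); the constant name ID_TAG and the sample tag are only illustrative.

```python
import re

# The expanded ID regex from above, compiled in free-spacing mode.
ID_TAG = re.compile(
    r"""
    < a
    (?= \s )
    (?=                                # lookahead: captures without consuming
        (                              # (1) everything up to the id value
            (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
            \s id \s* = \s*
        )
        (?:
            ( ['"] )                   # (2) quote character
            ( [\S\s]*? )               # (3) id value
            \2
        )
        (                              # (4) rest of the tag after the id
            (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )*?
            >
        )
    )
    \s+
    (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
    >
    """,
    re.VERBOSE,
)

sample = '<a id="Subsection Two" name="Subsection Two">'
match = ID_TAG.search(sample)
for number in range(1, 5):
    # repr() makes leading spaces in the captures visible
    print(number, repr(match.group(number)))
```

Group 1 keeps its leading space and group 4 ends with the closing bracket, which is why the replacement only has to prepend "<a".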

Inside the callback, the groups will be joined together to form the replacement
like so.

// store the captured groups
$g1 = match.groups[1];
$g2 = match.groups[2];
$g3 = match.groups[3];
$g4 = match.groups[4];

// construct the return string from the stored capture groups

return "<a" + $g1 + $g2 +
replaceAll($g3, " ", "_") + // here is a regex global replace function
$g2 + $g4;

Legend:
group 1 = Up to ID attribute
group 2 = Value Delimiter
group 3 = ID Value
group 4 = After ID attribute


The callback for the NAME attribute is the same; use this regex for its replace-all pass.

<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
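
The two passes and the callback can be sketched together as follows. This is a minimal illustration that assumes Python's re module as the engine; the function names tag_pattern, underscore_value and fix_anchors are hypothetical, and the pattern is the one given above with the attribute name substituted in.

```python
import re

def tag_pattern(attr: str) -> re.Pattern:
    """Build the tag-aware regex from above for one attribute name."""
    return re.compile(
        r"""<a(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s"""
        + re.escape(attr)
        + r"""\s*=\s*)(?:(['"])([\S\s]*?)\2)"""
        r"""((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)*?>))"""
        r"""\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
    )

def underscore_value(match: re.Match) -> str:
    """Callback: rebuild the tag with spaces in the value replaced."""
    before, quote, value, after = match.group(1, 2, 3, 4)
    return "<a" + before + quote + value.replace(" ", "_") + quote + after

def fix_anchors(source: str) -> str:
    """Run one replace-all pass per attribute: id first, then name."""
    for attr in ("id", "name"):
        source = tag_pattern(attr).sub(underscore_value, source)
    return source

html = '<p>some text</p><a id="Subsection Two" name="Subsection Two">link text</a>'
print(fix_anchors(html))
```

Text outside the anchor tag, such as "some text" and "link text", is left untouched because each pattern only matches a complete opening tag.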

0

You can use the following regex to replace spaces with an empty string (your regex engine must support lookbehind and lookahead):

/(?<!\<a)(?<=\w)\s(?=\w)/g

The regex starts with a negative lookbehind for '<a'.

It then makes a positive lookbehind for a word character, matches a whitespace character, and finally looks ahead for a word character.

Now replace the matches with an empty string.
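
To see what this pattern does on the tag from the question, here is a small sketch that assumes Python's re module, which supports the fixed-width lookbehinds used here. The names INNER_SPACE and replace_inner_spaces are illustrative, and the replacement is a parameter so that an underscore can be tried as well.

```python
import re

# The lookaround regex from above, without the /.../g delimiters;
# re.sub already replaces every match.
INNER_SPACE = re.compile(r"(?<!\<a)(?<=\w)\s(?=\w)")

def replace_inner_spaces(text: str, replacement: str = "") -> str:
    """Replace each whitespace character that sits between two word characters."""
    return INNER_SPACE.sub(replacement, text)

tag = '<a id="Subsection Two Test" name="Subsection Two Test">'
print(replace_inner_spaces(tag))
print(replace_inner_spaces(tag, "_"))
```

The pattern is not tag-aware: a space between two word characters anywhere else in the input, for example in link text, matches as well.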

Poul Bak