I'm trying to remove single and double quotes from HTML attributes whose values are single words with no whitespace. I wrote this regex, which does work:

/((type|title|data-toggle|colspan|scope|role|media|name|rel|id|class|rel)\s*(=)\s*)(\"|\')(\S+)(\"|\')/ims

However, instead of listing every HTML attribute whose quotes I want removed, I would rather list just the couple of attributes to ignore, such as src and href, and remove the quotes on all other attributes. So I wrote the one below, but for the life of me it doesn't work. It somehow has to detect any attribute name except href and src. I tried all kinds of combinations.

/((?!href|src)(\S)+\s*(=)\s*)(\"|\')(\S+)(\"|\')/i

I've tried this, but it doesn't work: it just removes the h and s from the href and src attributes. I know I'm close but I'm missing something. I've spent a good 5 hours on this.

Working example:

$html_code = 'your html code here.';

// preg_replace() returns the modified string, so assign the result back.
$html_code = preg_replace('/((type|title|data-toggle|colspan|scope|role|media|name|rel|id|class|rel)\s*(=)\s*)(\"|\')(\S+)(\"|\')/i', '$1$5', $html_code);
Andy Lester

2 Answers

I modified the shorter regex you wrote, resulting in this:

((\S)+\s*(?<!href)(?<!src)(=)\s*)(\"|\')(\S+)(\"|\')

When your version is evaluated, the lookahead fails at the 'h' that begins an 'href' in your document, so the engine moves on to the next character. Since 'ref' matches neither 'href' nor 'src', the lookahead succeeds there and the rest of your pattern matches.

With my modifications, any 'href' or 'src' is initially accepted by the regex. When the lookbehind is reached, just before the '=', it checks the text already consumed and fails if that text ends in 'href' or 'src'.
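
One caveat with the lookbehind version: it inspects the characters immediately before the '=', so whitespace between the name and the '=' (as in `href = "x"`) lets 'href' slip through. As a possible alternative, here is a minimal sketch that keeps the lookahead but only lets a match begin at the start of an attribute name. The sample markup, the variable names and the pattern itself are assumptions for illustration, not something taken from the question.

```php
<?php
// Hypothetical sample markup; note the spaces around "=" in the first attribute.
$html = '<a href = "x" data-href="y" class=\'big\' title="two words">';

// (?<![\w-])   a match may only begin at the start of an attribute name,
//              so the engine cannot skip the "h" and match "ref" instead
// (?!...)      reject the names href and src when they are followed by "="
// (["\']) \2   the closing quote must be the same as the opening one
// [^\s"\'=<>`]+ only values that are safe to leave unquoted
$pattern = '/(?<![\w-])(?!(?:href|src)\s*=)([\w-]+\s*=\s*)(["\'])([^\s"\'=<>`]+)\2/i';

$stripped = preg_replace($pattern, '$1$3', $html);
echo $stripped, "\n";
// <a href = "x" data-href=y class=big title="two words">
```

In this sketch `data-href` is treated as a different attribute from `href`, so its quotes are removed; the lookbehind version above would leave it quoted because the name ends in 'href'.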

mroemore

Alternatively, instead of filtering on the href or src attribute names, it would be preferable to filter on `=` in the value. Here is a regex that does this (it also assumes that all attributes use double quotes):

// Remove the double quotes around attribute values that contain no whitespace and no `=` character.
// [^\s"=]+ rejects any value containing whitespace, a double quote, or `=`.
$html = preg_replace('/((\S)+\s*(=)\s*)(\")([^\s"=]+)(\")/', '$1$5', $html);
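
To make the effect concrete, here is a minimal sketch of this idea run on sample input. The markup and the simplified pattern (a negated character class for the value) are assumptions for illustration.

```php
<?php
// Hypothetical sample markup.
$html = '<a href="page.php?id=55" class="btn" title="Two words">Go</a>';

// The value group [^\s"=]+ refuses whitespace, double quotes and "=",
// so a URL with a query string and a multi-word title keep their quotes.
$pattern = '/(\S+\s*=\s*)"([^\s"=]+)"/';

$result = preg_replace($pattern, '$1$2', $html);
echo $result, "\n";
// <a href="page.php?id=55" class=btn title="Two words">Go</a>
```

Note that an href or src whose value contains no `=`, such as a plain file path, would still lose its quotes under this approach.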
Nicolas Bouvrette