0

I have a string in JavaScript containing a few img tags.

I need to find the ones that don't have a class attribute and add a class named "myClass" (class="myClass") to them. img tags that already have a class should not be touched.

Note that this is not the DOM, so I can't use things like "element.classList.contains(class)", which would be very helpful here.

I can only use regex. And again, the string can contain more than one img tag.

Example string:

<img src="https://google.com/animage.jpg" class="google">
<img src="https://yahoo.com/animage.jpg">

This is what I use to find img tags in the string that DO have a class attribute:

<\s*\/?\s*img[^>]* class=[^>]*>

What regex should I use to find the ones that don't, and add a class only to those?

It would be best if I could just use those img tags as if they were part of the DOM, but I can't. I also can't use jQuery, by the way.

Edit: I should mention that the string contains not only img tags but also other tags and other HTML content.

john_black
  • 1
    Is this a *known, limited* set of HTML? If it's not, [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – ctwheels Feb 14 '18 at 16:46

4 Answers

2

You really can (and should) avoid using regex for this task.

  1. create a temporary element
  2. assign your string to the element's innerHTML
  3. traverse the temporary element's DOM to find the desired img (or several of them)
  4. update their className
  5. read the updated innerHTML back into the string
  6. discard the temporary element

var html = `Some plain text, 
<a href="whatever">
  <img src="https://google.com/animage.jpg" class="google">
</a>. 
Some more text 

<h2>Header</h2>
<figure>
  <img src="https://yahoo.com/animage.jpg">
  <figcaption>Image of something</figcaption>
</figure>

more images:
<img src="https://google.com/animage.jpg" class="google">
<img src="https://yahoo.com/animage.jpg">
<img src="https://google.com/animage.jpg" class="google">`;

var tmp = document.createElement('div');
tmp.innerHTML = html;

tmp.querySelectorAll('img:not([class])').forEach(function(e) {
  e.className = 'myClass';
});

html = tmp.innerHTML;
tmp = null;

console.log(html);
Kosh
  • Great reply. Creating a temporary element instead of using regex is a good solution and the one I chose. Now my only problem is that the string can contain other HTML elements and other non-tag HTML content like plain text. Those img tags are not grouped/concentrated in one portion of the content. – john_black Feb 15 '18 at 12:26
  • 1
    @john_black, I've updated my answer to show you that it's not a problem. – Kosh Feb 15 '18 at 16:05
  • @john_black, yes, you can use any **valid** css selector. `img:([data-mytest])` is not correct. You probably mean `img[data-mytest]` or `img:not([data-mytest])`. – Kosh Feb 18 '18 at 07:01
0

Please don't use regex to parse HTML.

You can get all the img elements and then check each one for a class attribute; if the element doesn't have one, add myClass to it.

var images = document.querySelectorAll('img');
images.forEach(img => {
  if(!img.hasAttribute('class')){
    img.classList.add('myClass');
  }
});
<img src="https://google.com/animage.jpg" class="google">
<img src="https://yahoo.com/animage.jpg">
Hassan Imam
0

Disclaimer:

I can't stress this enough: Do not parse HTML with regex. As already linked elsewhere here, it's a very bad idea!


Very Bad, Regex-Based Solution:

That being said, assuming you have very limited variability in your HTML, you might be able to use this:

/<\s*\/?\s*img(?![^>]*class=)/g
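
To actually add the class, the pattern can be combined with String.prototype.replace. Below is a minimal sketch, not a robust implementation: the helper name addDefaultClass is made up for illustration, and it assumes the attribute should simply be inserted right after "<img". Note that any "class=" substring inside the tag (for example data-class=) makes the tag be skipped.

```javascript
// Regex from above: matches the start of an img tag only when no
// "class=" appears before the tag's closing ">".
const imgWithoutClass = /<\s*\/?\s*img(?![^>]*class=)/g;

// Hypothetical helper name. "$&" re-inserts the matched text ("<img"),
// so the new attribute lands directly after the tag name.
function addDefaultClass(html) {
  return html.replace(imgWithoutClass, '$& class="myClass"');
}

const input =
  '<img src="https://google.com/animage.jpg" class="google">' +
  '<img src="https://yahoo.com/animage.jpg">';

// Only the second tag (the one without a class) is changed.
console.log(addDefaultClass(input));
```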


Take Note:

I removed all newlines in the demo text. It will work with newlines too, but it's important to test it this way, because one of the major problems with parsing HTML with regex is that regex matches while scanning forwards. If you have to look backwards (as you often have to do with tags that are not self-closing), regex as an HTML parser breaks down very quickly. (Note that some regex engines do have "lookbehind"; JavaScript only gained it in ES2018, so older engines don't support it.) Fortunately for you, <img> is a void element that never has a closing tag, and your requirements are pretty minimal.

Joseph Marikle
0

Please consider a solution other than regex.

But also note that your problem is totally solvable with regex. What regex can't do is parse HTML, because parsing HTML requires balancing nested tags, and balancing requires a stack-based automaton.

Since your problem does not require you to balance tags or handle nesting, it is within the scope of regular expressions. But it still won't be pretty:

This expression is pretty safe (and ugly):

/<img (?:.*?class=['"](?!google[ '"])(?![^'"]* google['"]).+?['"]|(?!class=['"]))\s*[\/]?>/g

It will correctly match the following tags depending on whether or not they have the google class.

<img src="https://google.com/animage.jpg" class="google">
<img src="https://yahoo.com/animage.jpg">
<img src="https://google.com/animage.jpg" class="google">
<img src="https://yahoo.com/animage.jpg" class="azerty">
<img src="https://yahoo.com/animage.jpg" class="not-a-google">
<img src="https://yahoo.com/animage.jpg" class="azerty still-not-a-google">
<img src="https://yahoo.com/animage.jpg" class="google-i-am-not">
<img src="https://yahoo.com/animage.jpg" class="azerty still-not-a-google" rel="google">
<img src="https://yahoo.com/animage.jpg" class="azerty still-not-a-google" rel="google"/>
<img src="https://yahoo.com/animage.jpg" class="azerty still-not-a-google" rel="google" />
<img src="https://yahoo.com/animage.jpg" class="" rel="google" >
<img src="https://yahoo.com/animage.jpg" class="totally-a google" rel="google" />
<img src="https://yahoo.com/animage.jpg" class='i-love-myhtml-with-single-quotes google' rel="google" />
<img src="https://yahoo.com/animage.jpg" class='i-love-myhtml-with-single-quotes not-a-google' rel="google" />

https://regex101.com/r/TW8js8/1
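
Because only individual tags need to be inspected, a two-step variant may be easier to read than one big expression: first match each whole img tag, then check that tag for a class attribute inside a replace callback. This is a rough sketch under stated assumptions, not the expression above: the helper name addClassToBareImages is hypothetical, and it assumes no attribute value contains a ">" character and that the text "class =" preceded by whitespace never appears inside another attribute's value.

```javascript
// Hypothetical helper: adds a class to img tags that lack a class attribute.
function addClassToBareImages(html, className = 'myClass') {
  // Step 1: grab each whole img tag.
  // [^>]* assumes there is no ">" inside any attribute value.
  return html.replace(/<img\b[^>]*>/gi, (tag) => {
    // Step 2: leave the tag alone if it already carries a class attribute.
    // Requiring whitespace before "class" avoids matching data-class=.
    if (/\sclass\s*=/i.test(tag)) {
      return tag;
    }
    // Otherwise insert the attribute right after "<img".
    return tag.replace(/^<img\b/i, `$& class="${className}"`);
  });
}

const sample =
  '<img src="https://google.com/animage.jpg" class="google">\n' +
  '<img src="https://yahoo.com/animage.jpg">';

// Only the yahoo tag gets class="myClass".
console.log(addClassToBareImages(sample));
```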

Nicolas Reynis