
I have a big XML structure, and I am interested in certain parts of it like the ones below. I need to extract only the img tags, and the value of their src attribute, when they are inside a coral-card. I was trying to use a regex to get the enclosing coral-card tags, and then a second regex on those to get to the img tag and its content.

var regex = /<coral\-card ((.|[\r\n])*?)<\/coral\-card>/g;

Once I have the XML content containing the coral-card tags (like the sample below), is there a way to process it without regex? I don't want to use regex from that point on, because I think it should be possible to get the img tag and its src attribute value with jQuery or plain JavaScript functions.

<coral-card variant="condensed" data-timeline="true" stacked>
    <coral-card-asset>
        <img src="/content/dam/collections/3/3qtVFsGwnDVKpZ6H_SaM/lightbox.folderthumbnail.jpg?width=240&height=240">
    </coral-card-asset>
 </coral-card>

<coral-card variant="semi-condensed" data-timeline="true" stacked>
    <coral-card-asset>
        <img src="/content/dam/collections/3/3qtVFsGwnDVKpZ6H_SaM/small.folderthumbnail.jpg?width=240&height=240">
    </coral-card-asset>
 </coral-card>
Geek
  • Have you considered parsing with [DOMParser](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser)? – ray Nov 18 '16 at 17:39
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – nicovank Nov 18 '16 at 17:41
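Following the DOMParser suggestion, here is a minimal browser-side sketch, assuming the markup is available as a string (the function name `coralCardImageSources` is hypothetical). It parses the string as `text/html` rather than `text/xml`, because the sample is not well-formed XML: the `<img>` tags are never closed, `stacked` has no value, and the URLs contain a bare `&`. A strict XML parse would return a parsererror document instead of the cards.

```javascript
// Hypothetical helper: collect the raw src attribute of every <img>
// that sits anywhere inside a <coral-card>.
// DOMParser is a browser API; it is not built into Node.js.
function coralCardImageSources(markup) {
  // 'text/html' tolerates the unclosed <img> tags and valueless attributes
  const doc = new DOMParser().parseFromString(markup, 'text/html');
  // Descendant selector: any <img> at any depth below a <coral-card>
  return Array.from(doc.querySelectorAll('coral-card img'),
                    img => img.getAttribute('src'));
}

// Usage in a browser:
// const srcs = coralCardImageSources(htmlResponse);
// console.log(srcs);
```

Because `coral-card img` is a descendant selector, the intermediate `<coral-card-asset>` wrapper does not need to be named. With jQuery, `$(doc).find('coral-card img')` would select the same elements.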

2 Answers


DOMParser and xpath are very easy to use for parsing XML. In Node.js, using the xmldom and xpath packages, you can do something like:

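const DOMParser = require('xmldom').DOMParser;
const xpath = require('xpath');

// xmlString holds your XML; replace <path> with the location of the coral-card elements
const doc = new DOMParser().parseFromString(xmlString, 'text/xml');
const root = doc.documentElement; // not named "document", to avoid shadowing the browser global
const coralCards = xpath.select('<path>/coral-card', root);

// Collect the src attribute of every <img> inside each card ('.//' searches only within that card)
const srcs = [];
coralCards.forEach(card => {
  xpath.select('.//img/@src', card).forEach(attr => srcs.push(attr.value));
});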

See the xpath documentation for all the ways you can extract nodes from an XML blob.
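The packages in this answer target Node.js. In a browser, XPath is built in through `document.evaluate`, so a similar query can run on a DOMParser result without any library. A minimal sketch, assuming the markup is in a string (the function name `coralCardImageSourcesXPath` is hypothetical); the string is parsed as `text/html` because the sample is not well-formed XML:

```javascript
// Hypothetical helper: browser-only, uses the native XPath engine.
function coralCardImageSourcesXPath(markup) {
  const doc = new DOMParser().parseFromString(markup, 'text/html');
  // Select the src attribute nodes of every <img> below any <coral-card>
  const snapshot = doc.evaluate('//coral-card//img/@src', doc, null,
                                XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const srcs = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    // Each item is an Attr node; .value is its text
    srcs.push(snapshot.snapshotItem(i).value);
  }
  return srcs;
}
```

Selecting the `@src` attribute nodes directly means no second lookup on the `<img>` elements is needed.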

Jim B.

This is exactly why the core DOM specification was created:

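// Find all the <coral-card> elements:
var elements = document.getElementsByTagName("coral-card");

// Loop through them:
for (var i = 0; i < elements.length; ++i) {
  // Extract whatever you need:
  console.log(elements[i].getAttribute("variant"));
  // getAttribute("src") gives the attribute value itself, not a resolved absolute URL
  var images = elements[i].querySelectorAll("img");
  for (var j = 0; j < images.length; ++j) {
    console.log(images[j].getAttribute("src"));
  }
}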
<coral-card variant="condensed" data-timeline="true" stacked>
    <coral-card-asset>
        <img src="/content/dam/collections/3/3qtVFsGwnDVKpZ6H_SaM/lightbox.folderthumbnail.jpg?width=240&height=240">
    </coral-card-asset>
 </coral-card>

<coral-card variant="semi-condensed" data-timeline="true" stacked>
    <coral-card-asset>
        <img src="/content/dam/collections/3/3qtVFsGwnDVKpZ6H_SaM/small.folderthumbnail.jpg?width=240&height=240">
    </coral-card-asset>
 </coral-card>
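The question asks for the value of the src attribute. On an `<img>` element, the `src` property and `getAttribute("src")` are not the same thing: the property returns the attribute value resolved to an absolute URL against the document's base URL, while `getAttribute` returns the attribute value unresolved. A small sketch approximating that resolution with the standard `URL` constructor (available in browsers and Node.js); the helper name and the base URL are hypothetical:

```javascript
// Hypothetical helper: approximates how an <img>'s src *property* is derived
// from its src *attribute*, i.e. the attribute resolved against the base URL.
function resolveLikeImgSrc(rawSrc, baseURI) {
  return new URL(rawSrc, baseURI).href;
}

// The attribute value from the sample markup (what getAttribute("src") returns)
const raw = "/content/dam/collections/3/3qtVFsGwnDVKpZ6H_SaM/small.folderthumbnail.jpg?width=240&height=240";
// A hypothetical page address standing in for document.baseURI
const base = "https://author.example.com/assets.html";

console.log(resolveLikeImgSrc(raw, base));
// https://author.example.com/content/dam/collections/3/3qtVFsGwnDVKpZ6H_SaM/small.folderthumbnail.jpg?width=240&height=240
```

If the path stored in the markup is what you need, read it with `getAttribute("src")`; use the property only when an absolute URL is wanted.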
Scott Marcus
  • Thanks. I have that XML content inside coral-card tags; more precisely, I have a very big HTML string, let's say htmlResponse, which contains this XML content. How would document.getElementsByTagName work in that case? Should I first convert the HTML response string to a DOM using parseHTML? – Geek Nov 18 '16 at 18:07
  • @Geek Yes. Once it has been parsed from a string, you can use the DOM API to traverse it and extract whatever you want. – Scott Marcus Nov 18 '16 at 19:37