I am trying to analyse some html code and break it into an array of objects.

Here is some example html code:

<slide data-time=5>
  <div class="cds-block-title">Master Calendar</div>
  <div class="cds-block-content">iframe to master calendar</div>
</slide>

<slide data-time=5>
  <div class="cds-block-title">Weather</div>
  <div class="cds-block-content">iframe to master Weather App</div>
</slide>

My goal is to break it down into an object similar to this:

[  
   {  
      "html":"<slide.....</slide>",
      "time":"5",
      "title":"Master Calendar",
      "content":"iframe...."
   },
   {
      "html":"<slide.....</slide>",
      "time":"5",
      "title":"Weather",
      "content":"iframe...."
   }
]

I have tried a few different approaches. The first used regex. It worked in my test but not in production, where .match stopped working as expected. I have also read a few posts saying that regex is not the best way to parse HTML:

function splitSlidesHtml(html){
  var slides = html.match(/<slide.*?>(.*?)<\/slide>/g);
  var slidesA = [];
  if (!slides) {
    slidesA.push({"html":html});  // no slides matched: return the input as a single entry
  } else {
    for (var i = 0; i < slides.length; i++){
      var c = {};
      c.html = slides[i];
      c.time = slides[i].match(/(data-time=)(.*?)>/)[2].replace(/['"]+/g, '');  // extract the time, and replace any quotes that might be around it
      c.title = slides[i].match(/<div class="cds-block-title">(.*?)<\/div>/)[1];
      c.content = slides[i].match(/<div class="cds-block-content">(.*?)<\/div>/)[1];
      slidesA.push(c);
    }
  }
  return slidesA;
}  // end splitSlidesHtml

// test input (single line, no newlines)
var html = '<slide data-time="5"><div class="cds-block-title">Activities & Sports</div><div class="cds-block-content">content</div></slide><slide data-time="5"><div class="cds-block-title">weather</div><div class="cds-block-content">content</div></slide>';
console.log(splitSlidesHtml(html));
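
One likely reason the regex works on a single-line test string but not on HTML formatted like the example at the top: in JavaScript, `.` does not match newline characters, so `<slide.*?>(.*?)<\/slide>` cannot span a slide that is spread over several lines. The sketch below swaps `.` for `[\s\S]`, which does match newlines. `splitSlidesWithRegex` is a hypothetical name, and the approach is still fragile (for example, a nested `<div>` inside the content would cut the match short), so treat it as an illustration rather than a recommendation.

```javascript
// Hypothetical helper: a newline-tolerant variant of the regex approach.
function splitSlidesWithRegex(html) {
  // [\s\S] matches any character, including newlines; "." does not.
  var blocks = html.match(/<slide\b[^>]*>[\s\S]*?<\/slide>/g) || [];
  return blocks.map(function (block) {
    // Accept data-time=5, data-time="5" and data-time='5'.
    var time = block.match(/data-time=["']?([^"'\s>]+)/);
    var title = block.match(/<div class="cds-block-title">([\s\S]*?)<\/div>/);
    var content = block.match(/<div class="cds-block-content">([\s\S]*?)<\/div>/);
    return {
      html: block,
      time: time ? time[1] : null,
      title: title ? title[1] : null,
      content: content ? content[1] : null
    };
  });
}

// The multi-line example HTML from the question.
var sample = '<slide data-time=5>\n' +
  '  <div class="cds-block-title">Master Calendar</div>\n' +
  '  <div class="cds-block-content">iframe to master calendar</div>\n' +
  '</slide>\n' +
  '<slide data-time=5>\n' +
  '  <div class="cds-block-title">Weather</div>\n' +
  '  <div class="cds-block-content">iframe to master Weather App</div>\n' +
  '</slide>';

// The original pattern finds nothing because "." stops at each newline.
console.log(sample.match(/<slide.*?>(.*?)<\/slide>/g)); // null
console.log(splitSlidesWithRegex(sample));
```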

I have also tried jQuery, which kind of works, but I don't know enough about parseHTML to make sure it splits the markup at the individual slides.

var slides = $.parseHTML(html);
console.log(slides);
console.log(slides[0].innerHTML);
console.log(slides[0].outerHTML);
Bjorn Behrendt
  • First of all, [don't use regex](http://stackoverflow.com/a/1732454/1810460). Second, you only want to use `parseHTML` if your HTML is for some reason in a string and not in the dom. If that's the case, you should be able to do something like `slides.children('slide').each(function (slide) {})` to be able to interact individually with each slide – Hamms Feb 17 '17 at 20:33
  • When I parseHTML is it only getting the first level? So for my example I will have to parseHTML to break it into slides, then parseHTML again for each slide, to find the title and content. – Bjorn Behrendt Feb 17 '17 at 20:41
  • not at all, you just need to look at things other than the [`innerHTML`](https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML) of your slides, which is unhelpfully converting your [`Elements`](https://developer.mozilla.org/en-US/docs/Web/API/Element) right back to strings. You can, for example, look at [`$(slide).children(".cds-block-title")`](https://api.jquery.com/children/) to get the title `div`, and then call [`.text()`](http://api.jquery.com/text/) on that to get the content – Hamms Feb 17 '17 at 20:44

2 Answers

You can use `$.parseHTML()` to convert your HTML string into an array of DOM nodes and then loop over the nodes to grab the information you need. `.map()` fits well here because each node is mapped to an object.

var html = '<slide data-time=5>\
  <div class="cds-block-title">Master Calendar</div>\
  <div class="cds-block-content">iframe to master calendar</div>\
</slide>\
<slide data-time=5>\
  <div class="cds-block-title">Weather</div>\
  <div class="cds-block-content">iframe to master Weather App</div>\
</slide>';

var slides = $($.parseHTML(html)).map(function () {
    return {
      // store the HTML
      html: this.outerHTML,
      // store the data-time attribute
      time: this.dataset.time,
      // store the title
      title: $('.cds-block-title', this).text(),
      // store the content
      content: $('.cds-block-content', this).text(),
    };
}).get();

console.log(slides);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
Mikey
  • This helps a lot. When I parse out my html it breaks it up into slide, text, slide, text. How can I force it to just look at the slide during the map? – Bjorn Behrendt Feb 17 '17 at 21:16
  • How are you getting text after each slide? That would mean the HTML string you provided in your example is not accurate. Maybe you could use [.filter()](http://api.jquery.com/filter/) before the `.map()` call. – Mikey Feb 17 '17 at 22:04
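
Following the `.filter()` suggestion above, here is a minimal sketch of how the text nodes could be skipped. `$.parseHTML` returns every top-level node, so any whitespace or newline between `</slide>` and the next `<slide>` comes back as a text node, which has no `dataset`. `isSlideElement` is a hypothetical helper, the sample HTML is made up for illustration, and the jQuery part assumes jQuery is loaded as `$` in a browser.

```javascript
// Hypothetical helper: true only for <slide> element nodes.
// Element nodes have nodeType 1; text nodes have nodeType 3.
function isSlideElement(node) {
  return node.nodeType === 1 && node.nodeName.toLowerCase() === 'slide';
}

// Browser-only usage; skipped when jQuery or a DOM is not available.
if (typeof $ === 'function' && typeof document !== 'undefined') {
  // The "\n" between the slides is what produces the extra text node.
  var html = '<slide data-time="5"><div class="cds-block-title">Weather</div></slide>\n' +
             '<slide data-time="7"><div class="cds-block-title">News</div></slide>';

  var slides = $($.parseHTML(html))
    .filter(function () { return isSlideElement(this); }) // drop text nodes
    .map(function () {
      return {
        html: this.outerHTML,
        time: this.getAttribute('data-time'),
        title: $(this).children('.cds-block-title').text()
      };
    })
    .get(); // convert the jQuery object to a plain array

  console.log(slides);
}
```

Filtering before mapping keeps the `.map()` callback free of node-type checks.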

This is what I finally came up with. I had trouble getting the time correctly when using .map().

var html = sheetData['values'][prop]['html'];
var parsed = $.parseHTML(html);
var slides = [];
for (var n = 0; n < parsed.length; n++) {
  var cur = parsed[n];
  // only handle <slide> elements; skip the text nodes between them
  if (cur.nodeName == "SLIDE") {
    slides.push({
      html: cur.outerHTML, // store the HTML
      time: cur.dataset.time, // store the data-time attribute
      title: $('.cds-block-title', cur).html(), // store the title
      content: $('.cds-block-content', cur).html() // store the content
    });
  }
}
Bjorn Behrendt