I need to compare two HTML pages for data. Both pages are built with React, but their markup differs. The content of the pages, however, is the same. What is the best way to compare them? I only want to compare textual data.

I need to compare multiple pages. Is writing specific selectors, extracting the values, and comparing them the only solution?

skyboyer
OpenStack
  • Can you elaborate on your requirements? Is there non-textual data to be compared? If it's textual data alone and the content is exactly the same apart from the markup, you can try extracting the text from each webpage and computing a hash. Compare the hashes across the pages to check for equality. – Shan Eapen Koshy Apr 22 '20 at 05:52
  • @ShanEapenKoshy: I am only looking to compare textual data. How can I extract only the data? Do I need to work with selectors and then extract the values? Please elaborate. – OpenStack Apr 22 '20 at 06:34

1 Answer

It's still unclear where you are going to perform the check.

Comparing the text of two elements is straightforward with the element.innerText property, which returns only the rendered text and ignores the markup around it.

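// Grab the two containers whose text should match, plus the output element.
var page1 = document.getElementById('page1');
var page2 = document.getElementById('page2');
var result = document.getElementById('result');

// innerText yields only the rendered text, so the differing tags are ignored.
if (page1.innerText !== page2.innerText) {
  result.textContent = "Pages are different";
} else {
  result.textContent = "Pages are the same";
}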
<!-- Page 1 -->
<div id='page1'>
  <strong style="margin: 0px; padding: 0px;">Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry&#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of
  type and scrambled it to make a type specimen book.
</div>

<br><br>

<!-- Page 2 -->
<div id='page2'>
  <div class="different markup"></div>
  <em style="margin: 0px; padding: 0px;">Lorem Ipsum</em> <b>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry&#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.</b>
</div>

<br>
<h3 id="result" style="color:red;"></h3>
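
One caveat with a strict comparison: innerText reflects the rendered layout, so line breaks and spacing can differ between two pages whose markup differs. A minimal sketch of one way to guard against that, assuming whitespace differences should not count as content differences; normalizeText and sameText are hypothetical helper names, not part of any API:

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space and trim both ends.
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: compare two extracted strings after normalization.
function sameText(a, b) {
  return normalizeText(a) === normalizeText(b);
}

// In the browser the inputs would come from the elements above, e.g.:
// sameText(page1.innerText, page2.innerText)
```

Whether to normalize more aggressively (case, punctuation) depends on what should count as "the same content" for your pages.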

When you have to compare one page with another across the internet, it is better to compute a hash of each page's text and compare the hashes for equality, since only the short hash values then need to be exchanged.

// 32-bit string hash (the same scheme as Java's String.hashCode).
// Written as a plain function to avoid modifying String.prototype.
function hashCode(text) {
  var hash = 0;
  for (var i = 0; i < text.length; i++) {
    hash = ((hash << 5) - hash) + text.charCodeAt(i); // hash * 31 + char code
    hash |= 0; // Truncate to a 32-bit integer
  }
  return hash;
}

var page1Hash = hashCode(document.getElementById('page1').innerText);
var page2Hash = hashCode(document.getElementById('page2').innerText);

var result = document.getElementById('result');

// Different hashes mean the text differs; equal hashes mean it almost certainly matches.
if (page1Hash !== page2Hash) {
  result.textContent = "Pages are different";
} else {
  result.textContent = "Pages are the same";
}
<!-- Page 1 -->
<div id='page1'>
  <strong style="margin: 0px; padding: 0px;">Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry&#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of
  type and scrambled it to make a type specimen book.
</div>

<br><br>

<!-- Page 2 -->
<div id='page2'>
  <div class="different markup"></div>
  <em style="margin: 0px; padding: 0px;">Lorem Ipsum</em> <b>is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry&#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.</b>
</div>

<br>
<h3 id="result" style="color:red;"></h3>
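
A 32-bit hash like the one above can collide, so two different texts may occasionally produce the same value. If that matters, a cryptographic digest is a safer choice. The following is a rough sketch, assuming the text of each page has already been extracted (for example via innerText) and that the comparison runs under Node.js; textDigest, normalizeText and findMismatches are hypothetical helper names introduced here for illustration:

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: collapse whitespace so layout-only differences
// do not change the digest.
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: hex-encoded SHA-256 digest of the normalized text.
function textDigest(text) {
  return createHash('sha256').update(normalizeText(text), 'utf8').digest('hex');
}

// Hypothetical helper for checking several pages at once: takes
// [name, textA, textB] triples and returns the names whose texts differ.
function findMismatches(pages) {
  return pages
    .filter(([, textA, textB]) => textDigest(textA) !== textDigest(textB))
    .map(([name]) => name);
}
```

With this shape, each side only needs to publish its digest, and the same extraction step can be reused for every page pair without writing per-page selectors.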

Shan Eapen Koshy