
I'm trying to compare two spans given as strings.

For example, I have as input:

var span1 = "<SPAN style=\"FONT-SIZE: 9pt; FONT-FAMILY: Arial; FONT-WEIGHT: bold\">hello<FONT style=\"FONT-FAMILY: Wingdings; COLOR: #ffff00\">ll</FONT>hi<BR><BR></SPAN>";

var span2 = "<span style=\"font-weight: bold; font-size: 9pt; font-family: Arial;\">hello<font style=\"color: #FFFF00; font-family: Wingdings;\">ll</font>hi<br><br></span>"

Those two spans render the same thing in the UI, but I can't find a proper way to prove that they are equal using JavaScript or jQuery.

I looked at this question, but the response does not work for me because:

  • Nodes' order can change
  • The semicolon after the last style declaration is not mandatory
  • Text can be in lower or upper case

Also, that question is a little old, so I thought maybe there is a newer approach that I can follow?

Mhd
  • Should we ignore the fact that `font` is now an obsolete tag as of HTML 5 and was deprecated as of HTML 4.0.1 around 1999? – Jon P Jan 22 '18 at 23:35
  • Are you trying to determine if they render the same? What about, `style="font: bold Arial 9pt;"` should that also be equivalent? – James Jan 22 '18 at 23:56
  • Those two strings won't render the same in the UI. `span1` has a space between `hello` and `ll` as well as between `ll` and `hi`, whereas `span2` does not. – Heretic Monkey Jan 23 '18 at 00:00
  • @James That's where I'm struggling: there are many style combinations that are equivalent, and I need to capture all of them. – Mhd Jan 23 '18 at 00:00
  • @MikeMcCaughan You are absolutely right. But spaces are not a big issue, as I can remove all of them. My main concern is how to compare all attributes and children of the span. I will remove the spaces though. – Mhd Jan 23 '18 at 00:04
  • @JonP I wasn't aware that font is obsolete! But what about all the other possible attributes? My main concern is how to compare spans including all possible attributes and children. – Mhd Jan 23 '18 at 00:06
  • Do you just want a true or false as to whether the strings are the same? Then use the "Levenshtein distance". See [this question and its answers for example](https://stackoverflow.com/q/18050932/215552). – Heretic Monkey Jan 23 '18 at 00:09
  • So, you could get the computed style from the rendered elements and compare the cssText property, for each rendered element and its children. Or, you could render both elements, grab them to an image with canvas, and check if all the pixels are the same. In either case it only tells you that it's rendered the same in your browser on your computer, and is by no means definitive. So - what the heck do you need this for (there must be a better way) and/or how far are you prepared to go? – James Jan 23 '18 at 00:10
  • @James I'm trying to identify an HTML page by inserting a tag (which is my span) and saving it somewhere (a database, for example). The next time I render the same page, I will look for the tag to find out whether the page was already tagged or not. The problem is that each browser changes how the span is rendered, so I need a way to detect the tag across browsers. – Mhd Jan 23 '18 at 00:36

2 Answers


What you can do is:

  • render the elements by assigning the HTML to the innerHTML property of an existing element
  • recursively walk through the DOM for each resulting tree
  • for each element node, check the element name, and use the getComputedStyle method to compare all computed styles
  • for text nodes, just compare the contents. You may have to do some cleanup first (trim, collapse or remove whitespace, possibly convert to a canonical encoding...)

The details of how accurately you need to match things and how often this will work depend a lot on what you are trying to achieve exactly and the sources of the HTML, so YMMV.
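The steps listed above could be sketched roughly as follows. This is only an illustration under some assumptions: the names `sameRendering` and `htmlRendersAlike` are made up for this example, the `getStyle` parameter is an added hook so the tree walk can be tried outside a browser, and comparing every computed property is probably stricter than you need (resolved values such as `width` can differ for reasons unrelated to the markup).

```javascript
// Hypothetical helper: compares two DOM subtrees by tag name, computed
// style and text content. `getStyle` is injectable (an assumption of this
// sketch); in a browser it defaults to window.getComputedStyle.
function sameRendering(a, b, getStyle = (el) => window.getComputedStyle(el)) {
  if (a.nodeType !== b.nodeType) return false;

  if (a.nodeType === 3) { // text node: compare after whitespace cleanup
    const clean = (s) => s.replace(/\s+/g, ' ').trim();
    return clean(a.nodeValue) === clean(b.nodeValue);
  }
  if (a.nodeType !== 1) return true; // ignore comments and other node types

  // Element node: the tag name must match, case-insensitively.
  if (a.nodeName.toLowerCase() !== b.nodeName.toLowerCase()) return false;

  // Compare every property exposed by the computed style.
  const sa = getStyle(a);
  const sb = getStyle(b);
  if (sa.length !== sb.length) return false;
  for (let i = 0; i < sa.length; i++) {
    const prop = sa[i];
    if (sa.getPropertyValue(prop) !== sb.getPropertyValue(prop)) return false;
  }

  // Recurse into the children, pairwise and in document order.
  const ca = Array.from(a.childNodes);
  const cb = Array.from(b.childNodes);
  if (ca.length !== cb.length) return false;
  return ca.every((child, i) => sameRendering(child, cb[i], getStyle));
}

// Hypothetical browser-only wrapper: renders both strings via innerHTML.
function htmlRendersAlike(html1, html2) {
  const host1 = document.createElement('div');
  const host2 = document.createElement('div');
  host1.innerHTML = html1;
  host2.innerHTML = html2;
  // Elements are attached so that styles are actually computed.
  document.body.append(host1, host2);
  try {
    return sameRendering(host1, host2);
  } finally {
    host1.remove();
    host2.remove();
  }
}
```

In a browser you would call `htmlRendersAlike(span1, span2)`. If only a few properties matter (say `font-family`, `font-size`, `font-weight` and `color`), restricting the loop to that list should make the check less brittle.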

There may also be options involving rendering as images and comparing the images, but afaik this is not quite straightforward in a browser (but could be if you do this server side with a headless browser).

jcaron

In theory, you are looking for an anagram:

var span1 =
    '<SPAN style="FONT-SIZE: 9pt; FONT-FAMILY: Arial; FONT-WEIGHT: bold">hello <FONT style="FONT-FAMILY: Wingdings; COLOR: #ffff00">ll</FONT> hi<BR><BR></SPAN>';

var span2 =
    '<span style="font-weight: bold; font-size: 9pt; font-family: Arial;">hello<font style="color: #FFFF00; font-family: Wingdings;">ll</font>hi<br><br></span>';

const processForAnagram = str => str.replace(/(\s|"|'|;)/gi, '').toLowerCase().split('').sort().join('');
const isAnagram = (first, second) =>
    processForAnagram(first) === processForAnagram(second);

console.log(
    isAnagram(span1, span2)
);

This could produce some false positives, since any two strings made of the same characters in a different order will match, but they become less likely the larger the element is.
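To illustrate the caveat above, here is a small case where the check reports a match even though the visible text differs. The helpers are repeated from the snippet above so this runs on its own; the two sample strings are made up for the example.

```javascript
// Same helpers as above, repeated so the snippet is self-contained.
const processForAnagram = str => str.replace(/(\s|"|'|;)/gi, '').toLowerCase().split('').sort().join('');
const isAnagram = (first, second) =>
    processForAnagram(first) === processForAnagram(second);

// Different visible text, but the same set of characters: reported as equal.
const falsePositive = isAnagram('<span>listen</span>', '<span>silent</span>');

// A single differing character is still detected.
const detected = isAnagram('<span>a</span>', '<span>b</span>');

console.log(falsePositive, detected); // logs: true false
```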

ryanpcmcquen