
How do you test if a character/HTML entity is line-breaking or non-breaking using JS?

Examples of line-breaking characters:

  • the good ol' space (U+0020)
  • - (hyphen-minus, U+002D)

Examples of non-breaking characters:

  • &nbsp; (non-breaking space, U+00A0)
  • &#8209; (non-breaking hyphen, U+2011)
  • &#8288; (word joiner, U+2060)

I know there are many more HTML entities for line-breaking and non-breaking characters, but I have no idea what they all are. How do I check whether a given one is line-breaking without knowing beforehand?

  • You could probably query the Unicode character properties, but I'm not sure browsers (you mean a browser rather than Node, right?) expose that natively, and a third-party library (if there is one) would have to bundle a huge database. What problem are you trying to solve, exactly? There might be a simpler solution. :-? – Álvaro González Mar 02 '16 at 09:59
  • I want to build an HTML crawler that removes all HTML tags and replaces all non-breaking entities with their line-breaking counterparts. –  Mar 02 '16 at 10:04
  • A crawler that runs from a browser? And is removing entities a goal in itself? – Álvaro González Mar 02 '16 at 11:26
  • Yeah, the crawler only translates HTML you give it. It requires user input. –  Mar 02 '16 at 11:53
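Following up on the Unicode-properties idea for the crawler use case, here is a minimal sketch that does not need the browser at all: decode the entities, then compare each character against a small hand-picked table of code points that the Unicode Line Breaking Algorithm (UAX #14) puts in the GL (glue) and WJ (word joiner) classes, and replace each with a breaking counterpart. The table, the replacement choices and the names `NON_BREAKING`, `fromEntity`, `decodeEntities`, `isNonBreaking` and `toBreaking` are hypothetical. The table is deliberately incomplete, only numeric entities plus `&nbsp;` are decoded, and the tag stripping is a naive regex. JavaScript regexes cannot query the Line_Break property through `\p{…}`, which is why a hard-coded list is used here.

```javascript
// Hypothetical, deliberately incomplete table: characters in the Unicode
// Line_Break classes GL (glue) and WJ (word joiner), each mapped to an
// assumed "line-breaking counterpart".
const NON_BREAKING = new Map([
  ["\u00A0", " "],      // NO-BREAK SPACE (&nbsp;)       -> space
  ["\u2007", " "],      // FIGURE SPACE                  -> space
  ["\u202F", " "],      // NARROW NO-BREAK SPACE         -> space
  ["\u2011", "-"],      // NON-BREAKING HYPHEN (&#8209;) -> hyphen-minus
  ["\u2060", "\u200B"], // WORD JOINER (&#8288;)         -> zero width space
  ["\uFEFF", "\u200B"], // ZERO WIDTH NO-BREAK SPACE     -> zero width space
]);

// Turn a parsed number into a character, leaving out-of-range entities as-is.
function fromEntity(match, codePoint) {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
}

// Decode numeric entities (&#8209; and &#x2011;) plus &nbsp; only.
// A real crawler would need the full named-entity list (or DOMParser).
function decodeEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (m, hex) => fromEntity(m, parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (m, dec) => fromEntity(m, parseInt(dec, 10)))
    .replace(/&nbsp;/g, "\u00A0");
}

function isNonBreaking(ch) {
  return NON_BREAKING.has(ch);
}

// Strip tags (naively), decode entities, then swap non-breaking characters.
function toBreaking(html) {
  const text = decodeEntities(html.replace(/<[^>]*>/g, ""));
  let out = "";
  for (const ch of text) {
    out += isNonBreaking(ch) ? NON_BREAKING.get(ch) : ch;
  }
  return out;
}
```

For example, `toBreaking("<p>10&nbsp;km&#8209;long</p>")` returns `"10 km-long"`. Mapping the word joiner to a zero width space (rather than deleting it) keeps the text visually identical while turning the spot into a break opportunity.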

1 Answer


You could test it by creating a test div with minimal width, putting the character between two letters, and then checking whether the text wraps to the next line (i.e. whether the div gets taller).

function testfor() {
  const tester = document.getElementById("test");
  const html = document.getElementById("html").value;

  // Height of one line of text in the 1px-wide test div.
  tester.innerHTML = "a";
  const heightInit = tester.clientHeight;

  // Append the entity/character plus a second letter. If it is a break
  // opportunity, the second "a" wraps and the div gets taller.
  tester.innerHTML += html + "a";
  const heightFinal = tester.clientHeight;

  const itIs = heightFinal > heightInit; // true = line-breaking
  document.getElementById("return").textContent = itIs;
}

// Run the test when Enter is pressed in the input.
document.getElementById("html").addEventListener("keydown", function (e) {
  if (e.key === "Enter") {
    testfor();
  }
});
#test {
  width: 1px; /* so any break opportunity forces a wrap */
  line-height: 30px;
  font-size: 18px;
}
<div id="test">this&nbsp;text&nbsp;box&nbsp;will&nbsp;take&nbsp;any&nbsp;length&nbsp;of&nbsp;string, and&nbsp;test&nbsp;if&nbsp;the&nbsp;lines&nbsp;break, however&nbsp;it&nbsp;was&nbsp;not&nbsp;designed&nbsp;to&nbsp;handle more&nbsp;than&nbsp;one&nbsp;character/entity/tag&nbsp;at&nbsp;a&nbsp;time</div>
<input id="html" type="text" placeholder="type an HTML character/entity/tag"/>
<p id="return"></p>
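To find line-breaking characters without knowing them beforehand, the same measurement can be run over a whole range of code points. This is a sketch assuming the `#test` div and CSS above; `makeBreakTester` and `findBreaking` are hypothetical names. It uses `textContent` rather than `innerHTML` so that characters such as `<` and `&` are tested literally instead of being parsed as markup, and like the original it only checks the context "a" + character + "a".

```javascript
// Returns a function that reports whether `str`, placed between two letters,
// lets the text wrap inside `tester` (assumed to be the 1px-wide #test div).
function makeBreakTester(tester) {
  return function allowsBreak(str) {
    tester.textContent = "a";
    const oneLine = tester.clientHeight;
    tester.textContent = "a" + str + "a";
    // A taller box means the second "a" wrapped onto a new line.
    return tester.clientHeight > oneLine;
  };
}

// Collect the code points in [from, to] for which allowsBreak() is true.
function findBreaking(from, to, allowsBreak) {
  const found = [];
  for (let cp = from; cp <= to; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip lone surrogates
    if (allowsBreak(String.fromCodePoint(cp))) found.push(cp);
  }
  return found;
}
```

In the browser, `findBreaking(0x2000, 0x206F, makeBreakTester(document.getElementById("test")))` would scan the General Punctuation block. Each call forces a synchronous layout, so large ranges are slow, and the result reflects the current browser and font rather than the Unicode data itself.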