30

EDIT: You can see the issue here (look in source).

EDIT2: Interesting, it is not an issue in source. Only with the console (Firebug as well).

I have the following markup in a file called test.html:

​<!DOCTYPE html>
<html>
<head>
    <title>Test Harness</title>
    <link href='/css/main.css' rel='stylesheet' type='text/css' />
</head>
<body>
    <h3>Test Harness</h3>
</body>
</html>

But in Chrome, I see:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
    "&#8203;


        "
    <title>Test Harness</title>
    <link href='/css/main.css' rel='stylesheet' type='text/css' />
    <h3>Test Harness</h3>
</body>
</html>

It looks like &#8203; is a zero-width space, but what is causing it? I am using Sublime Text 2 with UTF-8 encoding and Google App Engine with Jinja2 (but Jinja is simply loading test.html). Any thoughts?

Thanks in advance.

jds
  • I saved your code as an HTML file and can't replicate the problem. I don't think you need the closing `/` in your `<link>` tag. Try removing it; Chrome auto-corrected it for me. – Mal Aug 28 '13 at 03:29
  • Can you post a link to the example page? I don't think the answer can be derived from the information above; the problem is almost certainly elsewhere. – Brad Peabody Aug 28 '13 at 03:44
  • @Mal, Chrome is way smarter at parsing HTML than that. But just to be sure, I removed the closing `/` and the issue persisted. – jds Aug 28 '13 at 04:06
  • @bgp, I've added a link at the top of the post. I agree that it is probably not the HTML itself; I am fairly certain it has to do with either my text editor or GAE. – jds Aug 28 '13 at 04:07
  • Does it really also reposition your header lines like that? – Radio- Aug 28 '13 at 04:48

8 Answers

28

It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain’s HTTP Viewer by selecting “Hex” under “Display Format”. Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”

What happens in the validator and in the Chrome tools – as well as e.g. in Firebug – is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.

The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
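If you would rather script the cleanup than hunt for the character in an editor, a minimal sketch in JavaScript could look like the following. The function name `stripZeroWidthSpaces` is made up for illustration, and the sketch assumes the file content has already been decoded from UTF-8 into a string.

```javascript
// Hypothetical helper (not part of any library): removes every
// ZERO WIDTH SPACE (U+200B) from an already-decoded string.
function stripZeroWidthSpaces(text) {
  return text.replace(/\u200B/g, '');
}

// A document with the stray character in front of the doctype.
const dirty = '\u200B<!DOCTYPE html>\n<html></html>';
const clean = stripZeroWidthSpaces(dirty);
console.log(clean.startsWith('<!DOCTYPE html>')); // true
```

In Node.js you could read the file with `fs.readFileSync(path, 'utf8')`, pass the result through the helper, and write it back with `fs.writeFileSync`. Note that this removes the character everywhere in the file, not only at the start.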

Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it, presumably used by browser tools to indicate the presence of a normally invisible character.

It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, also known as byte order mark (BOM) when appearing at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.
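To tell the two cases apart without a hex viewer, you can compare the first three bytes of the file against the UTF-8 encodings of the two characters. This is only a sketch: `describeLeadingBytes` is a hypothetical name, and it assumes the input really is UTF-8.

```javascript
// Hypothetical helper: names the invisible character, if any, that a
// UTF-8 byte sequence starts with.
function describeLeadingBytes(bytes) {
  const [a, b, c] = bytes;
  if (a === 0xE2 && b === 0x80 && c === 0x8B) return 'U+200B ZERO WIDTH SPACE';
  if (a === 0xEF && b === 0xBB && c === 0xBF) return 'U+FEFF BYTE ORDER MARK';
  return null;
}

const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
console.log(describeLeadingBytes(encoder.encode('\u200B<!DOCTYPE html>'))); // U+200B ZERO WIDTH SPACE
console.log(describeLeadingBytes(encoder.encode('\uFEFF<!DOCTYPE html>'))); // U+FEFF BYTE ORDER MARK
console.log(describeLeadingBytes(encoder.encode('<!DOCTYPE html>')));       // null
```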

Jukka K. Korpela
  • This seems correct. I would have suspected a UTF-8 byte order mark, but that's not what those bytes are. Strange - there must be something else happening inside the app engine that is causing this funky output. – Brad Peabody Aug 28 '13 at 05:55
  • Thanks Jukka! Sorry for the delayed response. While your answer was correct, I could not figure out how to resolve my specific issue. For anyone else having the same problem, I opened my file in Vim and saw: '<200b> '. I removed '<200b>', saved, and re-uploaded and the issue is gone. Why this was being inserted in Sublime Text is beyond me. – jds Sep 06 '13 at 01:53
  • I had the same issue, opened in VI, and removed the '<200b>'. Character is removed from my html now. – tnschmidt Oct 06 '14 at 17:48
  • Same issue - using SublimeText 3 I couldn't see the <200b> character, opened the file in VIM and there it was. Problem solved. Thanks everyone. – Mike Maxwell Jan 10 '15 at 19:05
9

I understand that there is a bug in SharePoint 2013 where the HTML editor adds these characters into your content.

I've been dealing with this for a while, and this is the solution I am using, which seems to be working. I added this JavaScript to a file referenced by my master page.

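// Element types to scan for zero-width spaces.
var elements = ["h1","h2","h3","h4","p","strong","label","span","a"];

// Runs removeZWS on every matching element in the page.
function targetZWS(){
    for (var i = 0; i < elements.length; i++) {
      jQuery(elements[i]).each(function() {
        removeZWS(this);
      });
    }
}

// Rewrites the element's HTML with every U+200B removed.
function removeZWS(target) {
  jQuery(target).html(jQuery(target).html().replace(/\u200B/g,''));
}

/* Register targetZWS with SharePoint's body onload queue. */
jQuery(document).ready(function() {
    _spBodyOnLoadFunctionNames.push("targetZWS");
});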

Links I looked at while investigating this:

  1. https://social.msdn.microsoft.com/Forums/sharepoint/en-US/23804eed-8f00-4b07-bc63-7662311a35a4/why-does-sharepoint-put-in-character-code-8203-in-a-richtext-field?forum=sharepointdevelopment

  2. https://social.technet.microsoft.com/Forums/office/en-US/e87a82f0-1ab5-4aa7-bb7f-27403a7f46de/finding-8203-unicode-characters-in-my-source-code?forum=sharepointgeneral

  3. http://www.sharepointpals.com/post/Removing-8203-in-RichTextHTML-field-Sharepoint

grmdgs
  • I added this and it was working for us to solve the intended problem for a while, but it caused another issue, perhaps after a recently applied CU. Still digging into why, but when editing a document item and saving changes, none of the changed fields got updated when saved, with no error messages. Commenting this code out got the forms to save again. – Michael Adamission Apr 10 '18 at 15:42
3

Try this script. It works for me.

$( document ).ready(function() {
    var abc = document.body.innerHTML;
    var a = String(abc).replace(/\u200B/g,'');
    document.body.innerHTML = a;
});
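One caveat: reassigning `innerHTML` rebuilds the whole DOM under `body`, so event handlers attached to the old elements are lost. A gentler variant, sketched below, edits only the text nodes in place. `stripZwsFromTextNodes` is a made-up name, and the sketch assumes the characters sit in text nodes rather than in attribute values.

```javascript
// Hypothetical helper: walks a DOM subtree and removes U+200B from text
// nodes only, leaving the elements themselves (and their handlers) alone.
function stripZwsFromTextNodes(node) {
  if (node.nodeType === 3) { // 3 is the value of Node.TEXT_NODE
    node.data = node.data.replace(/\u200B/g, '');
    return;
  }
  // Copy the child list first so the loop works on a stable snapshot.
  Array.from(node.childNodes || []).forEach(function (child) {
    stripZwsFromTextNodes(child);
  });
}
```

In a page you would call `stripZwsFromTextNodes(document.body)` once the DOM is ready, for example from the same `$(document).ready` callback.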
2

I have experienced this in a major project I was working on.

The trick is to just:

  • Copy the whole code into Notepad.

  • Save it as a text file.

  • Close the file, open it again, and copy your code back into your IDE.

And voilà, it's gone!

1

I was able to remove these in Sublime by selecting the characters surrounding it and copy/pasting into Find and Replace.

1

In my case, the "&#8203;" character did not show up in my code editor (MS Code) and was visible only in Chrome's Elements tab. What helped was deleting the tag after which the character appeared and retyping that tag by hand. Apparently the character had tagged along with a Ctrl+C / Ctrl+V while I was transferring the code.

Oleg Averkov
1

The “&#8203;” HTML character is a zero-width space. It is easy to spot in the Elements panel of Google Chrome's inspector. When I tried to remove it from my code, though, most major IDEs did not show it to me (maybe because of my preferences).

I found the text editor Brackets, downloaded it, and opened my code in it. It shows the character as red dots. Just remove it and check that everything still works.

I found this solution on a blog: "What is “8203” HTML character? Why is being injected into my HTML?"

Thank You for saving me hours.

Niroshan
-2

I cannot find where it's being injected on my page. I'll investigate it more later, but for now, I just threw this in my page so I can keep working.

$(function(){
    $('body').contents().eq(0).each(function(){
        if(this.nodeName.toString()=='#text' && this.data.trim().charCodeAt(0)==8203){
            $(this).remove();
        }
    });
});