36

I have encountered a problem similar to the one described here (and in other places): in an ajax callback I get an xmlhttp.responseText that seems OK (when I alert it, it shows the right text), but when I use an 'if' statement to compare it to the string, it returns false.

(I am also the one who wrote the server-side code returning that string.) After much studying of the string, I discovered that it had an "invisible character" as its first character, one that was not shown. If I copied the string to Notepad and then deleted the first character, nothing visible was deleted until I pressed Delete again.

I called charCodeAt(0) on the string returned in xmlhttp.responseText, and it returned 65279.

Googling it reveals that it is the Unicode character U+FEFF, a marker placed at the start of a text stream that is supposed to signal "big-endian" or "little-endian" byte order.

So, now I know the cause of the problem, but... why is that character being echoed? In the source PHP I simply use

echo 'the string'...

and it apparently somehow outputs [chr(65279)]the string...

Why? And how can I avoid it?

miken32
Yuval A.
  • That depends on the data. Without seeing your code we can't say. Do you control the data the ajax is pulling? How is it being served to the ajax? – Drazisil Jun 30 '11 at 16:48
  • It comes from a php file I wrote. The php echoes the string "CHECKTABLE OK". The thing is - even if I just run the php on a browser - and then copy-paste the echoed string - then I check and see that chr-65279 is at the beginning of the string... – Yuval A. Jun 30 '11 at 16:54
  • BTW, that character is also called the Byte Order Mark (BOM) character... – Yuval A. Jun 30 '11 at 17:08
  • What editor are you using to edit your PHP files? Use an editor that allows changing of the encoding like [EmEditor](http://www.emeditor.com/) and open your PHP file "as binary" and see if you see any weird characters at the beginning of the strings or beginning of the file. That should tell us if the BOMs are in the source file or are added later. – nobody Jun 30 '11 at 18:01
  • I opened the php with an hex editor. The BOM wasn't there. I'm pretty sure it's added later... – Yuval A. Jul 01 '11 at 01:58

12 Answers

80

To conclude, and specify the solution:

Windows Notepad adds the BOM character (the 3 bytes: EF BB BF) to files saved with utf-8 encoding.

PHP doesn't seem to be bothered by it, unless you include one PHP file in another. Then things get messy: the BOM sits before the included file's opening <?php tag, so it is sent as output, and strings get displayed with character 65279 prepended to them.

You can edit the file with another text editor such as Notepad++ and use the encoding
"Encode in UTF-8 without BOM",
and this seems to fix the problem.

Also, you can save the other PHP file with ANSI encoding in Notepad, and this also seems to work (that is, as long as you don't actually use any extended characters in the file, I guess...)
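To make the link between "character 65279" and "the 3 bytes EF BB BF" concrete, here is a minimal JavaScript sketch. It assumes an environment with a global TextEncoder (modern browsers or a recent Node.js), and hasUtf8Bom is a hypothetical helper name, not something from the original code.

```javascript
// Character code 65279 is U+FEFF, the byte order mark.
const BOM_CHAR = String.fromCharCode(65279);

// Encoding that single character as UTF-8 yields the three bytes Notepad writes.
const bomBytes = Array.from(new TextEncoder().encode(BOM_CHAR));
console.log(bomBytes.map((b) => b.toString(16).toUpperCase()).join(' ')); // EF BB BF

// hasUtf8Bom is a hypothetical helper: true when the bytes start with EF BB BF.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 &&
    bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
}
```

Running hasUtf8Bom over the raw bytes of a suspect file (or a raw response body) should tell you whether the BOM is really there before you start changing editor settings.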

Yuval A.
  • 4
    Thank you very much for writing this solution here, it saved me several hours of searching! I got lucky that you wrote the character number and that Google just loves stackoverflow :-) – t.mikael.d Feb 19 '12 at 22:33
  • This helped me! I had an invisible question mark in front of my @model statement. Thanks! – Christopher Marshall Jun 14 '12 at 21:26
  • This also shows up if you are reading using a BufferedReader in Java on Android and if you then rewrite the file it saves as  which is very nasty, I know I can safely trim that off now, thank you. – Martin Belcher - AtWrk Oct 10 '12 at 12:14
7

If you want to print a string that contains the ZERO WIDTH NO-BREAK SPACE char (e.g., by including an external non-PHP file), try the following code:

echo preg_replace("/\xEF\xBB\xBF/", "", $string);
matfax
4

If you are using Linux or Mac, here is an elegant solution to get rid of the BOM character in PHP.

If you are using WordPress (25% of Internet websites are powered by WordPress), chances are that a plugin or the active theme is introducing the BOM character because of a file that contains a BOM (maybe that file was edited on Windows). If that's the case, go to your wp-content/themes/ folder and run the following command:

grep -rl $'\xEF\xBB\xBF' .

This will search for files with BOM. If you have .php results in the list, then do this:

  1. Rename the file to something like filename.bom.bak.php
  2. Open the file in your editor and copy the content to the clipboard.
  3. Create a new file and paste the content from the clipboard.
  4. Save the file with the original name filename.php

If you are dealing with this locally, then eventually you'd need to re-upload the new files to the server.

If you don't have results after running the grep command and you are using WordPress, then another place to check for BOM files is the /wp-content/plugins folder. Go there and run the command again. Alternatively, you can deactivate all the plugins and then check whether the problem comes back as you activate them again one by one.

If you are not using WordPress, then go to the root of your project folder and run the command to find files with a BOM. If any file is found, follow the four-step procedure described above.
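If you would rather not do the rename/copy/paste steps by hand, the same cleanup can be sketched as a small Node.js script. This is only a sketch under the assumption that Node.js is available; stripUtf8Bom and removeBomFromFile are hypothetical helper names, and you should still keep a backup of each file before rewriting it.

```javascript
const fs = require('fs');

// Hypothetical helper: returns the buffer without a leading EF BB BF, if present.
function stripUtf8Bom(buffer) {
  const hasBom = buffer.length >= 3 &&
    buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF;
  return hasBom ? buffer.subarray(3) : buffer;
}

// Hypothetical helper: rewrites the file only when a BOM was found.
// Returns true if the file was changed, false otherwise.
function removeBomFromFile(filePath) {
  const original = fs.readFileSync(filePath);
  const stripped = stripUtf8Bom(original);
  if (stripped.length === original.length) {
    return false; // no BOM, leave the file untouched
  }
  fs.writeFileSync(filePath, stripped);
  return true;
}
```

You could call removeBomFromFile on each path printed by the grep command. Working on raw bytes rather than decoded text means the rest of the file is written back unchanged.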

julianm
  • Thanks a lot @julianm, it helped me find the file, and yes, as @Renoir Dos Reis suggested in the last answer, it was white space before `<?php` – Awsme Sandy Sep 14 '17 at 11:59
3

You can also remove the character in javascript with:

myString = myString.replace(String.fromCharCode(65279), "" );
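A slightly more defensive variant, as a sketch: strip the character only when it really is the first one, then compare. stripLeadingBom is a hypothetical helper name, and the responseText value below is a stand-in for xmlhttp.responseText using the "CHECKTABLE OK" string from the question's comments.

```javascript
// Hypothetical helper: removes U+FEFF (char code 65279) only from the start.
function stripLeadingBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// Stand-in for xmlhttp.responseText as the server sends it, BOM included.
const responseText = '\uFEFFCHECKTABLE OK';

console.log(responseText === 'CHECKTABLE OK');                  // false
console.log(stripLeadingBom(responseText) === 'CHECKTABLE OK'); // true
```

This only works around the symptom on the client; the BOM is still in the file on the server.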

Tiago A.
2

I had this problem and changed my encoding to UTF-8 without BOM, ANSI, etc., with no luck. My problem was caused by using a PHP include function in the HTML body. Moving the include function above my HTML (above the !DOCTYPE tag) resolved the issue.

After I knew what my issue was, I tested the include, include_once and require functions. All attempts to include a file from within the HTML body created the extra &#65279; character at the spot where the PHP code would start.

I also tried to assign the result of the include to a variable, i.e. $result = include("myfile.txt");, with the same extra character being added.

Please note that moving the include above the HTML does not stop the extra character from being output; it does, however, remove it from my data and from the content area.

1

In addition to the above, I just had this issue when pulling some data from a MySQL database (charset set to UTF-8). The issue was the HTML tags: I allowed some basic ones like <p> and <a>, and when I displayed the data on the page I got the &#65279; character when looking through Dev Tools in Chrome.

So I removed the tags from the table, and that removed the &#65279; issue (and the blank line above where the text was to be displayed).

I just wanted to add to this, since my Rep isn't high enough to actually comment on the answer.

EDIT: Using Vim, I was able to remove the BOM with :set nobomb (and then saving the file), and you can confirm the presence of the BOM with :set bomb?, which will display either bomb or nobomb.

James Groves
1

I use Dreamweaver CC 2015, which by default has this option enabled: "Include BOM signature" or something like that. When you click Save As in the File menu, the window that appears shows "Unicode Options...", where you can disable the BOM option. Remember to change all your files this way. Or you can simply go to Preferences, disable the BOM option, and save all your files.

phpWarrior
1

I'm using the PhpStorm IDE to develop PHP pages.

I had this problem and used this IDE option to remove any BOM characters, which solved it:

File -> Remove BOM

Try to find options like this in your IDE.

halfer
Sayed Abolfazl Fatemi
  • Please try to refrain from adding greetings and salutations into your posts. They do not belong here, because this is not a forum. Technical writing is expected. Thank you. – halfer Dec 09 '19 at 21:58
0

When using Atom, it shows up as white space at the start of the document, before <?php

Barmar
Renoir Reis
  • Thanks for the trick. I used the command `grep -rl $'\xEF\xBB\xBF' .` to find the file, and then I found the same white space before `<?php` – Awsme Sandy Sep 14 '17 at 12:00
0

A Linux solution to find and remove this character from a file is to use sed -i 's/\xEF\xBB\xBF//g' your-filename-here

Richard
0

My solution is to create a PHP file with this content:

<?php
header("Content-Type:text/html;charset=utf-8");
?>

Save it as ANSI, then have your other PHP files require/include it before any HTML or PHP code.

0

Probably something on the server. If you know it's there, I would just bypass it until solved.

myString = myString.substring(1)

Chops off the first character.

Drazisil
  • That's what I'm going to do for now, but I'd still like to know how to avoid it. The server for now is my localhost on my computer... – Yuval A. Jun 30 '11 at 17:07
  • It has to have something to do with the source. Are you creating the text 'CHECKTABLE OK' yourself, or just echoing a response from a function? – Drazisil Jun 30 '11 at 17:13
  • I create it myself. Also, if I just do a simple echo in a PHP file and check the string, that character is always prefixed to the string. It must be something related to UTF-8 encoding (btw, the files are saved as UTF-8 using Windows Notepad...). I'd like to somehow tell PHP not to output that character all the time, but I don't know how... – Yuval A. Jun 30 '11 at 17:17
  • Hopefully someone else can come up with an answer. I hate BOMs. On an unrelated note, if you use notepad, I replaced mine with [notepad2](http://www.flos-freeware.ch/notepad2.html). It has syntax highlighting and allows you to change the encoding. Very helpful. – Drazisil Jun 30 '11 at 17:23