Unknown Character in Netbeans and Console

Question

I have a text file "abc.txt" encoded in UTF-8. Its data is a set of emoticons from a Wikipedia page:
(^_^) happy

My code reads this file and prints its contents to the NetBeans stdout:

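import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public static void main(String[] args) throws FileNotFoundException {
    // No charset is given, so Scanner decodes with the platform default.
    Scanner sc = new Scanner(new File("abc.txt"));
    while (sc.hasNextLine()) {
        System.out.println(sc.nextLine());
    }
    sc.close();
}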

In NetBeans the output is this:

[screenshot: NetBeans output]

While in the console the output is:

[screenshot: console output]

What is this character?
And how do I remove it?

Delete all content from your file and write it yourself, just to check what went wrong in your case. – Noman ali abbasi, Dec 20 '13 at 06:26
@Nomanaliabbasi : I manually typed `happy` in Notepad, saved it as 'abc.txt' in UTF-8 encoding and tried the program. It gives the same non-printable character at the beginning. (BOM apparently) – boxed__l, Dec 20 '13 at 06:56
Changing the encoding from UTF-8 to Unicode seems to solve the problem. [BOM WIKI](http://en.wikipedia.org/wiki/Byte_order_mark) – boxed__l, Dec 20 '13 at 07:02
"If you save a file as UTF-8, Notepad will put the BOM (byte order mark) EF BB BF at the beginning of the file." [here](http://stackoverflow.com/questions/6769311/how-windows-notepad-interpret-characters). – boxed__l, Dec 20 '13 at 07:06

score 2 · Accepted Answer · answered Dec 20 '13 at 06:43

The console output looks like a UTF-8 encoded Byte Order Mark (BOM, U+FEFF), bytes 0xEF 0xBB 0xBF, misinterpreted according to some legacy 8-bit character encoding.
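To see why the mark turns into stray glyphs, here is a minimal sketch that decodes the same three bytes two ways. The class name `BomDemo` is made up for illustration, and ISO-8859-1 is only a stand-in for "some legacy 8-bit character encoding"; the console's actual code page may differ and would then show other glyphs.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // The UTF-8 encoding of U+FEFF, as quoted in the answer.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decoded as UTF-8, the three bytes form the single character U+FEFF.
    public static String decodeAsUtf8() {
        return new String(UTF8_BOM, StandardCharsets.UTF_8);
    }

    // Decoded as ISO-8859-1 (an assumed example of an 8-bit encoding),
    // each byte becomes its own character: U+00EF, U+00BB, U+00BF.
    public static String decodeAsLatin1() {
        return new String(UTF8_BOM, StandardCharsets.ISO_8859_1);
    }
}
```

The first result is one invisible character; the second is three visible ones, which is the kind of junk that appears when the decoder and the file disagree about the encoding.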

Either save the file without BOM, or make your program recognize and skip the BOM at the start of data.
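A rough sketch of the second option, assuming the file really is UTF-8: read it with an explicit charset so the BOM decodes to the single character U+FEFF, then drop that character from the first line. The names `SkipBom`, `stripBom` and `printFile` are hypothetical helpers, not part of any library.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.Scanner;

class SkipBom {
    // Returns the line without a leading U+FEFF, if one is present.
    public static String stripBom(String line) {
        return line.startsWith("\uFEFF") ? line.substring(1) : line;
    }

    // Prints every line of the file, removing a BOM from the first line only.
    public static void printFile(File file, PrintStream out) throws FileNotFoundException {
        // Explicit UTF-8: Java's UTF-8 decoder keeps the BOM as U+FEFF
        // rather than discarding it, so it has to be stripped by hand.
        try (Scanner sc = new Scanner(file, "UTF-8")) {
            boolean firstLine = true;
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                if (firstLine) {
                    line = stripBom(line);
                    firstLine = false;
                }
                out.println(line);
            }
        }
    }
}
```

In the question's `main`, the read loop could then be replaced by `SkipBom.printFile(new File("abc.txt"), System.out);`.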

score 1 · Answer 2 · JosefN · 2013-12-20T06:45:09.980

There is a non-printable character at the beginning of the file, added by a Windows editor. You need to either remove it from the file or skip it in your Java code.
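For removing the character from the file itself, a small sketch along these lines might do. It assumes the unwanted prefix is the UTF-8 BOM (bytes EF BB BF) and that the file is small enough to load into memory; `BomRemover`, `withoutBom` and `removeBom` are illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BomRemover {
    // Returns the bytes without a leading EF BB BF; other input is returned unchanged.
    public static byte[] withoutBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Rewrites the file in place, but only if a BOM was actually found.
    public static void removeBom(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = withoutBom(original);
        if (stripped.length != original.length) {
            Files.write(file, stripped);
        }
    }
}
```

Running `BomRemover.removeBom(java.nio.file.Paths.get("abc.txt"));` once would leave the file readable by the original program without any special handling.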

edited Dec 20 '13 at 06:45

answered Dec 20 '13 at 06:28


Thanks for your input. I manually typed `happy` in notepad, saved as 'abc.txt' in UTF-8 encoding and tried the program. Gives the same non-printable character in the beginning. Is it standard for UTF-8 documents to do so? – boxed__l Dec 20 '13 at 06:38

Sorry, I have not used Windows for ages :). Windows editors add three special bytes at the beginning of the file to indicate that it is a UTF-8 document :). Simply remove them. I cannot recommend a Windows tool to do it. Try the editor in NetBeans. – JosefN Dec 20 '13 at 06:40
