<0xEF,0xBB,0xBF> character showing up in files. How to remove them?

Question

I am compressing JavaScript files, and the compressor complains that my files contain the characters ï»¿.

How can I search for these characters and remove them?

That isn't ``, that is `<0xEF,0xBB,0xBF>` that is the BOM of UTF8 files, so you should change the title. How would you like to remove them? By magic fairies? By command line tool? By editing one-by-one? Notepad++ can change encoding to UTF8 without BOM. For example just googling 5 seconds of "strip BOM utf8" I've found this for Linux: http://www.ueber.net/who/mjl/projects/bomstrip/ — xanatos, Sep 04 '11 at 07:27
It might help you get an answer that specifically relates to your problem if you told us what javascript tool you're using to do the compression, on what platform, and what other tools are part of your build process. — SingleNegationElimination, Sep 04 '11 at 07:36
BOMs in UTF-8 are absolute crud. You need to find the producer of that file and tell them to cut it the @#%% out. — tchrist, Sep 04 '11 at 18:25
@xanatos It [sounds like](http://www.herongyang.com/Unicode/Notepad-Byte-Order-Mark-BOM-FEFF-EFBBBF.html) `` is the _UTF-16_ BOM. Since JavaScript represents strings in a UTF-16-like way (UCS-2), I believe the UTF-8 BOM may wind up looking like a UTF-16 BOM when handling the file with JavaScript-based tools (e.g. browser dev tools / Node inspector, or in JS APIs like `String.charCodeAt()`). That might explain what @QuintinPar was seeing. — peterflynn, Jan 30 '15 at 01:52
@peterflynn: `U+FEFF` is the Unicode codepoint used for a BOM, but the BOM itself is how that codepoint is encoded (UTF-8: `0xEF 0xBB 0xBF`, UTF-16LE: `0xFF 0xFE`, UTF-16BE: `0xFE 0xFF`, etc). So the files in question are UTF-8 encoded, which the compressor is detecting when decoding them to actual Unicode codepoints. — Remy Lebeau, Jun 18 '15 at 23:22
@xanatos Regardless of what it is, this is how it manifests, and that's how people can easily find this question using search engines. — BartoszKP, Sep 22 '15 at 08:49
If you want just to show files containing the BOM character, use this one: `grep $'\xEF\xBB\xBF' *.*` — rubo77, Jun 15 '20 at 15:59
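
To make the byte sequences mentioned in these comments concrete, here is a small Python sketch (Python is only my choice for illustration; nobody in the thread uses it). It encodes U+FEFF in the three encodings named above and shows why the UTF-8 form appears as ï»¿ when a tool decodes the bytes as Windows-1252.

```python
BOM = "\ufeff"  # the code point U+FEFF

# The same code point serialises to different bytes in each encoding.
utf8_bytes = BOM.encode("utf-8")         # EF BB BF
utf16le_bytes = BOM.encode("utf-16-le")  # FF FE
utf16be_bytes = BOM.encode("utf-16-be")  # FE FF

# Misreading the UTF-8 bytes as a single-byte encoding produces
# the three visible characters from the question: 'ï»¿'.
mojibake = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(" "))
```

So the file itself contains three bytes, and what you see on screen depends on how the viewing tool decodes them.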

score 199 · Answer 1 · edited Feb 13 '17 at 21:39

You can easily remove them using vim, here are the steps:

1) In your terminal, open the file using vim:

vim file_name

2) Remove all BOM characters:

:set nobomb

3) Save the file:

:wq
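
If vim is not at hand, the same "save without the BOM" idea can be sketched in Python. This is only an illustration under my own assumptions: the helper name `resave_without_bom` is hypothetical, and the file is assumed to be valid UTF-8. It relies on the standard `utf-8-sig` codec, which drops one leading BOM when decoding.

```python
from pathlib import Path


def resave_without_bom(path):
    """Rewrite a UTF-8 file in place so it no longer starts with a BOM.

    Hypothetical helper for illustration; assumes the file is valid UTF-8.
    """
    p = Path(path)
    # "utf-8-sig" skips a single leading BOM if present and otherwise
    # behaves like plain UTF-8. newline="" leaves line endings untouched.
    with p.open("r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with p.open("w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Calling `resave_without_bom("file_name")` on a file that has no BOM rewrites it with identical content, so it should be safe to run more than once.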

edited Feb 13 '17 at 21:39

ruffin

answered Feb 28 '13 at 14:11

Mohammad Anini

This solution worked for me. It is simpler than the selected answer. Thanks – szydan Nov 04 '14 at 10:04
I used this great solution even though I'm normally an emacs partisan. vim ftw – Ellen Spertus Jul 06 '19 at 18:59

ROMANIA_engineer · Answer 2 · 2014-11-13T16:45:12.243

32

Another method to remove those characters - using Vim:

vim -b fileName

Now those "hidden" characters are visible (<feff>) and can be removed.

edited Nov 13 '14 at 16:45

answered Nov 12 '14 at 16:36

ROMANIA_engineer

Michael Shigorin · Answer 3 · 2015-05-23T08:24:52.010

25

Thanks for the previous answers, here's a sed(1) variant just in case:

sed '1s/^\xEF\xBB\xBF//'

edited May 23 '15 at 08:24

answered Apr 23 '13 at 20:32

Michael Shigorin

Other sources suggest prepending the number 1 to the pattern, as in "sed '1 s/\xEF\xBB\xBF//'", to match only the first line. However, for me on Mac OS X, neither way works. – Marian Oct 10 '13 at 07:31
This worked, and was the best solution for me. Thank you, sir! – Vance Lucas May 20 '14 at 20:41
Loved this solution. Easiest to implement and still scalable... :) – Piko Apr 01 '15 at 17:28
@Marian A little late, but you can check [Masum's answer](http://stackoverflow.com/a/29490814/198553) that shows why it didn't work on mac. – Somebody still uses you MS-DOS Apr 14 '15 at 05:38
Add -i to sed to update the file(s) with the changes. – Johan Jul 14 '17 at 09:30

score 19 · Answer 4 · answered Apr 07 '15 at 11:45

On Unix/Linux:

sed 's/\xEF\xBB\xBF//' < inputfile > outputfile

On MacOSX

sed $'s/\xEF\xBB\xBF//' < inputfile > outputfile

Notice the $ after sed for mac.

On Windows

There is Super Sed, an enhanced version of sed. For Windows it is a standalone .exe intended to be run from the command line.

answered Apr 07 '15 at 11:45

Masum

"Notice the $ after sed for mac." - Thank you sir! – Somebody still uses you MS-DOS Apr 14 '15 at 05:37
The Bash "C-style" string `$'\xEF\xBB\xBF//'` is a Bash feature, not particularly a Mac or OSX feature. With this construct, Bash will parse the escape sequences into actual bytes before passing the command line to `sed`. Depending on your `sed` variant, this may or may not work (though I'm sure it's useful for OSX users to know that it should work out of the box for them). – tripleee Jul 14 '15 at 10:41
maybe sed -i 's/.../.../' – Arthur Nov 25 '16 at 01:06

tripleee · Accepted Answer · 2011-09-05T10:08:02.580

18

perl -pi~ -CSD -e 's/^\x{fffe}//' file1.js path/to/file2.js

I would assume the tool will break if you have other utf-8 in your files, but if not, perhaps this workaround can help you. (Untested ...)

Edit: added the -CSD option, as per tchrist's comment.

edited Sep 05 '11 at 10:08

answered Sep 04 '11 at 11:47

tripleee

You need to run with the `-CSD` switch, or with the `PERL_UNICODE` envariable set to `SD`, for that to work. – tchrist Sep 04 '11 at 18:24
Regexp works OK for removing character at the beginning of a line, to replace all characters in a line: 's/\x{fffe}//g'. – Diego Pino Dec 26 '11 at 09:21
On Mac OSX, I had to change to: `perl -CSD -pe 's/^\x{feff}//' file.csv`; note the change from `fffe` to `feff`. – mpettis Feb 06 '14 at 03:52
@mpettis That's not a BOM then, but a BOM with the bytes reversed. It could happen on any platform, if you convert UTF-16 to UTF-8 and get the byte-order wrong (even though the purpose of the BOM is to prevent that error!) – tripleee Nov 24 '14 at 16:38
What about running this on an entire directory? – blong Apr 13 '17 at 03:43
@blong What about it? Ask a separate question if you can't figure it out (but it will probably be marked as a duplicate; first google hit http://stackoverflow.com/questions/1712188/how-do-i-run-a-perl-script-on-multiple-input-files-with-the-same-extension) – tripleee Apr 13 '17 at 03:53

score 6 · Answer 6 · answered Nov 26 '13 at 05:53

Using tail might be easier:

tail --bytes=+4 filename > new_filename
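
Note that `tail --bytes=+4` always drops the first three bytes, whether or not they are a BOM. A guarded variant can be sketched in Python; the function names `strip_utf8_bom` and `copy_without_bom` are made up for this example and are not part of any tool mentioned here.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def copy_without_bom(src, dst):
    """Copy src to dst, dropping the BOM only if src actually starts with one."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_utf8_bom(data))
```

For example, `copy_without_bom("filename", "new_filename")` leaves a file that never had a BOM byte-for-byte identical, which the unconditional `tail` call would not.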

answered Nov 26 '13 at 05:53

Dzanvu

This technique would fail after the producer of the file removes the BOM. Not scalable... :) – Piko Apr 01 '15 at 17:28

score 3 · Answer 7 · answered Mar 10 '16 at 09:16

I've used vimgrep for this

:vim "[\uFEFF]" *

also normal vim search command

/[\uFEFF]

answered Mar 10 '16 at 09:16

score 3 · Answer 8 · edited Oct 29 '13 at 10:16

@tripleee's solution didn't work for me. But changing the file encoding to ASCII and again to UTF-8 did the trick :-)

edited Oct 29 '13 at 10:16

Community

answered Apr 03 '12 at 15:21

Pablo Torrecilla

LittletonDoug · Answer 9 · 2017-05-19T14:05:50.637

2

The 'file' command shows if the BOM is present:

For example: 'file myfile.xml' displays: "XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators"

dos2unix will remove the BOM.
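
If you want to check a whole directory tree rather than one file at a time, something like the following Python sketch could work. The generator name `files_with_utf8_bom` is hypothetical, and unlike `file` it only looks at the first three bytes of each file and reports nothing else about the encoding.

```python
import codecs
import os


def files_with_utf8_bom(root):
    """Yield paths of files under root whose first three bytes are the UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    head = f.read(len(codecs.BOM_UTF8))
            except OSError:
                continue  # unreadable file: skip it
            if head == codecs.BOM_UTF8:
                yield path
```

Usage would be along the lines of `for path in files_with_utf8_bom("."): print(path)`, after which the listed files can be passed to whichever removal tool you prefer.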

edited May 19 '17 at 14:05

answered May 19 '17 at 12:12

LittletonDoug

score 2 · Answer 10 · answered Dec 17 '18 at 17:17

I suggest using the `dos2unix` tool. Try running `dos2unix ./thefile.js`.

If necessary try to use something like this for multiple files:

find . -type f -print0 | while IFS= read -r -d '' x; do dos2unix "$x"; done

My Regards.

answered Dec 17 '18 at 17:17

Wellington1993

I liked your answer - `bomstrip` wasn't easily available on my mac - so taking the time to give you the simple version: `find . -type f -exec dos2unix '{}' +` – dsz Mar 05 '20 at 01:19

score 1 · Answer 11 · answered Apr 14 '13 at 12:22

On Windows you could use the backported `recode` utility from UnxUtils.

answered Apr 14 '13 at 12:22

Nikita Koksharov

score 1 · Answer 12 · answered Jan 09 '15 at 23:48

In Sublime Text you can install the Highlighter package and then customize the regular expression in your user settings.

Here I added \uFEFF to the end of the highlighter_regex property.

{
    "highlighter_enabled": true,
    "highlighter_regex": "(\t+ +)|( +\t+)|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014\uFEFF]|[\t ]+$",
    "highlighter_scope_name": "invalid",
    "highlighter_max_file_size": 1048576,
    "highlighter_delay": 3000
}

To overwrite the default package settings place the file here:

~/.config/sublime-text-3/Packages/User/highlighter.sublime-settings

score 0 · Answer 13 · edited Apr 17 '15 at 15:44

Save the file without the Unicode signature (the BOM).

edited Apr 17 '15 at 15:44

SwDevMan81

answered Apr 17 '15 at 13:33

Masood Moshref
