90

I am compressing JavaScript files, and the compressor complains that my files contain the `ï»¿` character.

How can I search for these characters and remove them?

Quintin Par
  • 16
    That isn't `ï»¿`, that is `<0xEF,0xBB,0xBF>`, which is the BOM of UTF-8 files, so you should change the title. How would you like to remove them? By magic fairies? By command line tool? By editing one-by-one? Notepad++ can change encoding to UTF-8 without BOM. For example, after just 5 seconds of googling "strip BOM utf8" I found this for Linux: http://www.ueber.net/who/mjl/projects/bomstrip/ – xanatos Sep 04 '11 at 07:27
  • 1
    It might help you get an answer that specifically relates to your problem if you told us what javascript tool you're using to do the compression, on what platform, and what other tools are part of your build process. – SingleNegationElimination Sep 04 '11 at 07:36
  • 17
    BOMs in UTF-8 are absolute crud. You need to find the producer of that file and tell them to cut it the @#%% out. – tchrist Sep 04 '11 at 18:25
  • 1
    @xanatos It [sounds like](http://www.herongyang.com/Unicode/Notepad-Byte-Order-Mark-BOM-FEFF-EFBBBF.html) `` is the _UTF-16_ BOM. Since JavaScript represents strings in a UTF-16-like way (UCS-2), I believe the UTF-8 BOM may wind up looking like a UTF-16 BOM when handling the file with JavaScript-based tools (e.g. browser dev tools / Node inspector, or in JS APIs like `String.charCodeAt()`). That might explain what @QuintinPar was seeing. – peterflynn Jan 30 '15 at 01:52
  • 4
    @peterflynn: `U+FEFF` is the Unicode codepoint used for a BOM, but the BOM itself is how that codepoint is encoded (UTF-8: `0xEF 0xBB 0xBF`, UTF-16LE: `0xFF 0xFE`, UTF-16BE: `0xFE 0xFF`, etc). So the files in question are UTF-8 encoded, which the compressor is detecting when decoding them to actual Unicode codepoints. – Remy Lebeau Jun 18 '15 at 23:22
  • 4
    @xanatos Regardless of what it is, this is how it manifests, and that's how people can easily find this question using search engines. – BartoszKP Sep 22 '15 at 08:49
  • If you want just to show files containing the BOM character, use this one: `grep $'\xEF\xBB\xBF' *.*` – rubo77 Jun 15 '20 at 15:59
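
The byte sequences quoted in the comments above can be checked directly. The following is only an illustrative Python sketch (Python is not used elsewhere in this thread); the variable names are made up for the example, while `codecs.BOM_UTF8` comes from the standard library. It also shows why a UTF-8 BOM tends to show up as `ï»¿` when a tool reads the bytes as Latin-1.

```python
import codecs

BOM = "\ufeff"  # the code point U+FEFF

# The same code point produces different bytes in each encoding.
utf8_bom = BOM.encode("utf-8")         # EF BB BF
utf16le_bom = BOM.encode("utf-16-le")  # FF FE
utf16be_bom = BOM.encode("utf-16-be")  # FE FF

# Reading the UTF-8 BOM bytes as Latin-1 yields three visible characters.
mojibake = utf8_bom.decode("latin-1")

if __name__ == "__main__":
    print(utf8_bom.hex(" "), utf16le_bom.hex(" "), utf16be_bom.hex(" "))
    print(repr(mojibake))
```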

13 Answers

199

You can easily remove them using vim, here are the steps:

1) In your terminal, open the file using vim:

vim file_name

2) Tell vim not to write a BOM when the file is saved:

:set nobomb

3) Save the file:

:wq
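
If vim is not at hand, the same "re-save without a BOM" idea can be sketched in Python. This is an assumption-laden illustration rather than part of the answer: the function name `rewrite_without_bom` is hypothetical, and the file is assumed to be valid UTF-8 (decoding raises an error otherwise, leaving the file untouched).

```python
def rewrite_without_bom(path):
    """Re-save a UTF-8 file so that it no longer starts with a BOM."""
    # "utf-8-sig" decodes UTF-8 and drops one leading BOM if there is one.
    # newline="" keeps the original line endings exactly as they are.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    # Plain "utf-8" never writes a BOM.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Running it on a file that has no BOM simply rewrites the same content.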
ruffin
Mohammad Anini
32

Another method to remove those characters - using Vim:

vim -b fileName

Now those "hidden" characters are visible (<feff>) and can be removed.

ROMANIA_engineer
25

Thanks for the previous answers, here's a sed(1) variant just in case:

sed '1s/^\xEF\xBB\xBF//'
Michael Shigorin
19

On Unix/Linux:

sed 's/\xEF\xBB\xBF//' < inputfile > outputfile

On Mac OS X:

sed $'s/\xEF\xBB\xBF//' < inputfile > outputfile

Notice the $ after sed for mac.

On Windows:

There is Super Sed, an enhanced version of sed. For Windows it is a standalone .exe, intended to be run from the command line.

Masum
  • 1
    "Notice the $ after sed for mac." - Thank you sir! – Somebody still uses you MS-DOS Apr 14 '15 at 05:37
  • 1
    The Bash "C-style" string `$'\xEF\xBB\xBF//'` is a Bash feature, not particularly a Mac or OSX feature. With this construct, Bash will parse the escape sequences into actual bytes before passing the command line to `sed`. Depending on your `sed` variant, this may or may not work (though I'm sure it's useful for OSX users to know that it should work out of the box for them). – tripleee Jul 14 '15 at 10:41
  • 1
    maybe sed -i 's/.../.../' – Arthur Nov 25 '16 at 01:06
18
perl -pi~ -CSD -e 's/^\x{fffe}//' file1.js path/to/file2.js

I would assume the tool will break if you have other utf-8 in your files, but if not, perhaps this workaround can help you. (Untested ...)

Edit: added the -CSD option, as per tchrist's comment.

tripleee
  • 1
    You need to run with the `-CSD` switch, or with the `PERL_UNICODE` envariable set to `SD`, for that to work. – tchrist Sep 04 '11 at 18:24
  • Regexp works OK for removing character at the beginning of a line, to replace all characters in a line: 's/\x{fffe}//g'. – Diego Pino Dec 26 '11 at 09:21
  • 2
    On Mac OSX, I had to change to: `perl -CSD -pe 's/^\x{feff}//' file.csv` , note the change from `fffe` to `feff`. – mpettis Feb 06 '14 at 03:52
  • 1
    @mpettis That's not a BOM then, but a BOM with the bytes reversed. It could happen on any platform, if you convert UTF-16 to UTF-8 and get the byte-order wrong (even though the purpose of the BOM is to prevent that error!) – tripleee Nov 24 '14 at 16:38
  • What about running this on an entire directory? – blong Apr 13 '17 at 03:43
  • 1
    @blong What about it? Ask a separate question if you can't figure it out (but it will probably be marked as a duplicate; first google hit http://stackoverflow.com/questions/1712188/how-do-i-run-a-perl-script-on-multiple-input-files-with-the-same-extension) – tripleee Apr 13 '17 at 03:53
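
To illustrate the difference between stripping only a leading BOM and stripping every occurrence (the `/g` variant mentioned in the comments), here is a small Python sketch. It assumes, purely for illustration, that two BOM-prefixed scripts were concatenated, which the question does not state; the function names are hypothetical. It also shows that a UTF-8 BOM decodes to the code point U+FEFF.

```python
BOM = "\ufeff"  # what the bytes EF BB BF become after UTF-8 decoding


def strip_leading_bom(text: str) -> str:
    """Remove a single BOM at the very start of the text, if present."""
    return text[1:] if text.startswith(BOM) else text


def strip_all_boms(text: str) -> str:
    """Remove every U+FEFF, including ones left mid-file by concatenation."""
    return text.replace(BOM, "")


# Two BOM-prefixed files glued together, then decoded as plain UTF-8.
raw = b"\xef\xbb\xbfa();\n" + b"\xef\xbb\xbfb();\n"
decoded = raw.decode("utf-8")

if __name__ == "__main__":
    print(repr(strip_leading_bom(decoded)))
    print(repr(strip_all_boms(decoded)))
```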
6

Using tail might be easier:

tail --bytes=+4 filename > new_filename
Dzanvu
  • 2
    This technique would fail after the producer of the file removes the BOM. Not scalable... :) – Piko Apr 01 '15 at 17:28
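
The concern raised in the comment can be made concrete with a short Python sketch (the function names are hypothetical and exist only for this illustration). Dropping the first three bytes unconditionally, as `tail --bytes=+4` does, damages a file that has no BOM, whereas checking for the prefix first is safe to run repeatedly.

```python
import codecs


def drop_first_three_bytes(data: bytes) -> bytes:
    """Mimic `tail --bytes=+4`: output starts at the fourth byte."""
    return data[3:]


def drop_bom_if_present(data: bytes) -> bytes:
    """Remove the UTF-8 BOM only when the data actually starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"var x;"
without_bom = b"var x;"

if __name__ == "__main__":
    print(drop_first_three_bytes(without_bom))  # real content is lost here
    print(drop_bom_if_present(without_bom))
```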
3

I've used vimgrep for this

:vim "[\uFEFF]" *

The normal vim search command works as well:

/[\uFEFF]
3

@tripleee's solution didn't work for me. But changing the file encoding to ASCII and again to UTF-8 did the trick :-)

Community
Pablo Torrecilla
2

The 'file' command shows if the BOM is present:

For example: 'file myfile.xml' displays: "XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators"

dos2unix will remove the BOM.
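
Where the `file` command is not available, detection can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `files_with_utf8_bom` and the default `*.js` pattern are made up for the example, and only the first three bytes of each file are inspected.

```python
import codecs
from pathlib import Path


def files_with_utf8_bom(root, pattern="*.js"):
    """Return the sorted paths under `root` that start with a UTF-8 BOM."""
    hits = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            # Only the first three bytes are needed to spot the BOM.
            if f.read(3) == codecs.BOM_UTF8:
                hits.append(path)
    return sorted(hits)
```

For example, `files_with_utf8_bom(".")` would list the affected `.js` files below the current directory.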

2

I suggest using the "dos2unix" tool; try running dos2unix ./thefile.js.

If necessary try to use something like this for multiple files:

for x in $(find . -type f -exec echo {} +); do dos2unix $x ; done

My Regards.

Wellington1993
  • 1
    I liked your answer - `bomstrip` wasn't easily available on my mac - so taking the time to give you the simple version: `find . -type f -exec dos2unix '{}' +` – dsz Mar 05 '20 at 01:19
1

On Windows, you could use the backported recode utility from UnxUtils.

Nikita Koksharov
1

In Sublime Text you can install the Highlighter package and then customize the regular expression in your user settings.

Here I added \uFEFF to the end of the highlighter_regex property.

{
    "highlighter_enabled": true,
    "highlighter_regex": "(\t+ +)|( +\t+)|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014\uFEFF]|[\t ]+$",
    "highlighter_scope_name": "invalid",
    "highlighter_max_file_size": 1048576,
    "highlighter_delay": 3000
}

To overwrite the default package settings place the file here:

~/.config/sublime-text-3/Packages/User/highlighter.sublime-settings

JJD
0

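Save the file without the signature (that is, without the BOM); Visual Studio, for instance, lists this encoding as "UTF-8 without signature".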

SwDevMan81
Masood Moshref