I have a file in UTF-8 encoding with BOM and want to remove the BOM. Are there any linux command-line tools to remove the BOM from the file?
$ file test.xml
test.xml: XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines
Open the file in Vim:
vi test.xml
Remove the BOM:
:set nobomb
Save and quit:
:wq
For a non-interactive solution, try the following command line:
vi -c ":set nobomb" -c ":wq" test.xml
That should remove the BOM, save the file and quit, all from the command line.
A BOM is Unicode codepoint U+FEFF; the UTF-8 encoding consists of the three hex values 0xEF, 0xBB, 0xBF.
With bash, you can create a UTF-8 BOM with the $'' special quoting form, which implements Unicode escapes: $'\uFEFF'. So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file would be:
sed -i $'1s/^\uFEFF//' file.txt
This will leave the file unchanged if it does not start with a UTF-8 BOM, and otherwise remove the BOM.
If you are using some other shell, you might find that "$(printf '\ufeff')" produces the BOM character (that works with zsh, as well as with any shell without a printf builtin, provided that /usr/bin/printf is the GNU version), but if you want a POSIX-compatible version you could use:
sed "$(printf '1s/^\357\273\277//')" file.txt
(The -i in-place edit flag is also a GNU extension; this version writes the possibly-modified file to stdout.)
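To confirm whether a file really starts with the three bytes 0xEF, 0xBB, 0xBF, before or after running sed, the leading bytes can be dumped with od. This is only a minimal sketch: has_utf8_bom is a hypothetical helper name, not a standard tool, and it assumes that head -c is available (it is on GNU and BSD systems).

```shell
# has_utf8_bom is a hypothetical helper, not a standard command.
# It succeeds (exit status 0) when the file's first three bytes are EF BB BF.
has_utf8_bom() {
    # od -An -tx1 prints the bytes as hex without an offset column;
    # tr strips the whitespace so the result can be compared as one string.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

It could be used as: has_utf8_bom test.xml && echo "BOM found"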
Well, just dealt with this today and my preferred way was dos2unix:
dos2unix will remove the BOM and also take care of other idiosyncrasies from other OSes:
$ sudo apt install dos2unix
$ dos2unix test.xml
It's also possible to remove BOM only (-r, --remove-bom):
$ dos2unix -r test.xml
Note: tested with dos2unix 7.3.4
If you are certain that a given file starts with a BOM, you can remove it with the tail command:
tail --bytes=+4 withBOM.txt > withoutBOM.txt
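Because tail drops the first three bytes unconditionally, it would corrupt a file that has no BOM. A guarded variant might check the leading bytes first. This is a sketch under assumptions: strip_bom is a hypothetical helper name, and head -c is assumed to be available.

```shell
# strip_bom is a hypothetical helper, not a standard command.
# It writes the file to stdout, skipping a leading UTF-8 BOM only if one is present.
strip_bom() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"   # start output at byte 4, i.e. skip EF BB BF
    else
        cat "$1"          # no BOM: pass the file through unchanged
    fi
}
```

For example: strip_bom withBOM.txt > withoutBOM.txt. The output has to go to a different file than the input, since the shell truncates the redirect target before the function reads it.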
If you want to process many files at once, building on Reginaldo Santos's answer, there is a quick way:
find . -type f -name "*.java" -print0 | xargs -0 -n 1 dos2unix
Joshua Pinter's answer works correctly on Mac, so I wrote a script that removes the BOM from all files in a given folder, see here.
It can be used as follows:
Remove BOM from all files in current directory: rmbom .
Print all files with a BOM in the current directory: rmbom . -a
Only remove BOM from all files in current directory with extension txt or cs: rmbom . -e txt -e cs