I have a BOM character in my HTML file and I want to remove it. I have searched a lot and tried many scripts, but none of them worked. I have also downloaded Notepad++, but there is no "UTF8 without BOM" option in its Encoding menu. How can I delete that BOM character? Thanks.
3 Answers
You can solve the problem using Vim, which you can easily get with MinGW-w64 (it comes along if you have installed Git) or Cygwin.
So, the key is to use:
- The option `-s`, which will execute a vim script with vim commands.
- The option `-b`, which will open your file in binary mode, where you'll see those awkward BOM bytes.
- The option `-n`, which is very important! This option disables the use of swap files, so all your work runs in memory. It gives you assurance because if the file is large, the swap files can mislead the process.
That said, let's go to the code!
First, create a simple file, here named 'script', which will hold the vim commands:
echo 'gg"+gPggdtCZZ' > script
...this weird string tells vim: "Go to the beginning of the file, paste the contents of the `+` (clipboard) register before the cursor, go back to the beginning, delete everything up to (but not including) the character 'C', then save the file and quit."
Note: If the first visible character of your file is not 'C', you have to specify that character instead. If your files start with different characters, you can follow the same logic and create a bash script which reads the first character and substitutes it into the snippet above.
Run the vim command:
vim -n -b <the_file> -s script

- If you want to use Vim, this command is much simpler: `vim "+set nobomb" "+wq"`. That way, you don't have to know the first visible character of the file. – Neal Gokli Nov 29 '17 at 02:35
- Can you elaborate on swap files causing trouble for your script? Shouldn't it be transparent? – Neal Gokli Nov 29 '17 at 02:38
- On Windows, you can just download the Vim installer from https://vim.sourceforge.io/download.php . No need for MinGW-w64 or Cygwin. – Neal Gokli Nov 29 '17 at 02:40
- Your suggestion works great! `vim "+set nobomb" "+wq"` About the swap files: when you work with a lot of large files (more than 10 MB each, for example), vim works in the background on a swap file instead of the original one, so it is common to end up with corrupted files after running through all of them. The solution is to load the file directly into memory with the `-n` option. – Leandro Ferreira Fernandes Dec 05 '17 at 12:10
I believe this should not usually be seen as a problem. When it is a problem, the BOM is just 3 bytes: EF BB BF. Can't we just delete them? Or overwrite them with something else and then close the file again?
Anyway, the program below does the trick: it changes the BOM, if present, to '***'. Run it as
x file
where file is the name of the file.
#define _CRT_SECURE_NO_WARNINGS /* silences MSVC warnings about fopen */
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    /* the UTF-8 byte order mark */
    const unsigned char BOM[3] = { 0xEF, 0xBB, 0xBF };
    /* use the name given on the command line, else a default;
       pointing at argv[1] avoids overflowing a fixed-size buffer */
    const char* file_name = (argc > 1) ? argv[1] : "target.csv";
    FILE* one = fopen(file_name, "r+b");
    if (!one) return -1;

    unsigned char buffer[3];
    size_t n = fread(buffer, 1, 3, one);
    if (n != 3) { fclose(one); return -2; }
    if (memcmp(buffer, BOM, 3) != 0)
    {
        printf("file '%s' has no BOM\n", file_name);
        fclose(one);
        return 0;
    }

    /* rewind and overwrite the three BOM bytes in place */
    if (fseek(one, 0, SEEK_SET) != 0) { fclose(one); return -3; }
    buffer[0] = buffer[1] = buffer[2] = '*';
    n = fwrite(buffer, 1, 3, one);
    if (n == 3)
        printf("Byte Order Mark changed to '***'\n");
    else
        printf("Error writing to file\n");
    fclose(one);
    return 0;
}
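
Since the question asks to delete the BOM rather than replace it, here is a minimal sketch of that variant. It assumes it is acceptable to write the result to a second file instead of editing in place (shrinking a file in place needs platform-specific calls, which this sketch avoids). The function name `copy_without_bom` is made up for illustration and is not part of the program above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src_name to dst_name, skipping a leading
   UTF-8 BOM (EF BB BF) if there is one.
   Returns 1 if a BOM was skipped, 0 if there was none, -1 on I/O error. */
int copy_without_bom(const char *src_name, const char *dst_name)
{
    static const unsigned char BOM[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char buffer[4096];
    size_t n;
    int had_bom;
    int ok = 1;

    FILE *src = fopen(src_name, "rb");
    if (!src) return -1;
    FILE *dst = fopen(dst_name, "wb");
    if (!dst) { fclose(src); return -1; }

    /* Look at the first three bytes of the source. */
    n = fread(buffer, 1, 3, src);
    had_bom = (n == 3 && memcmp(buffer, BOM, 3) == 0);

    /* No BOM: those bytes are real content, so write them out. */
    if (!had_bom && n > 0 && fwrite(buffer, 1, n, dst) != n) ok = 0;

    /* Copy the rest of the file unchanged. */
    while (ok && (n = fread(buffer, 1, sizeof buffer, src)) > 0)
        if (fwrite(buffer, 1, n, dst) != n) ok = 0;

    if (ferror(src)) ok = 0;
    fclose(src);
    if (fclose(dst) != 0) ok = 0;
    return ok ? had_bom : -1;
}
```

A caller could use it as `copy_without_bom("page.html", "page_nobom.html")` (both file names are placeholders) and replace the original with the copy once the result has been checked. Writing to a separate file also means the original stays untouched if something goes wrong halfway.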
