Removing null characters from big files in Windows

Question

I am looking for a quick way to remove null characters from a text file in Windows. The solution of using Notepad++ to replace "\0" with nothing throughout the document (as described here) does not work with very big files. Mine is about 180 MB, and Notepad++ hangs indefinitely trying to do the job.

score 2 · Accepted Answer · edited May 23 '17 at 12:25

Here is the solution I found for Windows. The idea is to port this solution from UNIX to Windows.

1) Download and install CoreUtils (GnuWin32), a collection of basic file, shell and text manipulation utilities for Windows.

On Windows 7 the executables are typically installed in c:\Program Files (x86)\GnuWin32\bin

2) Remove the NULL characters by running this command in a cmd window:

tr -d '\000' <input_file >output_file

example:

c:\Program Files (x86)\GnuWin32\bin>tr -d '\000' <putty_measurements_1.log >putty_measurements_2.log
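
If installing CoreUtils is not an option, the same filtering can be approximated in a few lines of C#. The sketch below is not part of the original answer; it only mirrors what `tr -d '\000'` does, streaming the file in chunks so that memory use stays constant regardless of file size. The function name `StripNulls` and the file names in the usage comment are hypothetical, and the code assumes C# 9 or later (top-level statements); in an older project the function would go inside a static class.

```csharp
using System.IO;

// Copies input to output, skipping every 0x00 byte.
// Returns the number of bytes that were dropped.
static long StripNulls(Stream input, Stream output, int bufferSize = 1 << 16)
{
    var buffer = new byte[bufferSize];
    long removed = 0;
    int read;

    // Read returns 0 only at end of stream.
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        int kept = 0;
        for (int i = 0; i < read; i++)
        {
            if (buffer[i] != 0)
                buffer[kept++] = buffer[i]; // compact the non-null bytes in place
        }
        output.Write(buffer, 0, kept);
        removed += read - kept;
    }
    return removed;
}

// Possible usage (file names are placeholders):
//   using var src = File.OpenRead("putty_measurements_1.log");
//   using var dst = File.Create("putty_measurements_2.log");
//   long removed = StripNulls(src, dst);
```

Like the `tr` command, this removes every null byte, so it should only be used on files where nulls carry no meaning (for example, not on UTF-16 text, where every ASCII character is paired with a 0x00 byte).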

score 2 · Answer 2 · answered Nov 05 '18 at 15:18

I know that this is an old post, but I think this could be useful to others. This approach works only if the nulls to delete are at the end of the line (in my case the lines are 1000+ characters long, with about 600 null characters at the end).

Just copy the whole thing and paste it into a new file tab; notepad will automatically replace all nulls with spaces. Then save using ctrl+space+s to trim all lines.

Hope this helps
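
For files too large to copy and paste, the same end-of-line cleanup can be sketched in C#. This is an illustration rather than what the answer above does: instead of turning nulls into spaces and trimming, it drops nulls that sit directly before a line break (or at the end of the file) and leaves nulls inside a line untouched. The name `TrimTrailingNullsPerLine` is hypothetical, and the sketch assumes a single-byte or UTF-8 encoding, where the bytes 0x00, 0x0A and 0x0D never appear as part of another character. It also assumes C# 9 or later (top-level statements).

```csharp
using System.IO;

// Copies input to output, dropping null bytes that immediately precede
// a line break or the end of the stream. Returns the number of bytes dropped.
static long TrimTrailingNullsPerLine(Stream input, Stream output, int bufferSize = 1 << 16)
{
    var buffer = new byte[bufferSize];
    long pendingNulls = 0; // nulls seen since the last non-null byte
    long dropped = 0;
    int read;

    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            byte b = buffer[i];
            if (b == 0)
            {
                pendingNulls++; // decide later whether these are trailing
                continue;
            }
            if (b == (byte)'\n' || b == (byte)'\r')
            {
                dropped += pendingNulls; // nulls right before a line break: discard
            }
            else
            {
                for (long n = 0; n < pendingNulls; n++)
                    output.WriteByte(0); // nulls inside the line: keep them
            }
            pendingNulls = 0;
            output.WriteByte(b);
        }
    }
    // Nulls left over at the end of the stream are treated as trailing too.
    return dropped + pendingNulls;
}
```

Writing byte by byte keeps the logic easy to follow; for very large files, wrapping the destination in a `BufferedStream` would likely reduce the per-call overhead.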

Fabio Piunti

Ah! Wonderfully simple! And that bit about trimming the lines is important, too. Thanks for this! Worked like a charm. I merged a bunch of vcf files to combine into one big vcf file to copy contacts from my flipphone to google, and google wouldn't take it until I cleaned it up. I don't know how well this will work for really large files, but for mid-sized ones it will work easily enough. – bgmCoder Dec 06 '19 at 02:41

Andrew · Answer 3 · 2020-08-31T02:15:50.550

I've been looking for tools to remove trailing NULLs from big files, and the solutions I found didn't work with 1GB+ files or took ages. Therefore I designed my own in C#, which works pretty well, and here it is:

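// Requires: using System.IO; using System.Linq;
// Copies source to "<name>_fixed<ext>" in the same folder, dropping the NULL bytes at the end of the file.
private void CopyContentsUntilNull(string source, bool keepFileDate = true)
{
    // Path.Combine inserts the directory separator that plain concatenation would omit.
    string destination = Path.Combine(
        Path.GetDirectoryName(source) ?? "",
        $"{Path.GetFileNameWithoutExtension(source)}_fixed{Path.GetExtension(source)}");

    var sourceDate = File.GetLastWriteTime(source);

    int bufferSize = 10000;
    var buffer = new byte[bufferSize];
    int nullCount = 0; // NULLs seen since the last non-NULL byte, not yet written
    int readCount;

    using (var srcStream = File.OpenRead(source))
    using (var dstStream = File.Create(destination)) // Create truncates an existing file; OpenWrite would not
    {
        // Read returns 0 only at end of file; a short read does not mean the file is finished.
        while ((readCount = srcStream.Read(buffer, 0, bufferSize)) > 0)
        {
            int bytesToCopy = FindTrailingNull(buffer, readCount);

            if (bytesToCopy > 0)
            {
                // The pending NULLs turned out to be inside the file, so write them back first.
                if (nullCount > 0)
                {
                    var block = Enumerable.Repeat((byte)0, nullCount).ToArray();
                    dstStream.Write(block, 0, nullCount);
                    nullCount = 0;
                }
                dstStream.Write(buffer, 0, bytesToCopy);
            }

            // Count only the bytes actually read, not the full buffer size.
            nullCount += readCount - bytesToCopy;
        }
    }

    if (keepFileDate)
        File.SetLastWriteTime(destination, sourceDate);
}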

private int FindTrailingNull(byte[] buffer, int readCount)
{
    for (int i = readCount - 1; i >= 0; i--)
        if (buffer[i] != 0)
            return i + 1;

    return 0;
}

Beware that some files legitimately end with NULLs, such as zip files (2 to 4 of them), so after trimming you might need to add some NULLs back at the end until the file opens correctly. The same applies to docx, xlsx, etc., as they are also zip files.
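
Because of that caveat, it can help to know how many NULLs a file ends with before deciding how many to keep. The following sketch is an addition, not part of the answer's code: it counts trailing NULL bytes by reading backwards from the end, so it only touches the tail of the file. The name `CountTrailingNulls` is hypothetical, the stream is assumed to be seekable, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;

// Returns the number of consecutive 0x00 bytes at the end of a seekable stream.
static long CountTrailingNulls(Stream stream, int bufferSize = 1 << 16)
{
    var buffer = new byte[bufferSize];
    long position = stream.Length; // start of the region already inspected
    long count = 0;

    while (position > 0)
    {
        int chunk = (int)Math.Min(bufferSize, position);
        position -= chunk;
        stream.Seek(position, SeekOrigin.Begin);

        // Read may return fewer bytes than requested, so loop until the chunk is full.
        int filled = 0;
        while (filled < chunk)
        {
            int n = stream.Read(buffer, filled, chunk - filled);
            if (n == 0) throw new EndOfStreamException();
            filled += n;
        }

        // Walk the chunk from its last byte towards its first.
        for (int i = chunk - 1; i >= 0; i--)
        {
            if (buffer[i] != 0)
                return count;
            count++;
        }
    }
    return count; // the stream is empty or consists only of NULLs
}
```

With the count in hand, one option (if modifying the file in place is acceptable) is to open it with write access and call `SetLength` with the original length minus the number of NULLs to drop, which avoids copying the whole file.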
