1

I am looking for a quick way to remove null characters from a text file on Windows. The usual suggestion, using Notepad++ to replace "\0" with nothing throughout the document (as described here), does not work with very big files. Mine is about 180 MB, and Notepad++ hangs indefinitely while trying to do the job.

Toto
Gaston

3 Answers

2

Here is the solution I found for Windows. The idea is to bring this solution from UNIX over to Windows.

1) Download and install CoreUtils (GnuWin32), a collection of basic file, shell and text manipulation utilities for Windows.

On Windows 7 the executables are typically installed in c:\Program Files (x86)\GnuWin32\bin

2) Remove the NULL characters by running this command in a cmd window:

tr -d '\000' <input_file >output_file

Example:

c:\Program Files (x86)\GnuWin32\bin>tr -d '\000' <putty_measurements_1.log >putty_measurements_2.log
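
For anyone who would rather not install CoreUtils, the same filtering can be sketched in Python. This is only a minimal sketch, not part of the original solution: the function name strip_nulls and the 1 MiB chunk size are assumptions chosen for illustration. It reads the file in fixed-size chunks, so memory use stays flat even for very large files.

```python
def strip_nulls(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping every 0x00 byte.

    Returns the number of NUL bytes that were removed.
    """
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # bounded read, never the whole file
            if not chunk:
                break  # end of file
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

Calling strip_nulls("putty_measurements_1.log", "putty_measurements_2.log") would mirror the tr example above. Like tr -d, it removes every NUL byte, so it should only be used on files where NULs carry no meaning (it would corrupt UTF-16 text, for instance).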
Community
Gaston
2

I know this is an old post, but I think this will be useful for others. This approach works only if the nulls to delete are at the end of each line (in my case the lines are 1000+ characters long, with 600 null characters at the end).

Just copy the whole thing and paste it into a new file tab; Notepad++ will automatically replace all nulls with spaces. Then just save using ctrl+space+s to trim all lines.
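
If the file is too large to copy and paste comfortably, the same end-of-line cleanup can be sketched in Python. This is a rough equivalent under stated assumptions, not the editor's actual behaviour: the function name trim_line_ends is hypothetical, and the sketch removes trailing NULs, spaces and tabs from each line while leaving any NULs in the middle of a line untouched.

```python
def trim_line_ends(src_path, dst_path):
    """Copy a file line by line, removing NULs, spaces and tabs at line ends."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # iterating a binary file splits on b"\n"
            body = line.rstrip(b"\r\n")
            ending = line[len(body):]  # preserve the original line terminator
            dst.write(body.rstrip(b"\x00 \t") + ending)
```

Because it streams one line at a time, memory use depends on the longest line rather than on the file size.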

Hope this helps

Fabio Piunti
  • Ah! Wonderfully simple! And that bit about trimming the lines is important, too. Thanks for this! Worked like a charm. I merged a bunch of vcf files into one big vcf file to copy contacts from my flip phone to Google, and Google wouldn't take it until I cleaned it up. I don't know how well this will work for really large files, but for mid-sized ones it will work easily enough. – bgmCoder Dec 06 '19 at 02:41
0

I've been looking for tools to remove trailing NULLs from big files, and the solutions I found didn't work with 1GB+ files or took ages. Therefore I designed my own in C#, which works pretty well, and here it is:

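// Requires: using System.IO;

// Copies source to "<name>_fixed<ext>" in the same folder, leaving out the NULL bytes at the end of the file.
private void CopyContentsUntilNull(string source, bool keepFileDate = true)
{
    // Path.Combine inserts the directory separator that plain concatenation would omit.
    string destination = Path.Combine(
        Path.GetDirectoryName(source) ?? "",
        $"{Path.GetFileNameWithoutExtension(source)}_fixed{Path.GetExtension(source)}");

    var sourceDate = File.GetLastWriteTime(source);

    int bufferSize = 10000;
    var buffer = new byte[bufferSize];
    int nullCount = 0; // NULLs seen since the last non-NULL byte, not yet written
    int readCount;

    using (var srcStream = File.OpenRead(source))
    using (var dstStream = File.Create(destination)) // File.Create truncates an existing destination
    {
        do
        {
            readCount = srcStream.Read(buffer, 0, bufferSize);
            int bytesToCopy = FindTrailingNull(buffer, readCount);

            if (bytesToCopy > 0)
            {
                // More data follows the pending NULLs, so they were not trailing: write them back.
                if (nullCount > 0)
                {
                    var block = new byte[nullCount]; // zero-initialized
                    dstStream.Write(block, 0, nullCount);
                    nullCount = 0;
                }
                dstStream.Write(buffer, 0, bytesToCopy);
            }

            // Count only the bytes actually read, not the whole buffer.
            nullCount += readCount - bytesToCopy;

        } while (readCount > 0); // Read returns 0 only at end of file
    }

    if (keepFileDate)
        File.SetLastWriteTime(destination, sourceDate);
}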

private int FindTrailingNull(byte[] buffer, int readCount)
{
    for (int i = readCount - 1; i >= 0; i--)
        if (buffer[i] != 0)
            return i + 1;

    return 0;
}

Beware that some file formats legitimately end with NULLs, such as zip files (2 to 4 of them), so after stripping you might need to add some back at the end until the file opens again. The same applies to docx, xlsx, etc., as they are also zip files.
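
For comparison, here is a Python sketch of the same trailing-NULL idea. It takes a different route from the C# above: instead of a forward pass that holds back pending NULLs, it scans backwards from the end of the file to find the last non-NULL byte and then copies only that many bytes. The function names, the chunk size and the keep parameter (how many trailing NULLs to preserve, for cases like the zip files mentioned above) are all assumptions made for illustration.

```python
import os


def find_content_length(path, chunk_size=1024 * 1024):
    """Return the file length once the NUL bytes at the very end are ignored."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:  # found the last non-NUL byte
                return start + len(stripped)
            pos = start  # chunk was all NULs, keep scanning backwards
    return 0  # empty file or nothing but NULs


def copy_without_trailing_nulls(src_path, dst_path, keep=0, chunk_size=1024 * 1024):
    """Copy src to dst without its trailing NULs, preserving `keep` of them.

    Returns the number of bytes written.
    """
    length = find_content_length(src_path, chunk_size)
    length = min(os.path.getsize(src_path), length + keep)
    remaining = length
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break
            dst.write(chunk)
            remaining -= len(chunk)
    return length
```

When the file ends in a long run of NULLs, the backward scan reads only that run plus one chunk before the copy starts. Passing keep=2, for example, would retain two of the trailing NULLs.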

Andrew