In a Windows command line environment, I'd like to be able to search a binary file for the last (final) occurrence of hex 06 char ("Ack") and truncate the file from that char to the end of the file, meaning that the found char is also trimmed off. How can I do that? The files can be several hundred megabytes in size.

EDIT: To be fair, I did quite a lot of Googling for code ideas, but my search terms are not bringing me to some kind of way to tackle this. Something like "search binary file for ASCII char hex 06, find last occurrence of that char and truncate the file from that point on," is so vague as to be essentially useless. I'll keep looking!

1 Answer

If you start reading bytes from the end of the file you will find the last ACK (if there is one). Knowing its position, you can now truncate the file.

I'm not good at PowerShell, so there might be some cmdlet I don't know about, but this achieves what you want:

$filename = "C:\temp\FindAck.txt"
$file = Get-Item $filename
$len = $file.Length
$blockSize = 32768
$buffer = new-object byte[] $blockSize

$found = $false
$blockNum = [math]::floor($len / $blockSize)

$mode = [System.IO.FileMode]::Open
$access = [System.IO.FileAccess]::Read
$sharing = [IO.FileShare]::Read
$fs = New-Object IO.FileStream($filename, $mode, $access, $sharing)

$foundPos = -1

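# Scan block by block from the end of the file towards the start.
while (!$found -and $blockNum -ge 0) {
    $fs.Position = $blockNum * $blockSize
    $bytesRead = $fs.Read($buffer, 0, $blockSize)
    if ($bytesRead -gt 0) {
        # Walk each block backwards so the first hit is the last ACK in the file.
        for ($i = $bytesRead - 1; $i -ge 0; $i--) {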
            if ($buffer[$i] -eq 6) {
                $foundPos = $blockNum * $blockSize + $i
                $found = $true
                break
            }
        }
    }
    $blockNum--
}

$fs.Dispose()

# Setting the length to the ACK's offset removes the ACK and everything after it.
if ($foundPos -ne -1) {
    $mode = [System.IO.FileMode]::Open
    $access = [System.IO.FileAccess]::Write
    $sharing = [IO.FileShare]::Read

    $fs = New-Object IO.FileStream($filename, $mode, $access, $sharing)
    $fs.SetLength($foundPos)
    $fs.Dispose()
}

Write-Host $foundPos

The idea of reading in 32KB blocks is to get a reasonably sized chunk from the disk to process, rather than reading one byte at a time.
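
If a lone ACK turns out to be too ambiguous a marker, the same backward block scan can be extended to a multi-byte pattern such as NUL ACK NUL (00 06 00). The following is only a sketch of that idea: the function name `Find-LastPattern` and its parameters are hypothetical and not part of the code above. Each read takes in a few extra bytes beyond the block, so a pattern that straddles a block boundary is not missed.

```powershell
# Sketch only: Find-LastPattern and its parameter names are made up for illustration.
function Find-LastPattern {
    param(
        [string]$Path,
        [byte[]]$Pattern,
        [int]$BlockSize = 32768
    )
    # Read this many extra bytes past each block to catch boundary-spanning matches.
    $overlap = $Pattern.Length - 1
    $buffer = New-Object byte[] ($BlockSize + $overlap)
    $fs = [System.IO.File]::OpenRead($Path)
    try {
        # Offset of the last (possibly partial) block, kept as a 64-bit value.
        [long]$blockStart = $fs.Length - ($fs.Length % $BlockSize)
        while ($blockStart -ge 0) {
            $fs.Position = $blockStart
            # Fill the buffer as far as the file allows.
            $bytesRead = 0
            while ($bytesRead -lt $buffer.Length) {
                $n = $fs.Read($buffer, $bytesRead, $buffer.Length - $bytesRead)
                if ($n -eq 0) { break }
                $bytesRead += $n
            }
            # Try every start position in this block, last one first.
            for ($i = $bytesRead - $Pattern.Length; $i -ge 0; $i--) {
                $isMatch = $true
                for ($j = 0; $j -lt $Pattern.Length; $j++) {
                    if ($buffer[$i + $j] -ne $Pattern[$j]) { $isMatch = $false; break }
                }
                if ($isMatch) { return $blockStart + $i }
            }
            $blockStart -= $BlockSize
        }
        return [long]-1   # pattern not present
    }
    finally {
        $fs.Dispose()
    }
}

# Usage, with the same placeholder path as above.
$filename = "C:\temp\FindAck.txt"
if (Test-Path $filename) {
    $pos = Find-LastPattern -Path $filename -Pattern 0x00, 0x06, 0x00
    if ($pos -ge 0) {
        # $pos is the offset of the leading NUL; keeping $pos + 1 bytes
        # cuts right before the ACK (00 | 06 00).
        $fs = [System.IO.File]::OpenWrite($filename)
        try { $fs.SetLength($pos + 1) } finally { $fs.Dispose() }
    }
    Write-Host $pos
}
```

The returned value is the offset of the first byte of the pattern, so the truncation length is that offset plus one if the leading NUL is to be kept. Use a full path, because the .NET file classes resolve relative paths against the process working directory, which can differ from the current PowerShell location.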


References:

Andrew Morton
  • Well golly, it works. It works too well -- it looks like there was a sneaky ACK lurking between the ACK I wanted to truncate the file from, and the ACK this script found. So my ACK is still in the file. What if I wanted to find these three chars in this order: NUL ACK NUL ( 00 06 00 hex) and trim from that ACK to the end of the file? And I'm not trying to get someone else to do my work, I'm just slow. – That Jack Elliott Dec 02 '18 at 23:40
  • @ThatJackElliott Is the ACK you want to find the second-to-last one in the file, or is it the last NUL ACK NUL that you need? The former is much simpler to get (run the code twice) than the latter. – Andrew Morton Dec 03 '18 at 09:11
  • I think it would be safest if I were to find the last NUL ACK NUL in the file rather than the last ACK alone. The files can apparently contain one or more standalone ACKs in the "tail" that I wish to discard. But the pattern of 0x00 0x06 0x00 is the place I really want to trim the file. Right before the 0x06. The "pipe" here represents the cutting point: 0x00 | 0x06 0x00 – That Jack Elliott Dec 03 '18 at 15:28
  • I want to thank you for your help! We've sorted out how to find the location of the offending byte so we can apply the hatchet and cut off the tail. Specifically, the files we're working with are some mp3s that mp3val.exe is flagging as having garbage at the end. It turns out that mp3val gives the offset location for the Bad Byte in the error message. Their "-f" function fixes the file by writing new data to the tail, while my simply truncating the file at that point also lets it pass muster. So I guess I should mark the question as "solved" and thank you again. – That Jack Elliott Dec 03 '18 at 23:02
  • @ThatJackElliott Ah well, maybe someone else will find the code useful. You can "mark the question as solved," or accept an answer as it's known here, by clicking the tick mark next to the number to the left of the answer. – Andrew Morton Dec 04 '18 at 09:15
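
For the situation described in the last comments, where the cut position is already known from another tool, no searching is needed and the truncation step alone is enough. A minimal sketch, assuming the offset is available as a number: the helper name `Set-FileLength` and its parameters are hypothetical, and how the offset is obtained from the other tool's message is not covered here.

```powershell
# Sketch only: Set-FileLength is a made-up helper name.
function Set-FileLength {
    param(
        [string]$Path,
        [long]$Offset
    )
    $fs = New-Object IO.FileStream($Path, [IO.FileMode]::Open, [IO.FileAccess]::Write, [IO.FileShare]::None)
    try {
        # Refuse offsets outside the file so the call can only shrink it.
        if ($Offset -lt 0 -or $Offset -gt $fs.Length) {
            throw "Offset $Offset is outside the file (length $($fs.Length))."
        }
        # Keeps bytes 0 .. Offset-1; the byte at Offset and everything after it is dropped.
        $fs.SetLength($Offset)
    }
    finally {
        $fs.Dispose()
    }
}

# Example call (path and offset are placeholders):
# Set-FileLength -Path "C:\temp\FindAck.txt" -Offset 123456
```

Because the truncation happens in place, it is worth running this on a copy of the file first.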