I apologize up front for the lengthy post, but I'm trying to include the script I've worked with and tested so far. I'm also very new to working with binary files and PowerShell, and I'm pulling out my hair here. I have a file where I must remove data from a known address to the end of the file. I have referenced multiple articles here on S.O., but the one that comes closest to what I want to accomplish is here, which links to an article I had also found.

I feel like I'm really close, but I'm not sure I'm using the function correctly, as I'm having a bit of trouble sussing out the regex for a hex equivalent of ".*" to find 0 or more matches to delete the remaining data from the known address to the end of the file. Maybe I'm thinking too complex?

My known address is always 005A08B0, and nothing afterward ever has a repeatable pattern so I can't simply use a pattern like \xF0\x00\x01 or similar to search for.

This portion of the script is not changed - the function I assume would still be the same, and on a loose level, I understand what it's doing - streaming the file specified and going to the end of the file to find the number of matched regex patterns:

function ConvertTo-BinaryString {
    # converts the bytes of a file to a string that has a
    # 1-to-1 mapping back to the file's original bytes. 
    # Useful for performing binary regular expressions.
    [OutputType([String])]
    Param (
        [Parameter(Mandatory = $True, ValueFromPipeline = $True, Position = 0)]
        [ValidateScript( { Test-Path $_ -PathType Leaf } )]
        [String]$Path
    )

    $Stream = New-Object System.IO.FileStream -ArgumentList $Path, 'Open', 'Read'

    # Note: Codepage 28591 returns a 1-to-1 char to byte mapping
    $Encoding     = [Text.Encoding]::GetEncoding(28591)
    $StreamReader = New-Object System.IO.StreamReader -ArgumentList $Stream, $Encoding
    $BinaryText   = $StreamReader.ReadToEnd()

    $StreamReader.Close()
    $Stream.Close()

    return $BinaryText
}

This portion for my input file is super simple to digest:

$inputFile  = 'C:\StartFile.dat'
$outputFile = 'C:\EndFile_test.dat'
$fileBytes  = [System.IO.File]::ReadAllBytes($inputFile)
$binString  = ConvertTo-BinaryString -Path $inputFile

This is where things fall apart, and I assume this would be the only piece I have to really modify:

# This is the portion I am having a problem with - what do I need to do for this regex???
$re = [Regex]'[\x5A08B0]{30}*'

This portion seems like I should not have to modify much, as the position will naturally move through the file and offset itself after each found match?

# use a MemoryStream object to store the result
$ms  = New-Object System.IO.MemoryStream
$pos = $replacements = 0

$re.Matches($binString) | ForEach-Object {
    # write the bytes between the previous match and this one to the MemoryStream
    # (the third argument of Write() is a byte count, not an end position)
    $ms.Write($fileBytes, $pos, $_.Index - $pos)
    # move the 'cursor' to just past this match
    $pos = $_.Index + $_.Length
    # and count the number of replacements done
    $replacements++
}

# write the remainder of the bytes to the stream
$ms.Write($fileBytes, $pos, $fileBytes.Count - $pos)

# save the updated bytes to a new file (will overwrite existing file)
[System.IO.File]::WriteAllBytes($outputFile, $ms.ToArray())
$ms.Dispose()

if ($replacements) {
    Write-Host "$replacements replacement(s) made."
}
else {
    Write-Host "Byte sequence not found. No replacements made."
}

Additionally, I have also tried the following to at least see if I could determine the appropriate address is being referenced on a known file, and this seems like it might be a good start to something different:

#Decimal Equivalent of the Hex Address:
$offset = 5900464

$bytes = [System.IO.File]::ReadAllBytes("C:\TestFile.dat");
Echo $bytes[$offset]

When I run the smaller script above, I am at least getting the right character of the known file - it produces the Decimal equivalent of the Ascii char in the file.
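
As a side note, PowerShell accepts hexadecimal literals, so the address can be used directly instead of being converted to decimal by hand. A minimal sketch of the same check, assuming a hypothetical helper name (`Get-ByteAtOffset`) and files small enough to read fully into memory:

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Get-ByteAtOffset {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [int]    $Offset
    )

    # Read the whole file into a byte array (fine for files of a few MB).
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # Guard against offsets outside the file instead of returning a misleading value.
    if ($Offset -lt 0 -or $Offset -ge $bytes.Length) {
        Write-Warning "Offset $Offset is outside the file ($($bytes.Length) bytes)."
        return
    }

    # Show the address and the byte at that address, both in hex.
    '{0:X8}: {1:X2}' -f $Offset, $bytes[$Offset]
}

# Example call (path taken from the snippet above):
# Get-ByteAtOffset -Path 'C:\TestFile.dat' -Offset 0x5A08B0
```

Printing the byte in hex makes it easier to compare against what a hex editor shows at the same address.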

I can do this manually w/ a hex-editor, but this has to be possible from a script. . . Appreciate all the help I can get. A few disclosures - it has to be done with programs native to windows 7/windows 10 - cannot download any separate executables, and SysInternals is a no-go as well. Was originally looking at a batch file idea, but I can port a PowerShell command into a batch file easy peasy.

k1dfr0std
  • . . . The way you've just worded your comment made it seem like I have REALLY over-complicated this issue!!! YES! `TRUNCATE` is the word I was looking for to succinctly put it! This is EXACTLY what I'm wanting to do, but I want to save the data prior to the address and dump the rest after. Is this something that is STUPID Simple and I haven't stared at it long enough? I'll bet once I see it, I will "FACEPALM" – k1dfr0std Sep 23 '21 at 01:28
  • I have marked your answer as **_`THE ONE`_** - so darn beautiful. I'ma Geek out a bit on this for a while. I will also re-word my question so it's less lengthy and then maybe if others google "How to truncate the end of a binary file after known address using Powershell?" they'll hopefully stumble upon this little gem! – k1dfr0std Sep 23 '21 at 02:40

1 Answer

To simply truncate a file, i.e. to remove any content beyond a given byte offset, you can use System.IO.File's static OpenWrite() method to obtain a System.IO.FileStream instance and call its .SetLength() method:

$inputFile  = 'C:\StartFile.dat'
$outputFile = 'C:\EndFile_test.dat'

# First, copy the input file to the output file.
Copy-Item -LiteralPath $inputFile -Destination $outputFile

# Open the output file for writing.
$fs = [System.IO.File]::OpenWrite($outputFile)

# Set the file length based on the desired byte offset
# in order to truncate it (assuming it is larger).
$fs.SetLength(0x5A08B0)

$fs.Close()

Note: If the given offset is larger than the current file size, the file grows instead. A quick test on macOS and Windows suggests that the additional space is filled with NUL (0x0) bytes; however, this behavior is not guaranteed, judging by the .SetLength() documentation:

If the stream is expanded, the contents of the stream between the old and the new length are undefined.
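
Given that caveat, one way to avoid accidentally growing a file is to compare the current length with the target first. The following is only a sketch: the function name `Limit-FileLength` and its parameters are hypothetical, and it modifies the file in place, so it would be run against the copy, not the original.

```powershell
# Hypothetical helper; truncates a file only if it is longer than $Length.
# Returns $true if the file was truncated, $false if it was left untouched.
function Limit-FileLength {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [long]   $Length
    )

    # .NET methods do not use PowerShell's current location, so resolve to a full path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $fs = [System.IO.File]::OpenWrite($fullPath)
    try {
        if ($fs.Length -gt $Length) {
            # Discard everything from byte offset $Length onward.
            $fs.SetLength($Length)
            return $true
        }
        # File is already short enough; do not extend it.
        return $false
    }
    finally {
        # Always release the file handle, even if SetLength() throws.
        $fs.Dispose()
    }
}

# Example call (path taken from the snippet above):
# Limit-FileLength -Path 'C:\EndFile_test.dat' -Length 0x5A08B0
```

The `try`/`finally` is there so the file handle is released even when the call fails, for example on a read-only file.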

mklement0
  • Oh. My. Goodness. This. This is . . . **`SetLength`** - WHY. . . WHY is this something which has not been referenced ANYWHERE ELSE IN THE ENTIRE INTERNETS when asking "How to truncate a Binary file past known Address"?!? MADDENING. YOU ARE AWESOME. THANK YOU SO VERY VERY MUCH. Out of curiosity - since I noticed your comment above the `SetLength` line - will it set the remaining bytes to NUL (00) if the original file is smaller than the address given? I think I may go cry just a little bit - these tears of joy are a wonderful ending to the day. I cannot thank you enough! – k1dfr0std Sep 23 '21 at 02:37
  • Also - I'm saddened to see your comment go - It was like a flash of lightning when I read it - like "WAIT - CAN IT BE?!? SOME LIGHT AT THE END OF THIS TUNNEL!!!!" – k1dfr0std Sep 23 '21 at 02:46
  • Thank you so much, once again! I popped this little snippet into a batch file capable of parsing multiple files as arguments to the batch file itself and it did exactly as I needed _IN SECONDS_ – k1dfr0std Sep 23 '21 at 03:13
  • Glad to hear it, @k1dfr0std; re filling with NUL bytes on increasing the file size: it seems like the behavior is _not guaranteed_ - please see my update. – mklement0 Sep 23 '21 at 03:19