
I need to use PowerShell to check whether two files are the same, with one restriction: there are eight specific bytes in the first 2K that are allowed to differ (if you're interested, they're timestamp bytes in the superblock of an ext4 image).

The code I found on Stack Overflow (obviously) for doing full checks is as follows:

$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$hash = [System.BitConverter]::ToString(
            $md5.ComputeHash([System.IO.File]::ReadAllBytes("fspec.bin")))

This gives me the hash of the entire file, but what I really need is:

  • the first 2K of the file as a byte array so I can check specifics; and
  • the checksum of the remainder of the file to check equality.

The System.IO.File class has ReadAllBytes, but it does not appear to be able to read just a section of a file or to seek to a specific position.
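For reference, System.IO.File itself has no partial reader, but File.OpenRead returns a System.IO.FileStream, which does support Seek and partial Read. The following is a minimal sketch, assuming Windows PowerShell 5.1 or PowerShell 7; Read-FileSection is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Read-FileSection is a hypothetical helper, not a built-in cmdlet.
# It reads up to $Count bytes starting at $Offset without loading the whole file.
function Read-FileSection {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [long]   $Offset,
        [Parameter(Mandatory)] [int]    $Count
    )
    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so resolve the path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        [void] $stream.Seek($Offset, [System.IO.SeekOrigin]::Begin)
        $buffer = New-Object byte[] $Count
        $total = 0
        # Read may return fewer bytes than requested, so loop until done or EOF.
        while ($total -lt $Count) {
            $read = $stream.Read($buffer, $total, $Count - $total)
            if ($read -eq 0) { break }
            $total += $read
        }
        if ($total -lt $Count) {
            # Hit end of file early: trim the buffer to what was actually read.
            $trimmed = New-Object byte[] $total
            [Array]::Copy($buffer, $trimmed, $total)
            $buffer = $trimmed
        }
        # The leading comma stops PowerShell from unrolling the byte[] into the pipeline.
        return ,$buffer
    }
    finally {
        $stream.Dispose()
    }
}
```

For example, $firstTwoK = Read-FileSection -Path .\fspec.bin -Offset 0 -Count 2KB returns just the first 2048 bytes as a byte[].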

I have attempted to read in the byte array and use array slicing to get the parts as follows:

$allBytes = [System.IO.File]::ReadAllBytes("fspec.bin")
$firstTwoK = $allBytes[0..2047]                         # ranges are inclusive: bytes 0-2047
$restOfFile = $allBytes[2048..($allBytes.Length - 1)]   # last valid index is Length - 1
# Then:
#    1. Check bytes in firstTwoK.
#    2. Check MD5 of all bytes in restOfFile.

Unfortunately, the fact that it's a 750 MB file is causing problems:

Array dimensions exceeded supported range.
At C:\testprog\testprog.ps1:42 char:1
+ ${devBytes} = ${devBytes}[2048..${devBytes}.Length]
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException

Is there a functional way to do what I need?

paxdiablo
    The HashAlgorithm class has an [overload for ComputeHash](https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.computehash?view=netframework-4.8) which accepts the offset and size of the chunk you are interested in. – luciole75w Jan 14 '20 at 02:19


Use one of the types derived from System.Security.Cryptography.HashAlgorithm and call the ComputeHash overload that takes an offset. For checking whether files are identical, MD5 is still fine, though you can choose a stronger algorithm if you prefer:

$fileBytes = [System.IO.File]::ReadAllBytes("C:\path\to\file.ext")
# MD5 derives from HashAlgorithm; Create() returns the platform's default MD5 implementation.
$md5 = [System.Security.Cryptography.MD5]::Create()
$fileHashAfterOffset = $md5.ComputeHash($fileBytes, 2KB, $fileBytes.Length - 2KB)

The first argument of ComputeHash is the file contents as a byte[]. The second argument is the offset (i.e. the number of leading bytes to skip when computing the hash), and the third is the number of bytes to hash from that point. Here we want the rest of the file, so the count is the total length of the $fileBytes array minus the offset.
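To illustrate what the offset and count arguments do, the sketch below uses illustrative pseudo-random data (not a real image) and hashes bytes 2048 onward with the three-argument overload, then checks the result against hashing a separately copied slice of the same bytes:

```powershell
# Illustrative data only: 4 KB of pseudo-random bytes standing in for a file.
$fileBytes = New-Object byte[] 4KB
(New-Object System.Random 42).NextBytes($fileBytes)

$md5 = [System.Security.Cryptography.MD5]::Create()

# Offset 2KB skips bytes 0-2047; the count covers everything after them.
$hashWithOffset = $md5.ComputeHash($fileBytes, 2KB, $fileBytes.Length - 2KB)

# For comparison: copy the same region into its own array and hash all of it.
$slice = New-Object byte[] ($fileBytes.Length - 2KB)
[Array]::Copy($fileBytes, 2KB, $slice, 0, $slice.Length)
$hashOfSlice = $md5.ComputeHash($slice)
$md5.Dispose()

$same = [System.BitConverter]::ToString($hashWithOffset) -eq [System.BitConverter]::ToString($hashOfSlice)
$same   # True
```

The offset overload gives the same digest without allocating the second array, which is what matters for a 750 MB file.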

2KB uses PowerShell's numeric multiplier suffix, so it evaluates to 2048 (2 × 1024) bytes.
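Reading the whole file into a byte[] works here, but a FileStream avoids holding it in memory at all: ComputeHash also has an overload that takes a Stream and hashes from the stream's current position to the end. The following sketch combines that with the header check from the question. Test-FilesEquivalent is a hypothetical name, and because the question doesn't list the eight timestamp offsets, they are passed in through a hypothetical -IgnoreOffsets parameter:

```powershell
# Test-FilesEquivalent is a hypothetical helper. -IgnoreOffsets must be supplied by
# the caller (e.g. the eight superblock timestamp offsets); none are assumed here.
function Test-FilesEquivalent {
    param(
        [Parameter(Mandatory)] [string] $PathA,
        [Parameter(Mandatory)] [string] $PathB,
        [int[]] $IgnoreOffsets = @(),
        [int]   $HeaderSize = 2KB
    )

    # Reads exactly $Count bytes from the stream's current position.
    function Read-Exactly([System.IO.Stream] $Stream, [int] $Count) {
        $buffer = New-Object byte[] $Count
        $total = 0
        while ($total -lt $Count) {
            $read = $Stream.Read($buffer, $total, $Count - $total)
            if ($read -eq 0) { throw 'Unexpected end of file while reading the header.' }
            $total += $read
        }
        return ,$buffer
    }

    $a = $null; $b = $null
    $md5 = [System.Security.Cryptography.MD5]::Create()
    try {
        $a = [System.IO.File]::OpenRead((Resolve-Path -LiteralPath $PathA).ProviderPath)
        $b = [System.IO.File]::OpenRead((Resolve-Path -LiteralPath $PathB).ProviderPath)
        if ($a.Length -ne $b.Length) { return $false }

        # Compare the header byte by byte, skipping the offsets allowed to differ.
        $headA = Read-Exactly $a $HeaderSize
        $headB = Read-Exactly $b $HeaderSize
        for ($i = 0; $i -lt $HeaderSize; $i++) {
            if ($IgnoreOffsets -contains $i) { continue }
            if ($headA[$i] -ne $headB[$i]) { return $false }
        }

        # Both streams now sit at offset $HeaderSize; ComputeHash(Stream) reads
        # from there to end of file, so only the remainder is hashed.
        $restA = [System.BitConverter]::ToString($md5.ComputeHash($a))
        $restB = [System.BitConverter]::ToString($md5.ComputeHash($b))
        return $restA -eq $restB
    }
    finally {
        if ($a) { $a.Dispose() }
        if ($b) { $b.Dispose() }
        $md5.Dispose()
    }
}
```

A call would look like Test-FilesEquivalent -PathA a.img -PathB b.img -IgnoreOffsets $offsets, where $offsets holds the eight allowed byte positions.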

codewario
    This is a good answer. What I ended up doing (before you posted) was simply to copy the irrelevant bytes from one array to the other before hashing both arrays in full and comparing the hashes (they'll only be equal if all the non-copied bytes are identical, barring the usual negligible chance of a collision). But I've tested this and it works, so I've given you the votes. Thanks. – paxdiablo Jan 14 '20 at 11:20
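The approach paxdiablo describes in the comment above (overwrite the don't-care bytes in one buffer with the other file's values, then hash both buffers in full) can be sketched as follows. Test-SameExceptOffsets is a hypothetical name, and the offsets are again a caller-supplied placeholder:

```powershell
# Test-SameExceptOffsets is a hypothetical name; -IgnoreOffsets stands in for the
# eight timestamp offsets, which the question does not list.
function Test-SameExceptOffsets {
    param(
        [Parameter(Mandatory)] [string] $PathA,
        [Parameter(Mandatory)] [string] $PathB,
        [int[]] $IgnoreOffsets = @()
    )
    # Note: this loads both files fully into memory.
    $bytesA = [System.IO.File]::ReadAllBytes((Resolve-Path -LiteralPath $PathA).ProviderPath)
    $bytesB = [System.IO.File]::ReadAllBytes((Resolve-Path -LiteralPath $PathB).ProviderPath)

    # Copy A's "don't care" bytes over B's so they cannot affect B's hash.
    foreach ($offset in $IgnoreOffsets) {
        if ($offset -lt $bytesA.Length -and $offset -lt $bytesB.Length) {
            $bytesB[$offset] = $bytesA[$offset]
        }
    }

    $md5 = [System.Security.Cryptography.MD5]::Create()
    try {
        $hashA = [System.BitConverter]::ToString($md5.ComputeHash($bytesA))
        $hashB = [System.BitConverter]::ToString($md5.ComputeHash($bytesB))
    }
    finally {
        $md5.Dispose()
    }
    # Equal digests mean equal contents, barring an extremely unlikely MD5 collision.
    return $hashA -eq $hashB
}
```

Unlike the offset-based answer, this needs both files in memory at once, which for two 750 MB images is roughly 1.5 GB.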