
I'm trying to map a file to memory and calculate its hash:

// Declaration in header file which I don't control
void SymCryptSha256(PCBYTE pbData, SIZE_T cbData, PBYTE pbResult);

// MY code
HANDLE hFile = ::CreateFile(...);
HANDLE hMap = ::CreateFileMapping(hFile, nullptr, PAGE_READONLY, 0, 0, nullptr);
BYTE* pMap = static_cast<BYTE*>(::MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0));

BYTE hash[64];
ULARGE_INTEGER li;
li.LowPart = ::GetFileSize(hFile, &li.HighPart);
// This compiles in 64-bit, but errors in 32-bit:
// error C4244: 'argument': conversion from 'ULONGLONG' to 'SIZE_T', possible loss of data
::SymCryptSha256(pMap, li.QuadPart, hash);

This is because SymCryptSha256's second argument is SIZE_T, which is only 32 bits wide in a 32-bit build. The desired behavior is:

  • 64-bit: Use the entire size, which is li.QuadPart
  • 32-bit: If the size is >4GB, MapViewOfFile would fail anyway, so just use li.LowPart.

Looks to me like I'll need to do this with #ifdefs - is there a more elegant way to do it?
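That is, I'd like to avoid something like this (a rough sketch of the #ifdef version):

#ifdef _WIN64
    ::SymCryptSha256(pMap, li.QuadPart, hash);  // 64-bit: SIZE_T is 64 bits, pass the full size
#else
    ::SymCryptSha256(pMap, li.LowPart, hash);   // 32-bit: a >4GB file couldn't be mapped anyway
#endif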

Jonathan
  • @harry: You're correct, of course. However, this was a simplified example - I do call other APIs later which need the entire size as `SIZE_T`. I've edited the question to reflect that. – Jonathan May 04 '17 at 08:37
  • @Jonathan: Harry is right, I checked my old code and I was also doing just that, pass 0... – Malkocoglu May 04 '17 at 08:37
  • If you want to calc the hash of *any* file, you need to be ready for the file being >4GB in size. So you should not map it *all* into memory (of course that is simpler and faster for relatively small files) but map it partially, chunk by chunk: pick a chunk size, map a chunk, calc its hash, unmap, and map the next. The chunk must be big enough that most files fit in a single chunk, but for very big files you will have multiple map/unmap cycles. – RbMm May 04 '17 at 08:42
  • @Jonathan: You may do `SIZE_T cbData = (SIZE_T)li.QuadPart`. As SIZE_T's bitness changes with the architecture, no more warnings... – Malkocoglu May 04 '17 at 08:43
  • *In this particular case* you could just use a cast, because in the case where the cast would be unsafe, the code will have already failed. And as RbMm says, it would be preferable to re-architect anyway. But IMO the general question is interesting in its own right; seems to me there *should* be a more elegant solution than an `#ifdef`, but I can't think of one offhand. – Harry Johnston May 04 '17 at 08:45
  • You are trying to be smarter than the operating system. It doesn't need that kind of help, it already knows how to create an MMF for a file. The file system cache is impossible to beat. Using up twice the amount of address space for such a large file is not better. – Hans Passant May 04 '17 at 08:45
  • @Hans, I suspect the actual problem is that he's being required ("header file I don't control") to use a hash function that only works on a contiguous block of memory. – Harry Johnston May 04 '17 at 08:48
  • A well-designed hash function must work cumulatively - allowing the hash to be calculated in several calls, chunk by chunk. We can of course do a type cast or `#if`/`#ifdef`, but mapping a whole file of more than 4GB will fail anyway in a 32-bit app. So the only correct way is to map it chunk by chunk and calc the hash chunk by chunk. – RbMm May 04 '17 at 08:55

1 Answer


In the general case, you can compare against the largest value a size_t can hold (the `(size_t)(-1)` trick) and do something like this:

if (li.QuadPart > ((ULONGLONG)((size_t)(-1)))) too_big();  // (size_t)(-1) is the largest value a size_t can hold
size_t result = (size_t)(li.QuadPart);

The compiler should optimize the first line into a no-op in a 64-bit compile.
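Applied to the code in the question, that might look something like this (a sketch; too_big() stands in for whatever error handling you prefer):

ULARGE_INTEGER li;
li.LowPart = ::GetFileSize(hFile, &li.HighPart);
if (li.QuadPart > ((ULONGLONG)((size_t)(-1)))) too_big();  // can only trigger in a 32-bit build
::SymCryptSha256(pMap, (SIZE_T)li.QuadPart, hash);         // the narrowing cast is now known to be safe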

(In your particular case, you probably don't need it anyway; the code will already have failed.)

Note: as already discussed in the comments, in this particular case it would be preferable, if at all possible, to use a hashing API that allows you to hash the data in chunks.
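For illustration only - assuming a hypothetical incremental API (HashInit/HashAppend/HashFinish are made-up names; the real ones depend on whatever library you end up using) - the chunk-by-chunk approach RbMm describes would look roughly like this, with error handling omitted:

// Hypothetical incremental hashing API (not real declarations):
//   void HashInit(HASH_STATE* state);
//   void HashAppend(HASH_STATE* state, PCBYTE pbData, SIZE_T cbData);
//   void HashFinish(HASH_STATE* state, PBYTE pbResult);

const ULONGLONG chunkSize = 64 * 1024 * 1024;  // must be a multiple of the allocation granularity

HASH_STATE state;
HashInit(&state);
for (ULONGLONG offset = 0; offset < li.QuadPart; offset += chunkSize)
{
    SIZE_T cb = (SIZE_T)min(chunkSize, li.QuadPart - offset);  // last chunk may be shorter
    BYTE* p = static_cast<BYTE*>(::MapViewOfFile(hMap, FILE_MAP_READ,
                                                 (DWORD)(offset >> 32), (DWORD)offset, cb));
    HashAppend(&state, p, cb);
    ::UnmapViewOfFile(p);
}
HashFinish(&state, hash);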

Harry Johnston