Hi to all C coders.
I searched for similar questions first, but couldn't find any.
How can I fetch/compare 4 bytes in a portable way (without memcpy/memcmp, of course)?
I have never formally learned C, and because of that I am living proof that without knowing the basics everything becomes a nasty mess afterwards. Anyway, once you are already writing words, it is too late to say 'start with the alphabet'.
ulHashPattern = *(unsigned long *)(pbPattern);
for (a=0; a < ASIZE; a++) bm_bc[a]=cbPattern;
for (j=0; j < cbPattern-1; j++) bm_bc[pbPattern[j]]=cbPattern-j-1;
i=0;
while (i <= cbTarget-cbPattern) {
if ( *(unsigned long *)&pbTarget[i] == ulHashPattern ) {
The above fragment works as intended with a 32-bit Windows compiler. I want all such 4-byte vs 4-byte comparisons to work under 64-bit Windows and Linux as well. I often need 2-, 4- and 8-byte transfers; in the example above I explicitly need 4 bytes from some pbTarget offset. Here is the actual question: what type should I use instead of unsigned long? (I guess something close to UINT16, UINT32, UINT64 will do.) In other words, which 3 types do I need in order to represent 2, 4 and 8 bytes ALWAYS, independently of the environment?
I believe this basic question causes a lot of trouble, so it should be clarified.
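To make the goal concrete, here is a minimal sketch of what such fetches could look like, assuming a C99 compiler that provides <stdint.h> with the exact-width types uint16_t, uint32_t and uint64_t. The helper names load_u16, load_u32, load_u64 and equal4 are hypothetical and not part of my code. They build the value byte by byte (least significant byte first), so they need neither memcpy nor a pointer cast, and they work at any offset regardless of alignment.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical sanity check: the sketch assumes 8-bit bytes. */
static void check_assumptions(void)
{
    assert(CHAR_BIT == 8);
    assert(sizeof(uint32_t) == 4);
}

/* Hypothetical helpers: read 2, 4 or 8 bytes from any offset,
   least significant byte first, independent of host byte order. */
static uint16_t load_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t load_u32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static uint64_t load_u64(const unsigned char *p)
{
    return (uint64_t)load_u32(p) | ((uint64_t)load_u32(p + 4) << 32);
}

/* Returns 1 when the 4 bytes at a equal the 4 bytes at b, else 0. */
static int equal4(const unsigned char *a, const unsigned char *b)
{
    return load_u32(a) == load_u32(b);
}
```

In the fragment above, the test would then read equal4(&pbTarget[i], pbPattern), or load_u32(&pbTarget[i]) == ulHashPattern if ulHashPattern is a uint32_t filled with load_u32(pbPattern). Whether a given compiler folds the shifts into a single mov is something I have not verified.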
Add-on 2012-Jan-16:
@Richard J. Ross III
I am doubly confused, since I don't know whether Linux uses 1] or 2], i.e. whether _STD_USING is defined on Linux.
In other words, which group is portable: the types uint8_t,...,uint64_t or _CSTD uint8_t,...,_CSTD uint64_t?
1] An excerpt from MVS 10.0 stdint.h
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
typedef _ULonglong uint64_t;
2] An excerpt from MVS 10.0 stdint.h
#if defined(_STD_USING)
...
using _CSTD uint8_t; using _CSTD uint16_t;
using _CSTD uint32_t; using _CSTD uint64_t;
...
With Microsoft C 32bit there is no problem:
; 3401 : if ( *(_CSTD uint32_t *)&pbTarget[i] == *(_CSTD uint32_t *)(pbPattern) )
01360 8b 04 19 mov eax, DWORD PTR [ecx+ebx]
01363 8b 7c 24 14 mov edi, DWORD PTR _pbPattern$GSCopy$[esp+1080]
01367 3b 07 cmp eax, DWORD PTR [edi]
01369 75 2c jne SHORT $LN80@Railgun_Qu@6
But when 64-bit code is the target, this is what happens:
D:\_KAZE_Simplicius_Simplicissimus_Septupleton_r2-_strstr_SHORT-SHOWDOWN_r7>cl /Ox /Tcstrstr_SHORT-SHOWDOWN.c /Fastrstr_SHORT-SHOWDOWN /w /FAcs
Microsoft (R) C/C++ Optimizing Compiler Version 15.00.30729.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
strstr_SHORT-SHOWDOWN.c
strstr_SHORT-SHOWDOWN.c(1925) : fatal error C1083: Cannot open include file: 'stdint.h': No such file or directory
D:\_KAZE_Simplicius_Simplicissimus_Septupleton_r2-_strstr_SHORT-SHOWDOWN_r7>
How about stdint.h on Linux, is it always present?
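For context, stdint.h is a C99 header, and Visual C++ only started shipping it with Visual Studio 2010 (compiler version 16), which would explain why the version 15 x64 compiler above cannot find it. A possible workaround is sketched below, assuming the Microsoft-specific __int8 ... __int64 types are acceptable on the old compiler. The check_widths helper is hypothetical and exists only to verify the sizes at run time.

```c
#include <assert.h>

/* Assumption: Visual C++ older than 2010 (_MSC_VER < 1600) has no stdint.h,
   so the Microsoft-specific sized integer types are used there instead. */
#if defined(_MSC_VER) && (_MSC_VER < 1600)
typedef unsigned __int8  uint8_t;
typedef unsigned __int16 uint16_t;
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif

/* Hypothetical sanity check: aborts if a type is not as wide as its name says. */
static void check_widths(void)
{
    assert(sizeof(uint8_t)  == 1);
    assert(sizeof(uint16_t) == 2);
    assert(sizeof(uint32_t) == 4);
    assert(sizeof(uint64_t) == 8);
}
```

With this in place the rest of the source can use uint16_t, uint32_t and uint64_t on both compilers.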
I didn't give up and commented it out (//#include <stdint.h>), and then compilation went OK:
; 3401 : if ( !memcmp(&pbTarget[i], &ulHashPattern, 4) )
01766 49 63 c4 movsxd rax, r12d
01769 42 39 2c 10 cmp DWORD PTR [rax+r10], ebp
0176d 75 38 jne SHORT $LN1@Railgun_Qu@6
; 3401 : if ( *(unsigned long *)&pbTarget[i] == ulHashPattern )
01766 49 63 c4 movsxd rax, r12d
01769 42 39 2c 10 cmp DWORD PTR [rax+r10], ebp
0176d 75 38 jne SHORT $LN1@Railgun_Qu@6
This very 'unsigned long *' troubles me, since gcc -m64 will fetch a QWORD rather than a DWORD, right?
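The worry can be checked directly. As far as I know, 64-bit Windows keeps unsigned long at 4 bytes while 64-bit Linux makes it 8 bytes, so the same cast reads a different number of bytes on each. Below is a minimal sketch for measuring this on a given platform, assuming <stdint.h> is available; the two helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes *(unsigned long *)p reads here.
   The result depends on the platform's data model. */
static size_t ulong_fetch_width(void)
{
    return sizeof(unsigned long);
}

/* Hypothetical helper: how many bytes *(uint32_t *)p reads here.
   uint32_t is exactly 32 bits wide wherever it is defined. */
static size_t u32_fetch_width(void)
{
    assert(sizeof(uint32_t) * CHAR_BIT == 32);
    return sizeof(uint32_t);
}
```

If ulong_fetch_width() returns 8, the unsigned long cast compares 8 bytes of target against the pattern hash, which is not the 4-byte test intended.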
@Mysticial
Just wanted to show the three different translations done by Microsoft CL 32bit v16:
1]
; 3400 : if ( !memcmp(&pbTarget[i], pbPattern, 4) )
01360 8b 04 19 mov eax, DWORD PTR [ecx+ebx]
01363 8b 7c 24 14 mov edi, DWORD PTR _pbPattern$GSCopy$[esp+1080]
01367 3b 07 cmp eax, DWORD PTR [edi]
01369 75 2c jne SHORT $LN84@Railgun_Qu@6
2]
; 3400 : if ( !memcmp(&pbTarget[i], &ulHashPattern, 4) )
01350 8b 44 24 14 mov eax, DWORD PTR _ulHashPattern$[esp+1076]
01354 39 04 2a cmp DWORD PTR [edx+ebp], eax
01357 75 2e jne SHORT $LN83@Railgun_Qu@6
3]
; 3401 : if ( *(uint32_t *)&pbTarget[i] == ulHashPattern )
01350 8b 44 24 14 mov eax, DWORD PTR _ulHashPattern$[esp+1076]
01354 39 04 2a cmp DWORD PTR [edx+ebp], eax
01357 75 2e jne SHORT $LN79@Railgun_Qu@6
The initial goal was to fetch 4 bytes (with a single mov instruction, i.e. *(uint32_t *)&pbTarget[i]) and compare them against a register variable 4 bytes in length, i.e. one RAM access and one comparison in a single instruction. Unfortunately, I only managed to reduce memcmp()'s 3 RAM accesses (when applied to pbPattern, which points to 4 or more bytes) down to 2, thanks to inlining. Now, if I want to use memcmp() on the first 4 bytes of pbPattern (as in 2]), ulHashPattern must not be of type register, whereas 3] has no such restriction.
; 3400 : if ( !memcmp(&pbTarget[i], &ulHashPattern, 4) )
The line above gives an error (ulHashPattern is defined as: register unsigned long ulHashPattern; ):
strstr_SHORT-SHOWDOWN.c(3400) : error C2103: '&' on register variable
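One possible way around C2103 is to never take the address of the register variable at all: copy the 4 target bytes into a plain local and compare by value. This is only a sketch, assuming <stdint.h> and <string.h> are available; match4 and first4_of are hypothetical names, and it does use memcpy, which I was trying to avoid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load the first 4 bytes of the pattern once,
   before the search loop. */
static uint32_t first4_of(const unsigned char *pattern)
{
    uint32_t v;
    memcpy(&v, pattern, sizeof v);
    return v;
}

/* Hypothetical helper: compare the 4 bytes at target with a value the
   caller may keep in a register variable. Only the local copy t has its
   address taken, so the caller's variable never needs '&'. */
static int match4(const unsigned char *target, uint32_t first4)
{
    uint32_t t;
    assert(target != NULL);
    memcpy(&t, target, sizeof t);
    return t == first4;
}
```

Inside the loop the test would become match4(&pbTarget[i], ulHashPattern), with ulHashPattern declared as register uint32_t and initialised from first4_of(pbPattern). Whether the compiler then keeps the value in a register is still its own decision.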
Yes, you are right: memcmp() saves the situation (but with a limitation), since fragment 2] is identical to 3], my dirty style. Obviously my inclination to hand-code something rather than call a function is a thing of the past, but I like it.
Still, I am not fully happy with the compilers: I have defined ulHashPattern as a register variable, yet it is loaded from RAM each time?! Maybe I am missing something, but this very line (mov eax, DWORD PTR _ulHashPattern$[esp+1076]) degrades performance, which makes for ugly code in my view.