Assuming no string shorter than 4 bytes is ever passed, is there anything wrong with this optimization? And yes, it gives a significant speedup on the machines I've tested it on when comparing mostly dissimilar strings.
#define STRCMP(a, b) ( (*(int32_t*)a) == (*(int32_t*)b) && strcmp(a, b) == 0)
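For comparison, here is a minimal sketch of the same "compare the first 4 bytes, then fall back to `strcmp`" idea without casting to `int32_t*`. The cast has two known problems. It is undefined behavior under C's strict-aliasing rule. It may also perform a misaligned load on targets that require alignment. The sketch instead copies the prefix with `memcpy`, and compilers commonly lower a fixed-size `memcpy` into a single load. It also parenthesizes the macro arguments. The helper name `streq_prefix4` is hypothetical. The sketch assumes "at least 4 bytes" means each string occupies at least 4 bytes *including* its terminating NUL, so `strlen >= 3`. If a shorter string were passed, bytes after its NUL could differ and the prefix check would wrongly report "not equal".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same prefix trick as the macro above, but loads the
 * first 4 bytes via memcpy, which has no alignment requirement and does not
 * violate strict aliasing.
 * Assumed precondition: a and b each point to a string whose first 4 bytes
 * (including the terminating NUL) are part of the string, i.e. strlen >= 3. */
static inline bool streq_prefix4(const char *a, const char *b)
{
    uint32_t wa, wb;
    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);

    /* Cheap reject: if the first 4 bytes differ, the strings differ. */
    if (wa != wb)
        return false;

    /* Prefixes match; do the full comparison. */
    return strcmp(a, b) == 0;
}

/* Macro form, with arguments parenthesized, if a macro is still wanted. */
#define STRCMP(a, b) streq_prefix4((a), (b))
```

Like the original, this still yields true when the strings are equal. It is not a drop-in replacement for `strcmp`'s three-way (less/equal/greater) result.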
And assuming strings are never shorter than 4 bytes, is there a faster way to do this without resorting to assembly, etc.?