3

I was asked "How do you determine the processor word length without using sizeof() in C?" in an interview, and I believe I gave the wrong answer.

My code was as follows:

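#include <limits.h>  /* CHAR_BIT */
#include <stdio.h>

int main(void){
    int num = -1;
    int count = 0;

    /* Converting -1 to unsigned yields UINT_MAX, i.e. every value bit set. */
    unsigned int num_copy = (unsigned int)num;
    while(num_copy >>= 1){
        count++;
    }

    /* count + 1 is the number of bits; divide by CHAR_BIT rather than assuming 8. */
    printf("System size of int:%d", (count + 1) / CHAR_BIT);

    return 0;
}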

The output is determined solely by the compiler options, not by the hardware. So how can I get the right answer (the system word length)?

What if I change part of this question from 'processor word length' to 'operating system word length'?

Tommy
  • 7
    What is wrong with `sizeof`? – jxh Apr 08 '15 at 15:35
  • You're assuming a byte has eight bits. – chris Apr 08 '15 at 15:36
  • That means 'unsigned' is not an official description in standard C? – Tommy Apr 08 '15 at 15:50
  • I tried to clarify your question based on your comments. If I failed to capture your intent, feel free to improve or rollback the edit. – Angew is no longer proud of SO Apr 08 '15 at 15:52
  • The compiler determines the size of `int`, regardless of the system, but you can use `int64_t` or `int32_t`. – BLUEPIXY Apr 08 '15 at 15:52
  • If you mean getting 64 bits on 64-bit CPUs and 32 bits on 32-bit CPUs, `long` is usually used instead. Note that `int` is 32 bits on 64-bit CPUs; for copying and storing pointers, `long` or (more often) `unsigned long` is used to match the CPU word size. – holgac Apr 08 '15 at 15:54
  • Can I say that the size of `long` is always the same as the CPU word length, no matter whether I am using a 32-bit OS or a 64-bit OS? – Tommy Apr 08 '15 at 16:02
  • @JY___ yeah, it's guaranteed by the C standard. Check my comment in Jonathon's reply. – holgac Apr 08 '15 at 16:05
  • @JohnBollinger: No, `num_copy` is unsigned, so not negative. Both the shift and the conversion from -1 are well defined (the conversion giving the largest representable value). – Mike Seymour Apr 08 '15 at 17:01
  • 2
    @holgac Whether an architecture is 32-bit or 64-bit is typically determined by the size of pointers, though I think there are even exceptions with unusual data models. In any case, `intptr_t` should be a better option than `long`; for example, the data model on 64-bit Windows has 32-bit longs. – typ1232 Apr 08 '15 at 17:02
  • @typ1232 usually, but not always. word size actually represents CPU register size. There are some architectures using different sized addresses. I think the best way would be bit shifting a register in assembly. – holgac Apr 08 '15 at 17:14

4 Answers

3

As @holgac mentioned, the book quoted below states that the long datatype is always the same size as the machine's native word size:

"A word is the amount of data that a machine can process at one time." "The size of a processor’s general-purpose registers (GPRs) is equal to its word size." "Additionally, the size of the C type long is equal to the word size, whereas the size of the int type is sometimes less than that of the word size"

-- Linux Kernel Development, Ch 17 (3rd edition, pg 381)

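As Thomas Matthews points out, however, this does not hold in general. The C standard specifies long by its minimum range (at least 32 bits), so on 8-bit and 16-bit processors a long spans more than one machine word. As typ1232 notes in the comments on the question, 64-bit Windows also uses a 32-bit long.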

To determine the size of long on your compiler, just use sizeof(long):

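#include <limits.h>  /* CHAR_BIT */
#include <stdio.h>

int main(void)
{
    /* sizeof counts bytes (chars); multiply by CHAR_BIT to get bits */
    printf("long is %d bits on this system\n", (int)(sizeof(long) * CHAR_BIT));
    return 0;
}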

Jonathon Reinhart
  • 1
    @Angew Agreed, but I interpreted *"system int size"* to mean the size of `int` on your compiler. – Jonathon Reinhart Apr 08 '15 at 15:39
  • @Angew is right: the compiler's int size is different from the system's. – Tommy Apr 08 '15 at 15:46
  • @JY___ I completely understand that. Your question was ambiguous. – Jonathon Reinhart Apr 08 '15 at 15:46
  • Can you help me with the correct title of this question? – Tommy Apr 08 '15 at 15:48
  • `long` is always the same size as CPU word length. – holgac Apr 08 '15 at 15:55
  • @holgac is that required by the standard? – eerorika Apr 08 '15 at 15:58
  • @holgac Citation would be great. – Jonathon Reinhart Apr 08 '15 at 15:58
  • 1
    Linux Kernel Development, Ch 17 (3rd edition, pg 381) says that "A word is the amount of data that a machine can process at one time." "The size of a processor’s general-purpose registers (GPRs) is equal to its word size." "Additionally, the size of the C type long is equal to the word size, whereas the size of the int type is sometimes less than that of the word size" – holgac Apr 08 '15 at 16:00
  • @JonathonReinhart also pg 383: "The size of the C long type is guaranteed to be the machine’s word size." – holgac Apr 08 '15 at 16:03
  • @holgac Great, thank you. I've incorporated your comments into my answer. If you'd prefer, you can write your own answer, which I would up-vote. – Jonathon Reinhart Apr 08 '15 at 16:03
  • @JonathonReinhart No need, thank you for the offer :) – holgac Apr 08 '15 at 16:07
  • 1
    Downvote: The `long` type is based on a range of values not the processor's word size. On 8-bit and 16-bit processors, the `long` type would take up more than one processor word in order to meet the range specification. So, by your statement, on an 8-bit processor, the `long` type would have the range of 8 bits, which is smaller than the range of an `int`. – Thomas Matthews Apr 08 '15 at 16:17
  • Sigh, I give up. I'll leave this answer for the discussion in the comments only. – Jonathon Reinhart Apr 08 '15 at 16:20
  • @ThomasMatthews You may be right; the book I read may be misleading. Sorry for the confusion, but the book states it so confidently that I didn't question it. See this section of the 2nd edition of the same book: http://www.makelinux.net/books/lkd2/ch19lev1sec2 – holgac Apr 08 '15 at 16:36
  • Implementations of C on 16-bit processors routinely use 32 bits for long and 16 bits for int. – user3344003 Apr 08 '15 at 16:51
2

I think my OCD kicked in a bit here. Here is the result:

#include <stdio.h>
#include <limits.h>

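/* Cast to int so the values can be printed with %d and compared with int. */
#define SIZEOF_CHAR ((int)sizeof(char))
#define SIZEOF_INT ((int)sizeof(int))
#define SIZEOF_LONG ((int)sizeof(long))
#define SIZEOF_POINTER ((int)sizeof(void *))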

#define NIBBLE_BIT 4
#ifndef CHAR_BIT
#define CHAR_BIT 8    // should have been defined in <limits.h>
#endif
#define INT_BIT (SIZEOF_INT * CHAR_BIT)
#define LONG_BIT (SIZEOF_LONG * CHAR_BIT)
#define POINTER_BIT (SIZEOF_POINTER * CHAR_BIT)

int main(void)
{
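  /* Two hex digits per byte assumes CHAR_BIT == 8. The pointer buffer has
     extra room because %p output is implementation-defined and may carry
     a "0x" prefix. */
  char hexchar[SIZEOF_CHAR * 2 + 1],
       hexint[SIZEOF_INT * 2 + 1],
       hexlong[SIZEOF_LONG * 2 + 1],
       hexpointer[SIZEOF_POINTER * 2 + 3];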
  int strlen_hexchar, strlen_hexint, strlen_hexlong, strlen_hexpointer;

  strlen_hexchar = sprintf(hexchar, "%x", (unsigned int)(unsigned char)-1);
  strlen_hexint = sprintf(hexint, "%x", (unsigned int)-1);
  strlen_hexlong = sprintf(hexlong, "%lx", (unsigned long)-1l);  /* %lx matches unsigned long */
  strlen_hexpointer = sprintf(hexpointer, "%p", (void*)-1l);

  printf("#define SIZEOF_CHAR sizeof(char)                // %2d\n", SIZEOF_CHAR);
  printf("#define SIZEOF_INT sizeof(int)                  // %2d\n", SIZEOF_INT);
  printf("#define SIZEOF_LONG sizeof(long)                  // %2d\n", SIZEOF_LONG);
  printf("#define SIZEOF_POINTER sizeof(void *)           // %2d\n", SIZEOF_POINTER);

  printf("\n");

  printf("#define NIBBLE_BIT %-2d\n", NIBBLE_BIT);
  printf("#ifndef CHAR_BIT\n");
  printf("#define CHAR_BIT %-2d   // should have been defined in <limits.h>\n", CHAR_BIT);
  printf("#endif\n");
  printf("#define INT_BIT (SIZEOF_INT * CHAR_BIT)         // %2d\n", INT_BIT);
  printf("#define LONG_BIT (SIZEOF_LONG * CHAR_BIT)       // %2d\n", (int)LONG_BIT);
  printf("#define POINTER_BIT (SIZEOF_POINTER * CHAR_BIT) // %2d\n", POINTER_BIT);

  printf("\n");

  printf("\nTest setup...\n");
  printf("\n");

  printf("char hexchar[CHAR_BIT * SIZEOF_CHAR + 1],\n");
  printf("    hexint[CHAR_BIT * SIZEOF_INT + 1],\n");
  printf("    hexlong[CHAR_BIT * SIZEOF_LONG + 1],\n");
  printf("    hexpointer[CHAR_BIT * SIZEOF_POINTER + 1];\n");
  printf("int strlen_hexchar, strlen_hexint, strlen_hexlong, strlen_hexpointer;\n");
  printf("\n");
  printf("strlen_hexchar = sprintf(hexchar, \"%%x\", (unsigned char)-1);\n//    returned %d, hexchar populated with \"%s\"\n",
      strlen_hexchar, hexchar);
  printf("strlen_hexint = sprintf(hexint, \"%%x\", (unsigned int)-1);\n//    returned %d, hexint populated with \"%s\"\n",
      strlen_hexint, hexint);
  printf("strlen_hexlong = sprintf(hexlong, \"%%x\", (unsigned long)-1);\n//    returned %d, hexlong populated with \"%s\"\n",
      strlen_hexlong, hexlong);
  printf("strlen_hexpointer = sprintf(hexpointer, \"%%x\", (void*)-1l);\n//    returned %d, hexpointer populated with \"%s\"\n",
      strlen_hexpointer, hexpointer);

  printf("\n\nTest results...\n");
  printf("\n");

  if (SIZEOF_CHAR * 2 == strlen_hexchar) {
    printf("testing (SIZEOF_CHAR * 2 == strlen_hexchar) [pass]\n");
  } else {
    printf("testing (SIZEOF_CHAR * 2 == strlen_hexchar) [fail]\n");
    printf("  (%d != %d)\n", (int)(SIZEOF_CHAR * 2), strlen_hexchar);
  }

  if (SIZEOF_INT * 2 == strlen_hexint) {
    printf("testing (SIZEOF_INT * 2 == strlen_hexint) [pass]\n");
  } else {
    printf("testing (SIZEOF_INT * 2 == strlen_hexint) [fail]\n");
    printf("  (%d != %d)\n", (int)(SIZEOF_INT * 2), strlen_hexint);
  }

  if (SIZEOF_LONG * 2 == strlen_hexlong) {
    printf("testing (SIZEOF_LONG * 2 == strlen_hexlong) [pass]\n");
  } else {
    printf("testing (SIZEOF_LONG * 2 == strlen_hexlong) [fail]\n");
    printf("  (%d != %d)\n", (int)(SIZEOF_LONG * 2), strlen_hexlong);
  }

  if (SIZEOF_POINTER * 2 == strlen_hexpointer) {
    printf("testing (SIZEOF_POINTER * 2 == strlen_hexpointer) [pass]\n");
  } else {
    printf("testing (SIZEOF_POINTER * 2 == strlen_hexpointer) [fail]\n");
    printf("  (%d != %d)\n", (int)(SIZEOF_POINTER * 2), strlen_hexpointer);
  }

  printf("\n");

  if (CHAR_BIT == strlen_hexchar * NIBBLE_BIT) {
    printf("testing (CHAR_BIT == strlen_hexchar * NIBBLE_BIT) [pass]\n");
  } else {
    printf("testing (CHAR_BIT == strlen_hexchar * NIBBLE_BIT) [fail]\n");
    printf("  (%d != %d)\n", CHAR_BIT, strlen_hexchar * NIBBLE_BIT);
  }

  if (INT_BIT == strlen_hexint * NIBBLE_BIT) {
    printf("testing (INT_BIT == strlen_hexint * NIBBLE_BIT) [pass]\n");
  } else {
    printf("testing (INT_BIT == strlen_hexint * NIBBLE_BIT) [fail]\n");
    printf("  (%d != %d)\n", (int)INT_BIT, strlen_hexint * NIBBLE_BIT);
  }

  if (LONG_BIT == strlen_hexlong * NIBBLE_BIT) {
    printf("testing (LONG_BIT == strlen_hexlong * NIBBLE_BIT) [pass]\n");
  } else {
    printf("testing (LONG_BIT == strlen_hexlong * NIBBLE_BIT) [fail]\n");
    printf("  (%d != %d)\n", (int)LONG_BIT, strlen_hexlong * NIBBLE_BIT);
  }

  if (POINTER_BIT == strlen_hexpointer * NIBBLE_BIT) {
    printf("testing (POINTER_BIT == strlen_hexpointer * NIBBLE_BIT) [pass]\n");
  } else {
    printf("testing (POINTER_BIT == strlen_hexpointer * NIBBLE_BIT) [fail]\n");
    printf("  (%d != %d)\n", (int)POINTER_BIT, strlen_hexpointer * NIBBLE_BIT);
  }

  printf("\n");

  if ((int)(SIZEOF_POINTER * CHAR_BIT) == strlen_hexpointer * NIBBLE_BIT) {
    printf("testing ((int)(SIZEOF_POINTER * CHAR_BIT) == strlen_hexpointer * NIBBLE_BIT) [pass]\n");
  } else {
    printf("testing ((int)(SIZEOF_POINTER * CHAR_BIT) == strlen_hexpointer * NIBBLE_BIT) [fail]\n");
    printf("  (%d != %d)\n", (int)(SIZEOF_POINTER * CHAR_BIT), strlen_hexpointer * NIBBLE_BIT);
  }

  /* Bytes are reported as 8-bit octets: total bits divided by 8. */
  printf("\nConclusion: this machine word is %d bytes and %d bits\n", (int)(SIZEOF_POINTER * CHAR_BIT / 8), strlen_hexpointer * NIBBLE_BIT);
  if ((int)(SIZEOF_POINTER * CHAR_BIT) != strlen_hexpointer * NIBBLE_BIT) {
    printf(" * however this conclusion did not pass the ((int)(SIZEOF_POINTER * CHAR_BIT) == strlen_hexpointer * NIBBLE_BIT) test\n");
  }

  return 0;
}

The output from this code shows the following on my machine:

$ sizeofword.exe # from mingw32 shell on windows7
#define SIZEOF_CHAR sizeof(char)                //  1
#define SIZEOF_INT sizeof(int)                  //  4
#define SIZEOF_LONG sizeof(long)                  //  4
#define SIZEOF_POINTER sizeof(void *)           //  4

#define NIBBLE_BIT 4
#ifndef CHAR_BIT
#define CHAR_BIT 8    // should have been defined in <limits.h>
#endif
#define INT_BIT (SIZEOF_INT * CHAR_BIT)         // 32
#define LONG_BIT (SIZEOF_LONG * CHAR_BIT)       // 32
#define POINTER_BIT (SIZEOF_POINTER * CHAR_BIT) // 32


Test setup...

char hexchar[CHAR_BIT * SIZEOF_CHAR + 1],
    hexint[CHAR_BIT * SIZEOF_INT + 1],
    hexlong[CHAR_BIT * SIZEOF_LONG + 1],
    hexpointer[CHAR_BIT * SIZEOF_POINTER + 1];
int strlen_hexchar, strlen_hexint, strlen_hexlong, strlen_hexpointer;

strlen_hexchar = sprintf(hexchar, "%x", (unsigned char)-1);
//    returned 2, hexchar populated with "ff"
strlen_hexint = sprintf(hexint, "%x", (unsigned int)-1);
//    returned 8, hexint populated with "ffffffff"
strlen_hexlong = sprintf(hexlong, "%x", (unsigned long)-1);
//    returned 8, hexlong populated with "ffffffff"
strlen_hexpointer = sprintf(hexpointer, "%x", (void*)-1l);
//    returned 8, hexpointer populated with "FFFFFFFF"


Test results...

testing (SIZEOF_CHAR * 2 == strlen_hexchar) [pass]
testing (SIZEOF_INT * 2 == strlen_hexint) [pass]
testing (SIZEOF_LONG * 2 == strlen_hexlong) [pass]
testing (SIZEOF_POINTER * 2 == strlen_hexpointer) [pass]

testing (CHAR_BIT == strlen_hexchar * NIBBLE_BIT) [pass]
testing (INT_BIT == strlen_hexint * NIBBLE_BIT) [pass]
testing (LONG_BIT == strlen_hexlong * NIBBLE_BIT) [pass]
testing (POINTER_BIT == strlen_hexpointer * NIBBLE_BIT) [pass]

testing ((int)(SIZEOF_POINTER * CHAR_BIT) == strlen_hexpointer * NIBBLE_BIT) [pass]

Conclusion: this machine word is 4 bytes and 32 bits
Erich Horn
  • That was run on 32-bit Windows 7. – Erich Horn Oct 31 '15 at 06:33
  • I added long and put it in a gist (https://gist.github.com/f03f1071bce240654ec4.git) – Erich Horn Oct 31 '15 at 06:40
  • I guess I should add, to stay on point, that sizeof(void*) * CHAR_BIT should be the machine word size in bits, although it might actually be sizeof(void*) * 8. I'm not really sure, but I think they should be the same. UTF-8 or UTF-16 or whatever, a char is almost always 8 bits in C (the standard only guarantees at least 8), and there will always be exceptions. One thing you can rely on is that a nibble is always 4 bits, so using sprintf should validate that. – Erich Horn Oct 31 '15 at 06:57
1

Wasn't sizeof allowed?

Also, here is a slightly improved implementation (no need for a copy, fewer loop iterations, and no division):

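#include <limits.h>  /* CHAR_BIT */
#include <stdio.h>

int main(void){
    /* unsigned: left-shifting a signed int into overflow is undefined behavior */
    unsigned int num = 1;
    int count = 0;

    /* shift one byte at a time until the set bit falls off the top */
    while(num <<= CHAR_BIT){
        count++;
    }

    printf("System size of int:%d", count+1);

    return 0;
}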
Valentin Lorentz
  • You are right about this. sizeof() also just gets the compiler's int size. I was wondering whether there is some way, such as a macro or #pragma, to get this done. – Tommy Apr 08 '15 at 15:44
0

As an interview question, the only correct way to do it in straight C is to use conditional compilation. Conditional compilation allows the word size to be defined differently for each platform the software runs on, or lets the platform be identified so that the correct size can be looked up in a database. Since a company knows which platforms its product will run on, or which platforms it is willing to support, the platform can be chosen at compile time or run time, and the right word size is selected as a result.

Any other way of sussing out the word size will either be platform/system specific code or a heuristic. One possible heuristic is to use the size of a pointer to represent the machine word size.

word_size = sizeof(void *);

Because this is a heuristic, there are platforms on which it fails; for example, some 64-bit architectures use 32-bit pointers.

jxh
  • Will this work on architectures where function pointers are more than just a pointer? I recall Itanium being strange in that regard, with a function "descriptor" or the like. – Jonathon Reinhart Apr 08 '15 at 16:22
  • @JonathonReinhart: I changed it to a regular pointer. The idea is to choose something that the machine is willing to use an entire register to hold. This is likely to work for 16-bit systems, for which `sizeof(long)` has to fail. – jxh Apr 08 '15 at 16:28
  • There are some 64bit architectures using 32bit pointers. See http://en.wikipedia.org/wiki/Memory_address#Word_size_versus_address_size – holgac Apr 08 '15 at 16:45
  • @holgac: Thanks, noted in the answer. – jxh Apr 08 '15 at 16:50