3

Assuming little endian architecture and having a large (unsigned char *) memory area, I want to be able to interpret any n <= sizeof(size_t) bytes anywhere in this area as an integer (size_t) value. I want to do it as fast as possible assuming gcc and x64 architecture, but also I would like to be able to offer safer code for other possible scenarios. What are the possible solutions?

Is it possible to do it faster than the following?

static inline size_t bytes2num(const unsigned char * const addr, size_t const len) {
  switch(len) { /* up to sizeof(size_t) bytes after addr has to be allocated */
    case 5: return *(size_t *) addr & 0x0ffffffffffU;
    case 4: return *(size_t *) addr & 0x0ffffffffU;
    case 3: return *(size_t *) addr & 0x0ffffffU;
    case 2: return *(size_t *) addr & 0x0ffffU;
    case 1: return *(size_t *) addr & 0x0ffU;
    case 6: return *(size_t *) addr & 0x0ffffffffffffU;
    case 7: return *(size_t *) addr & 0x0ffffffffffffffU;
    case 8: return *(size_t *) addr & 0x0ffffffffffffffffU;
  }
  return 0;
}

(The order of the branches reflects the actual probability distribution of possible len values, but in fact it does not seem to have any significant impact: the compiler probably uses some constant time solution.) Moreover, am I right that it is an UB according to the standard, but despite this fact, I can expect from gcc either a "correct" interpretation or, with -fstrict-aliasing -Wstrict-aliasing=2, a warning or error (because the pointer aliasing is clearly visible to the compiler) if the compiler behavior should happen to change in the future?

A bit slower (I have compared only the whole program) solution is the following:

static inline size_t bytes2num(const unsigned char * const addr, size_t const len) {
  union { size_t num; unsigned char bytes[8]; } number = { 0 };
  switch(len) {
    case 8: number.bytes[7] = addr[7];
    case 7: number.bytes[6] = addr[6];
    case 6: number.bytes[5] = addr[5];
    case 5: number.bytes[4] = addr[4];
    case 4: number.bytes[3] = addr[3];
    case 3: number.bytes[2] = addr[2];
    case 2: number.bytes[1] = addr[1];
    case 1: number.bytes[0] = addr[0];
  }
  return number.num;
}

Am I right that using this code no alingment problems could arise and that, despite it is still not correct as I write one member of the union and read the other member (see the discussion around https://stackoverflow.com/a/36705613), this "union-based" approach is so widespread that it is supported by almost all compilers? Is there any faster "almost correct" solution?

Finally, is there any faster really correct solution than using shift and add (thanks to Hurkyl for pointing out, of course I tried it but forgot!), which is somewhere between the above and the slowest memcpy?

static inline size_t bytes2num(const unsigned char * const addr, size_t const len) {
  size_t num = 0;
  int i;
  for (i = len - 1; i >= 0; --i) {
    num <<= 8;
    num |= addr[i];
  }
  return num;
}

Footnote: I did not mention a particular revision of the standard and moreover I added c++ tag --- I would like my code to be compilable under any standard from C89 onward, thus I'd like to limit myself to the common subset of the standards (possibly with some optional definitions like empty #define inline etc.)

Community
  • 1
  • 1
  • 1
    Please don't tag C++. This question is about c. I don't know why you added the C++ tag or mentioned that you added it because it's not relevant. C++ and C are different languages. – Eli Sadoff Nov 05 '16 at 15:17
  • You can possibly speed up `memcpy` by using it with `unsigned long long` representing the bits instead of a cstring. – Eli Sadoff Nov 05 '16 at 15:19
  • @Eli Sadoff: Excuse me if I did not catch local customs: AFAIK, C++ and C has a common subset and I would like limit myself to this subset as much as possible (and to prefer C if it would not be possible) --- and I have tried to explain this in the last paragraph. – Pavel Smerk Nov 05 '16 at 15:24
  • 1
    Have you tried the classic -- and completely portable -- shift-and-add approach to accumulate one byte at a time? –  Nov 05 '16 at 15:24
  • 2
    C is not a strict subset of C++ (see [here](http://stackoverflow.com/questions/1201593/c-subset-of-c-where-not-examples)), but unless you are specifically talking about C++, refrain from tagging it. – Eli Sadoff Nov 05 '16 at 15:25
  • How do you know memcpy is slower? Did you measure it? Did you enable optimizing? – 2501 Nov 05 '16 at 15:31
  • @Hurkyl: Oh, of course, you're right, I've completely forgotten this possibility, though I've tried it. Thank you very much, I've updated the question. – Pavel Smerk Nov 05 '16 at 15:37
  • @Eli Sadoff: ... and that's why I've used the term "common subset", both in the original question and in my reaction to you, which term, I hope, should be read as an intersection and not a strict subset. I admit I have little experience, but I guess that limiting to the common subset can make sense as it allows anyone to pick up any piece of my code and directly use it in their code regardless whether they use C or C++ – Pavel Smerk Nov 05 '16 at 15:41
  • 1
    I understand that there exists a common subset (C++ safe C), but that's not reason enough to tag a C question as C++. – Eli Sadoff Nov 05 '16 at 15:43
  • @2501: Yes, I measured it (with `-O2`), but not in some minimal program simulating some random reads, but in a whole program (building some large finite state automaton in memory). It's because some tiny improvements which would not affect the performance of the whole program are not so interesting for me. Regarding the whole program and taken the second solution as the base, the first solution is ca 2.5 % faster, the shift and add is ca 13 % slower, memcpy ca 20 % slower and for instead of memcpy ca 23 % slower. Measured over few dozens of runs, but only one scenario, not some test suite. – Pavel Smerk Nov 05 '16 at 15:58
  • @Eli Sadoff: OK, I'm new here and you've earned your reputation, so I should believe you. :-) I only guess, that let aside the title, there is nothing in my question which would bind it more closely to C than to C++ --- and in fact, both these languages bothers me, as I've written (even though I admit that in the *need* of choice, I would prefer C). So yes, I can ask the same question with C++ in title, but it seemed to me rather superfluous. Oder? :-) – Pavel Smerk Nov 05 '16 at 16:11
  • 1
    @PavelSmerk You theoretically could ask this question as a C++ question. The issue more so is that using the logic that C is most of the time a subset of C++ means that all C questions can be tagged as C++ which just isn't helpful. I only use [tag:C++] if I'm talking about something only in C++ (like classes for example). People helping with C++ are less likely to be able to help with the lower level parts of C. – Eli Sadoff Nov 05 '16 at 16:17
  • 1
    @Eli Sadoff: OK, this sounds quite reasonably for me, thanks. :-) [I would upvote this your comment if I could. :-] – Pavel Smerk Nov 05 '16 at 16:30
  • 1
    @Eli Sadoff: As for the long int int hint: I guess memcpy is so slow, that there's no (such) way to speed it up. But it might speed up (and make correct! :-) the first solution. On the other hand, to represent a large block of bytes as a block of (some fixed size variant of) long int ints would probably lead to more complicated and less transparent code, namely regarding the "writing" part of the code. And the simplicity of the code is (for me) probably of a higher value than e.g. the speed. But perhaps it could be solved by some suitable macros/inlines: may be I'll give it a try, thanks. :-) – Pavel Smerk Nov 05 '16 at 16:46

1 Answers1

1

Your last solution is probably the simplest. But you should initialize num to 0 and use addr instead of &addr:

static inline size_t bytes2num(const unsigned char *addr, size_t len) {
    size_t num = 0;
    memcpy(&num, addr, len);
    return num;
}

Note however that if len is greater than sizeof(num), the above code invokes undefined behavior. If you need a safe solution, you need an extra test:

static inline size_t bytes2num(const unsigned char *addr, size_t len) {
    size_t num = 0;
    memcpy(&num, addr, len <= sizeof(num) ? len : sizeof(num));
    return num;
}

Note also that this method assumes integers are stored in little endian order (least significant byte first).

For a portable solution, still assuming little endian byte ordering, eight bit bytes, and len <= sizeof(size_t) is this loop:

static inline size_t bytes2num(const unsigned char *addr, size_t len) {
    size_t num = 0;
    for (size_t i = 0; i < len; i++) {
        num |= (size_t)addr[i] << (i * 8);
    }
    return num;
}

If your code uses this function with constant values for len, it will be expanded inline probably without a loop ant possibly using a single instruction, depending on the compiler's configuration and abilities.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • You're right with the initialization and the extra `&`, both were my typos. Roughly together with you I've edited the last part of my question, but I guess it's usefull that the memcpy version remains in your response. The endianity and that `len` is not greater than `sizeof(num)` were assumptions given in the question. Interestingly, your last code is (in my scenario) ca 4 % faster than mine, don't know what makes such a difference between `for (i = len; i; i--) num = (num << 8) | addr[i - 1]` (I've changed `i` to `size_t`) and `for (i = 0; i < len; i++) num |= (size_t) addr[i] << (i * 8)` – Pavel Smerk Nov 05 '16 at 23:50
  • @PavelSmerk: you should use http://gcc.godbolt.org to compare the assembly code generated for the different approaches under different optimizing options. – chqrlie Nov 06 '16 at 00:03