0

I'm reading an ESRI Shapefile, and to my dismay it uses big endian and little endian at different points (see, for instance, the table at page 4, plus the tables from page 5 to 8).

So I created two functions in C++, one for each endianness.

uint32_t readBig(ifstream& f) {
    uint32_t num;
    uint8_t buf[4];
    f.read((char*)buf,4);
    num = buf[3] | buf[2]<<8 | buf[1]<<16 | buf[0]<<24;
    return num;
}

uint32_t readLittle(ifstream& f) {
    uint32_t num;
    f.read(reinterpret_cast<char *>(&num),4);
    //f.read((char*)&num,4);
    return num;
}

But I'm not sure this is the most efficient way to do it. Can this code be improved? Keep in mind it will run thousands, maybe millions of times for a single shapefile. So to have even one of the functions calling the other seem worse than to have two separate functions. Is there a difference in performance between using reinterpret_cast or explicit type conversion (char*)? Should I use the same in both functions?

Rodrigo
  • 4,706
  • 6
  • 51
  • 94
  • stack overflow isn't for code reviews. Put your code on godbolt - the actual byte swap should compile down to a single opcode under optimization - if it doesn't, it's bad. – xaxxon Nov 29 '18 at 01:20
  • https://godbolt.org/z/B2GWZT like that – xaxxon Nov 29 '18 at 01:25
  • You can implement readBig in terms of readLittle plus byteswapping so you only need to maintain one function. Then you wouldn't end up using two different casts. – Retired Ninja Nov 29 '18 at 01:26
  • @xaxxon Thank you, but as you can see, my question isn't about byte swapping. It's about the most efficient way to read 32-bit integers from binary files, which also includes byte swapping. – Rodrigo Nov 29 '18 at 01:31
  • @Rodrigo it's actually quite difficult to figure out what the heck your question is. Regardless, that is still a poor question for stack overflow. There is no reasonable way to answer it. – xaxxon Nov 29 '18 at 01:32
  • @RetiredNinja I don't care about maintaining two functions. What I want is that both perform as good as possible, since each will be called thousands, maybe millions of times. Just to have one calling the other seems a waste of performance, in this case. – Rodrigo Nov 29 '18 at 01:33
  • @xaxxon Do you know exactly what are the differences between reinterpret_cast and explicit type conversion in this case? If you do, I don't see why couldn't you answer it. – Rodrigo Nov 29 '18 at 01:34
  • @Rodrigo stack overflow posts should be limited to one question per post with an [mcve] (emphasis on minimal) for that question to avoid exactly this type of confusion. Feel free to post multiple questions. – xaxxon Nov 29 '18 at 01:37
  • @xaxxon I think that uploading a file for download here is an overkill (so, verifiable and complete go against minimal here). If you can't get the meaning of the question, why don't you leave it for those who can (like nobar on his answer) The reads are different, because in one function I can read directly into a 32-bit integer, and in the other I can't. I simplified it the most I could. If you can do better, why don't you edit the question? – Rodrigo Nov 29 '18 at 01:46
  • @Rodrigo minimal means minimal to reproduce the issue/question situation. It's not some "objectively minimal" concept. They don't go "against" each other in any way. Please read the help on asking a good question to better understand the terms. – xaxxon Nov 30 '18 at 03:09
  • @xaxxon I'm tired of seeing questions with: a) no research effort, b) no reproducible anything, getting hundreds of upvotes. I think only this can explain users with tens or hundreds of thousands of "reputation" here. You came first, you're native in English, you "rock". Ok. If this makes you happy, I'm [trying to understand how does Compiler Explorer work](https://stackoverflow.com/q/53567217/1086511). And as you can see, there were at least 3 users with a mind open enough (beyond rules and regulations) to see an useful question where others just can't see. – Rodrigo Dec 01 '18 at 02:24
  • "Can this code be improved?" is a bad question, and the fact that other people seem to have guessed what you meant does not make it a good question. – xaxxon Dec 03 '18 at 04:57
  • @xaxxon If the difficulty to "guess" what I meant came from the multiple meanings of "most efficient way to do it", a single question -- efficient in which sense? -- is enough to solve it. But I think "difference in performance" already pointed in the right direction. – Rodrigo Dec 03 '18 at 20:02

2 Answers2

2
  1. Casting between pointer types does not affect performance -- In this case, it's just a technicality to make the compiler happy.
  2. If you're really making a separate call to read for every 32-bit value, the time taken by the byte-swapping operation will likely be in the noise. For speed, you probably should have your own buffering layer so that you inner loop doesn't make any function calls.
  3. It's nice if the swap compiles down to a single opcode (like bswap), but whether or not that is possible, or the fastest option, is processor-specific.
  4. If you're really interested in maximizing speed, consider using SIMD intrinsics.
Brent Bradburn
  • 51,587
  • 17
  • 154
  • 173
  • Thank you. The byte-swapping is just necessary. My main concern about performance here is the type cast. Indeed, I must read a lot of bytes at once (the whole group of polygons that make a feature), and then analyze each case of double/big/little endian in the byte array. This will complicate things, since I took so long to have this function working properly (read shapefile feature), but it's a necessary next step. – Rodrigo Nov 29 '18 at 01:54
1

In most cases the compiler should generate a bswap instruction, which is probably sufficient. If however you need something faster than that, vpshufb is your friend...

#include <immintrin.h>
#include <cstdint>

// swap byte order in 16 x int16
inline void swap_16xi16(uint16_t input[16])
{
  constexpr uint8_t mask_data[] = {
    1, 0, 
    3, 2,
    5, 4,
    7, 6,
    9, 8, 
    11, 10,
    13, 12,
    15, 14,
    1, 0, 
    3, 2,
    5, 4, 
    7, 6,
    9, 8, 
    11, 10,
    13, 12,
    15, 14
  };
  const __m256i swapped = _mm256_shuffle_epi8(
    _mm256_loadu_si256((const __m256i*)input), 
    _mm256_loadu_si256((const __m256i*)mask_data)
  );
  _mm256_storeu_si256((__m256i*)input, swapped);
}

// swap byte order in 8 x int32
inline void swap_8xi32(uint32_t input[8])
{
  constexpr uint8_t mask_data[] = {
    3, 2, 1, 0,
    7, 6, 5, 4,
    11, 10, 9, 8,
    15, 14, 13, 12,
    3, 2, 1, 0,
    7, 6, 5, 4,
    11, 10, 9, 8,
    15, 14, 13, 12
  };
  const __m256i swapped = _mm256_shuffle_epi8(
    _mm256_loadu_si256((const __m256i*)input), 
    _mm256_loadu_si256((const __m256i*)mask_data)
  );
  _mm256_storeu_si256((__m256i*)input, swapped);
}

// swap byte order in 4 x int64
inline void swap_4xi64(uint64_t input[4])
{
  constexpr uint8_t mask_data[] = {
    7, 6, 5, 4, 3, 2, 1, 0,
    15, 14, 13, 12, 11, 10, 9, 8,
    7, 6, 5, 4, 3, 2, 1, 0,
    15, 14, 13, 12, 11, 10, 9, 8
  };
  const __m256i swapped = _mm256_shuffle_epi8(
    _mm256_loadu_si256((const __m256i*)input), 
    _mm256_loadu_si256((const __m256i*)mask_data)
  );
  _mm256_storeu_si256((__m256i*)input, swapped);
}

inline void swap_16xi16(int16_t input[16])
  { swap_16xi16((uint16_t*)input); }
inline void swap_8xi32(int32_t input[8])
  { swap_8xi32((uint32_t*)input); }
inline void swap_4xi64(int64_t input[4])
  { swap_4xi64((uint64_t*)input); }
inline void swap_8f(float input[8])
  { swap_8xi32((uint32_t*)input); }
inline void swap_4d(double input[4])
  { swap_4xi64((uint64_t*)input); }
robthebloke
  • 9,331
  • 9
  • 12