
I have a file 'data.dat' with 26 hex byte values:

22 49 E1 09 62 18 42 8C 66 10 B0 11 84 9C 00 FF E0 40 1F F8 60 07 FE 2C 03 FF

I am trying to read these into c++ and print the hex values to the terminal with:

const int msgbuflen = 26;                
char message [msgbuflen];  

const char* filenameIn = "data.dat";  

FILE* fpIn = fopen(filenameIn, "rb");

if (!fpIn) {
  perror("ERROR: INPUT FILE CANNOT BE OPENED");
  exit(EXIT_FAILURE);
}

for (int i = 0; i < msgbuflen; i++){  // clear the buffer
    message[i] = '0';
}

size_t ret_code = fread(message, sizeof(unsigned char), msgbuflen, fpIn);
for(int i = 0; i < msgbuflen; i++){
  printf("message[%d] : 0x%x\n", i, message[i]);
}

fclose (fpIn);

When run, the output for some of the bytes has three leading ff bytes:

message[0] : 0x22
message[1] : 0x49
message[2] : 0xffffffe1
message[3] : 0x9
message[4] : 0x62
message[5] : 0x18
message[6] : 0x42
message[7] : 0xffffff8c
message[8] : 0x66
message[9] : 0x10
message[10] : 0xffffffb0
message[11] : 0x11
message[12] : 0xffffff84
message[13] : 0xffffff9c
message[14] : 0x0
message[15] : 0xffffffff
message[16] : 0xffffffe0
message[17] : 0x40
message[18] : 0x1f
message[19] : 0xfffffff8
message[20] : 0x60
message[21] : 0x7
message[22] : 0xfffffffe
message[23] : 0x2c
message[24] : 0x3
message[25] : 0xffffffff

Why do these leading f's occur? For example, message[2] : 0xffffffe1

I have tried formatting the output of the printf hex %x with 0x%01x but it makes no difference to the terminal output. Checking the sizeof each element in the char array, they are still 1 byte as expected:

printf("sizeof(message[2]) : %zu\n", sizeof(message[2]) );

%> sizeof(message[2]) : 1

I am now wondering if this is a formatting problem? There does not appear to be any more than 1 byte in each message element (as expected).

Using std::hex with cout produces the same issue.

  • Are you aware that when you call an ancient C library function, like `printf`, any `char` parameter value gets automatically promoted to `int`? What do you get for `sizeof(int)`? – Sam Varshavchik Jan 30 '23 at 17:18
  • Try reading your file with std::ifstream, and read into a buffer of unsigned 8-bit variables, e.g. `std::array<std::uint8_t, 26> buffer`. – Pepijn Kramer Jan 30 '23 at 17:23
  • @SamVarshavchik I did not know that. `sizeof(int)` gives 4 bytes. does std::cout also do this? As the issue remains when using `std::cout << std::hex` instead of `printf`. – David Scott Jan 30 '23 at 17:26
  • @SamVarshavchik This "ancient" library is what C++ sits on top of. Over 80% of the binary packages on a typical datacenter server are written in "ancient" C, while C++ is at best 1-2%. Look at the standard C++ libraries and they are all making calls into the standard C library under the hood. – Something Something Jan 30 '23 at 17:46

2 Answers

This happens because char is a signed type on your machine, so the byte 0xFF is stored as -1. When a char is passed to a variadic function such as printf, it is promoted to int (32 bits here), and sign extension turns -1 into 0xFFFFFFFF, which is what %x then prints.

If you store as an unsigned char, you will not have this problem:

#include <cstdio>

int main() {
    unsigned char message[] = {0x22, 0x49, 0xE1, 0x09, 0x62, 0x18, 0x42, 0x8C, 0x66, 0x10, 0xB0, 0x11, 0x84, 0x9C, 0x00, 0xFF, 0xE0, 0x40, 0x1F, 0xF8, 0x60, 0x07, 0xFE, 0x2C, 0x03, 0xFF }; 
    int msgbuflen = sizeof(message)/sizeof(message[0]);
    for(int i = 0; i < msgbuflen; i++){
        printf("message[%d] : 0x%x\n", i, message[i]);
    }
}

Produces:

Program returned: 0
message[0] : 0x22
message[1] : 0x49
message[2] : 0xe1
message[3] : 0x9
message[4] : 0x62
message[5] : 0x18
message[6] : 0x42
message[7] : 0x8c
message[8] : 0x66
message[9] : 0x10
message[10] : 0xb0
message[11] : 0x11
message[12] : 0x84
message[13] : 0x9c
message[14] : 0x0
message[15] : 0xff
message[16] : 0xe0
message[17] : 0x40
message[18] : 0x1f
message[19] : 0xf8
message[20] : 0x60
message[21] : 0x7
message[22] : 0xfe
message[23] : 0x2c
message[24] : 0x3
message[25] : 0xff

Godbolt: https://godbolt.org/z/oxPqnYqMq

Alternatively, you can just cast each char to unsigned char:

for(int i = 0; i < msgbuflen; i++){
    printf("message[%d] : 0x%x\n", i, (unsigned char)message[i]);
}
Remy Lebeau
Something Something
  • This was so obvious in retrospect, thank you so much! Marking as solved. – David Scott Jan 30 '23 at 17:32
  • @remylebeau isn’t char signed as per standard? – Something Something Jan 31 '23 at 01:27
  • @NoleKsum: This should answer your question: [Is char signed or unsigned by default?](https://stackoverflow.com/q/2054939/12149471) I assume that C question also applies to C++. – Andreas Wenzel Jan 31 '23 at 01:31
  • @NoleKsum "*isn’t char signed as per standard*" - no, [it is implementation-defined](https://stackoverflow.com/a/17097575/65863) whether it is signed or not, per section 3.9.1 [basic.fundamental] of the C++ standard. – Remy Lebeau Jan 31 '23 at 01:32
  • @RemyLebeau wtf that's news to me. Is there any mainstream platform that's not ancient that shows char as unsigned? – Something Something Jan 31 '23 at 02:21
  • @NoleKsum: According to the link I posted, gcc on the Android NDK has an unsigned `char`. Also, I believe that several other ARM platforms use unsigned `char`. – Andreas Wenzel Jan 31 '23 at 02:22
  • @NoleKsum most compilers have an option to let the user decide whether `char` is signed or unsigned. MSVC has `/J`, GCC and Clang both have `-funsigned-char`/`-fsigned-char`, etc – Remy Lebeau Jan 31 '23 at 05:28

Alternatively, in more current C++, avoid C-style arrays and use a well-defined unsigned 8-bit data type such as std::uint8_t (char can be either signed or unsigned depending on the platform you are working on):

#include <array>
#include <iostream>
#include <sstream>
#include <string>
#include <cstdint>
#include <format>

// simulated opening of a std::ifstream
auto open_file()
{
    static std::array<std::uint8_t,26> data
    { 
        0x22, 0x49, 0xE1, 0x09, 0x62, 0x18, 0x42, 0x8C, 0x66, 0x10, 
        0xB0, 0x11, 0x84, 0x9C, 0x00, 0xFF, 0xE0, 0x40, 0x1F, 0xF8, 
        0x60, 0x07, 0xFE, 0x2C, 0x03, 0xFF
    };

    std::istringstream is(std::string{ data.begin(), data.end() });
    return is;
}

int main()
{
    std::array<std::uint8_t, 26> buffer;

    auto is = open_file();
    
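    // read() has no overload for std::uint8_t*,
    // so check that sizeof(char) matches sizeof(std::uint8_t) before casting.
    // Unformatted read() is used because get() would stop one byte short
    // and append a null terminator.
    static_assert(sizeof(char) == sizeof(std::uint8_t));
    is.read(reinterpret_cast<char*>(buffer.data()), buffer.size());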

    for (const auto byte : buffer)
    {
        std::cout << std::format("0x{:x} ", byte);
    }

    return 0;
}
Pepijn Kramer
  • There is no std::format available in any release, as far as I know. Only gcc/clang trunk at the moment. It is going to be years until std::format becomes available. – Something Something Jan 30 '23 at 17:44
  • I use it all the time in Visual Studio 2022 with the language set to C++20 (https://en.cppreference.com/w/cpp/utility/format/format also lists it as C++20), so I was not aware it wasn't yet available in clang/gcc. Otherwise there is still https://github.com/fmtlib. The alternative is: `std::cout << "0x" << std::hex << static_cast<int>(byte) << " ";` – Pepijn Kramer Jan 30 '23 at 17:49