
I think my question boils down to: Why are char and unsigned char treated differently even in bit operations? They have the same size and look the same.

I found this, but for me it does not explain why they should behave differently at the level of bit operations.


I read two bytes from a file. These have to be combined into a 16-bit number. I do this with bit shifting << and bitwise or |.

uint16_t universe = (hb << 8) | lb;

I thought the code worked fine, but sometimes I got weirdly high numbers: 65535 instead of 511. It only happens if lb is 0x80 or bigger; the value of hb doesn't matter.

So why is 0x70 == '\x70' but 0x80 != '\x80' (in some cases)?

While writing the question I found that unsigned char and char are treated differently if they are >= 0x80 (see example below).

Can somebody explain why this is, and why it even happens with bitwise operations?

And why doesn't it matter with hb?

I think the examples explain it best:

#include <cstdint>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    //different
    uint8_t hb = 0x01;
    uint8_t lb = 0xff;
    uint16_t universe = (hb << 8) | lb;
    cout<<universe << endl; //511
    
    string line = string("\x01\xff");
    universe = (line[0] << 8) | line[1]; 
    cout<<universe << endl; //65535
    
    //different 
    hb = 0x10;
    lb = 0x80;
    universe = (hb << 8) | lb;
    cout<<universe << endl; //4224
    
    line = string("\x10\x80");
    universe = (line.at(0) << 8) | line.at(1);
    cout<<universe << endl; //65408
    
    universe = ('\x10' << 8) | '\x80';
    cout<<universe << endl; //65408 too 
    
    
    //this is fine
    hb = 0x83;
    lb = 0x70;
    universe = (hb << 8) | lb;
    cout<<universe << endl; //33648
    
    line = string("\x83\x70");
    universe = (line[0] << 8) | line[1];
    cout<<universe << endl; //33648
    
    
    //so it is only happening if the lower byte is >= 0x80
    
    if(0x70 == '\x70')
        cout << "same 0x70" << endl; //same
        
    unsigned char nn = '\x80';
    if(0x80 == nn)
        cout << "now its same 0x80 again " << endl; //same
    
    uint8_t n = '\x80';
    if(0x80 == n)
        cout << "now its same 0x80 again " << endl; //same
    
    if(0x80 == '\x80')
        cout << "not same 0x80" << endl; //actually not the same ?? 
        
    if(nn == '\x80')
        cout << "not same 0x80" << endl; //actually not the same ?? 
        
    char nnn = '\x80';
    if(nnn == '\x80')
        cout << "same 0x80" << endl; //same I THINK THIS IS THE DIFFERENCE
        
    cout << sizeof(nn) << sizeof(n) << sizeof(lb) << sizeof(lb) << endl; //1111
    
    
    //AND WHY IS THIS WORKING ? here is a value  > 0x80  used but its working fine
    hb = 0x90;
    lb = 0x70;
    universe = (hb << 8) | lb;
    cout<<universe << endl; //36976
    
    line = string("\x90\x70");
    universe = (line[0] << 8) | line[1];
    cout<<universe << endl; //36976
    
    
    
    return 0;
}

I fixed it in my code by assigning the chars to a uint8_t. But I don't understand why it's happening. Why are char and unsigned char treated differently in bit operations?

string line = string("\x01\xff");
uint8_t hb = line[0];
uint8_t lb = line[1];

universe = (hb << 8) | lb; 
cout<<universe << endl; //511 like expected
Alex
  • On many systems, apparently including yours, `char` is a signed type. I'm sure there's a duplicate question about this somewhere on the site. – Nate Eldredge Mar 26 '23 at 04:47
  • Why the C++ code is tagged C? – 273K Mar 26 '23 at 04:49
  • @NateEldredge I linked one, but I don't get why the bit operations behave differently. Both have the same size. Or does the system do some weird bit flips even if I assign a hex value? – Alex Mar 26 '23 at 04:49
  • @273K slipped in - removed it – Alex Mar 26 '23 at 04:49
  • It's not really that complicated. If char is signed, then when `'\x7f'` is used in a 16-bit computation, it has the value `0x007f`. When `'\x80'` is used in a 16-bit computation, it has the value `0xff80`. – Tim Roberts Mar 26 '23 at 04:53
  • Recall that when you apply an arithmetic operator to two values, at least one of which has type smaller than `int`, then both sides are promoted to `int`. https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules If `char` is signed 8-bit, then `'\x80'` has numerical value -128, and is promoted to the `int` with value -128 (sign extension). Whereas `0x80` has numerical value 128. Then `0x80 == '\x80'` compares the int 128 to the int -128, and indeed they are unequal. – Nate Eldredge Mar 26 '23 at 04:53
  • @TimRoberts but I don't get why the last example is working (big code section). There is also a value bigger than `'\x7f'` – Alex Mar 26 '23 at 05:07
  • @NateEldredge but why is the last example working? – Alex Mar 26 '23 at 05:07
  • @Alex *"I don't get why the last example is working?"* -- because your result is `uint16_t`, so there are no higher-order bits than what `line[0]` is supposed to supply. Try changing `universe` to be `uint32_t` (probably `unsigned` would also make the last case fail). – JaMiT Mar 26 '23 at 05:11
  • Your final example is wrong. The result is not 65535, it is 511. I just compiled it. Since all the components are unsigned, there's no sign extension. – Tim Roberts Mar 26 '23 at 05:15
  • @Tim Roberts changed it; I just forgot to remove/change the comment. But it's working as expected – Alex Mar 26 '23 at 05:22
