In answer to your first question, the bottom 128 code points of Unicode are ASCII. There's no real distinction between the two.
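You can see the overlap directly. A minimal sketch, assuming an ASCII-compatible execution character set (true on virtually every modern platform):

#include <iostream>
using namespace std;

int main() {
    // 'A' is code point U+0041 in Unicode and 0x41 (65) in ASCII:
    // the first 128 code points are identical in both.
    cout << ('A' == 0x41 ? "same" : "different") << '\n';
}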
The reason you're seeing `65` is because the thing you're outputting (`a`) is an `int` rather than a `char` (it may have started as a `char` but, by putting it into `a`, you modified how it would be treated in future).
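A minimal sketch of that effect (the variable names here are illustrative, not taken from your code):

#include <iostream>
using namespace std;

int main() {
    char c = 'A';
    int a = c;              // same value (65), but the type is now int

    cout << c << '\n';      // prints: A   (a char is output as a character)
    cout << a << '\n';      // prints: 65  (an int is output as a number)
}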
For your second question, a byte is a `char`, at least as far as the ISO C and C++ standards are concerned. If `CHAR_BIT` is defined as `8`, that's how wide your `char` type is.
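You can check what your implementation uses, since `<climits>` provides `CHAR_BIT`:

#include <climits>
#include <iostream>
using namespace std;

int main() {
    cout << "char is " << CHAR_BIT << " bits wide\n";   // almost certainly 8
}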
However, you should keep in mind the difference between Unicode code points and Unicode representations (such as UTF-8). Having `CHAR_BIT == 8` will still allow Unicode to work if the UTF-8 representation is used, since UTF-8 encodes each code point as a sequence of one to four 8-bit code units.
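As a sketch of what that means in practice, the three bytes of the UTF-8 encoding of U+222B fit into ordinary `char`s (this assumes your terminal is set to UTF-8, otherwise you'll see three junk characters):

#include <iostream>
using namespace std;

int main() {
    // U+222B (INTEGRAL SIGN) as three 8-bit UTF-8 code units.
    const char integral[] = "\xe2\x88\xab";
    cout << integral << '\n';
}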
My advice would be to capture the output of your program with a hex dump utility; you may well find the Unicode character is coming out as `e2 88 ab`, which is the UTF-8 representation of U+222B. It will then be interpreted by something outside the program (e.g., the terminal) to render the correct glyph(s):
#include <iostream>
using namespace std;
int main() { cout << "\u222B\n"; }
Running the program above shows what's being output:
pax> g++ -o testprog testprog.cpp ; ./testprog
∫
pax> ./testprog | hexdump
0000000 e2 88 ab 0a
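If you don't have a hexdump utility handy, a few lines of C++ will show the same bytes (a sketch, assuming the execution character set is UTF-8, which is the g++ default):

#include <cstdio>
using namespace std;

int main() {
    // the same string the program above writes, viewed byte by byte
    const unsigned char s[] = "\u222B\n";
    for (int i = 0; s[i] != '\0'; ++i)
        printf("%02x ", s[i]);
    printf("\n");                        // output: e2 88 ab 0a
}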
You could confirm that by generating the same UTF-8 byte sequence in a different way:
pax> printf "\xe2\x88\xab\n"
∫