
I have the following code block (NOT written by me), which performs mapping and recodes ASCII characters to EBCDIC.

// Variables.
CodeHeader* tchpLoc = {};
...
memset(tchpLoc->m_ucpEBCDCMap, 0xff, 256);
for (i = 0; i < 256; i++) {
    if (tchpLoc->m_ucpASCIIMap[i] != 0xff) {
        ucTmp2 = i;
        asc2ebn(&ucTmp1, &ucTmp2, 1);
        tchpLoc->m_ucpEBCDCMap[ucTmp1] = tchpLoc->m_ucpASCIIMap[i];
    }
}

The CodeHeader definition is

typedef struct {
    ...
    UCHAR* m_ucpASCIIMap; 
    UCHAR* m_ucpEBCDCMap; 
} CodeHeader;

and the method that seems to be giving me problems is

void asc2ebn(char* szTo, char* szFrom, int nChrs)
{
    while (nChrs--)
        *szTo++ = ucpAtoe[(*szFrom++) & 0xff];
}

[Note, the unsigned char array ucpAtoe[256] is copied at the end of the question for reference].

I have an old C application and my C++11 conversion of it running side by side. Both write a massive .bin file, and there is a tiny discrepancy between the two outputs, which I have traced to the code above. In both programs, the block

...
    if (tchpLoc->m_ucpASCIIMap[i] != 0xff) {
        ucTmp2 = i;
        asc2ebn(&ucTmp1, &ucTmp2, 1);
        tchpLoc->m_ucpEBCDCMap[ucTmp1] = tchpLoc->m_ucpASCIIMap[i];
    }

is entered for i = 32, and asc2ebn returns ucTmp1 as 64 ('@') in both the C and C++ versions. Great. The next entry is for i = 48: here the C code returns ucTmp1 as 240 ('ð'), whereas the C++ code returns ucTmp1 as -16 ('ð'). My question is: why does this lookup/conversion produce different results for exactly the same input and lookup array (copied below)?

In this case the old C code is taken as correct, so I want the C++ to produce the same result for this lookup/conversion. Thanks for your time.


static UCHAR ucpAtoe[256] = {
    '\x00','\x01','\x02','\x03','\x37','\x2d','\x2e','\x2f',/*00-07*/
    '\x16','\x05','\x25','\x0b','\x0c','\x0d','\x0e','\x0f',/*08-0f*/
    '\x10','\x11','\x12','\xff','\x3c','\x3d','\x32','\xff',/*10-17*/
    '\x18','\x19','\x3f','\x27','\x22','\x1d','\x35','\x1f',/*18-1f*/
    '\x40','\x5a','\x7f','\x7b','\x5b','\x6c','\x50','\xca',/*20-27*/
    '\x4d','\x5d','\x5c','\x4e','\x6b','\x60','\x4b','\x61',/*28-2f*/
    '\xf0','\xf1','\xf2','\xf3','\xf4','\xf5','\xf6','\xf7',/*30-37*/
    '\xf8','\xf9','\x7a','\x5e','\x4c','\x7e','\x6e','\x6f',/*38-3f*/
    '\x7c','\xc1','\xc2','\xc3','\xc4','\xc5','\xc6','\xc7',/*40-47*/
    '\xc8','\xc9','\xd1','\xd2','\xd3','\xd4','\xd5','\xd6',/*48-4f*/
    '\xd7','\xd8','\xd9','\xe2','\xe3','\xe4','\xe5','\xe6',/*50-57*/
    '\xe7','\xe8','\xe9','\xad','\xe0','\xbd','\xff','\x6d',/*58-5f*/
    '\x79','\x81','\x82','\x83','\x84','\x85','\x86','\x87',/*60-67*/
    '\x88','\x89','\x91','\x92','\x93','\x94','\x95','\x96',/*68-6f*/
    '\x97','\x98','\x99','\xa2','\xa3','\xa4','\xa5','\xa6',/*70-77*/
    '\xa7','\xa8','\xa9','\xc0','\x6a','\xd0','\xa1','\xff',/*78-7f*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*80-87*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*88-8f*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*90-97*/
    '\xff','\xff','\xff','\x4a','\xff','\xff','\xff','\xff',/*98-9f*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*a0-a7*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*a8-af*/
    '\xff','\xff','\xff','\x4f','\xff','\xff','\xff','\xff',/*b0-b7*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*b8-bf*/
    '\xff','\xff','\xff','\xff','\xff','\x8f','\xff','\xff',/*c0-c7*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*c8-cf*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*d0-d7*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*d8-df*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*e0-e7*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*e8-ef*/
    '\xff','\xff','\xff','\x8c','\xff','\xff','\xff','\xff',/*f0-f7*/
    '\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff' };
MoonKnight
  • `240` and `-16` are the same value for `char`, aren't they? – Joker_vD Jul 10 '14 at 15:36
  • Have you tried explicitly using `unsigned char` as opposed to `char`? `char` can be either unsigned or signed. – Shafik Yaghmour Jul 10 '14 at 15:38
  • `asc2ebn()` doesn't appear to return anything at all - it's declared `void asc2ebn(...)`. – twalberg Jul 10 '14 at 15:43
  • @twalberg I am passing in a reference `&ucTmp1` which is being changed and returned to me. – MoonKnight Jul 10 '14 at 15:45
  • @Joker_vD no they are not the same. This is indexing a _pointer_, _not_ an array _value_... See http://stackoverflow.com/a/3473686/626442 – MoonKnight Jul 10 '14 at 15:46
  • @sharth no, the code is correct here. `ucTmp2` is being used for the 'from' index and `ucTmp1` for the 'to' index. – MoonKnight Jul 10 '14 at 15:53
  • @Killercam: Yep. I misread those two parameters. That being said, I think it's going to be difficult for us to help solve this without a [compilable example](http://stackoverflow.com/help/mcve). There's probably a difference in types somewhere that we can't see. – Bill Lynch Jul 10 '14 at 15:55
  • @Killercam I think you missed Joker_vD's point. The point is in your "broken" case, `ucpAtoe[48]` yields the bit pattern `0xf0`. Interpreted as a signed char/int that has the value -16, but an unsigned char/int with that pattern has the value 240. These are exactly the two values you are seeing, so it seems you have one case that is treating your result as signed and one as unsigned. – twalberg Jul 10 '14 at 15:58
  • @twalberg ahhh, sorry yes... I see what you are saying. Thanks for your time. – MoonKnight Jul 10 '14 at 16:09
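
The point made in the comments above can be checked in isolation. The following is a minimal sketch, not code from the question: the helper names `asUnsigned`, `asSigned`, and `show` are hypothetical, and the signed result assumes a two's complement platform (the conversion to `signed char` is implementation-defined before C++20).

```cpp
#include <cstdio>

// Read one stored byte as an unsigned value (always in 0..255).
constexpr int asUnsigned(unsigned char byte) { return byte; }

// Read the same bit pattern through signed char.
// Assumption: two's complement, so 0xf0 becomes -16.
constexpr int asSigned(unsigned char byte)
{
    return static_cast<signed char>(byte);
}

// Print both readings of one byte, e.g. show(0xf0) for ucpAtoe[48].
void show(unsigned char byte)
{
    std::printf("0x%02x -> unsigned %d, signed %d\n",
                static_cast<unsigned>(byte), asUnsigned(byte), asSigned(byte));
}
```

Calling `show(0x40)` gives the same number for both readings, which is why i = 32 matched in both programs, while `show(0xf0)` gives the two different numbers seen for i = 48.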

1 Answer

In both C and C++, the standard does not specify whether plain char is signed or unsigned; the choice is implementation-defined. Apparently your C compiler treats char as unsigned, while your C++ compiler treats it as signed. The byte 0xf0 therefore reads as 240 in the C build and as -16 in the C++ build.

For GCC, the flag that makes char unsigned is -funsigned-char. For MSVC, it is /J.
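
To see which choice a given compiler and flag combination makes, something like the sketch below may help. The function names `plainCharIsSigned` and `byteValue` are hypothetical; only `std::numeric_limits` comes from the standard library.

```cpp
#include <limits>

// True when this compiler, with its current flags, treats plain char as signed.
constexpr bool plainCharIsSigned()
{
    return std::numeric_limits<char>::is_signed;
}

// Read a plain char as a value in 0..255, whatever the answer above is.
constexpr int byteValue(char c)
{
    return static_cast<unsigned char>(c);
}
```

Evaluating `plainCharIsSigned()` in both builds shows whether they disagree. `byteValue` gives the same number in either case, so it is a safer way to turn a plain char into a table index than relying on a compiler flag.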

Joker_vD
  • Thanks for your time. But won't this force all `char` values to become `unsigned`? If so, this is not what I want, as elsewhere in the code `char` values are used intentionally... I have tried using this flag (`\J`) for the C++ version and it has not helped the conversion. I will rewrite the conversion method `asc2ebn`, perhaps templating it... – MoonKnight Jul 10 '14 at 16:13
  • @Killercam: You don't need to template anything, just access `szTo` and `szFrom` cast to `unsigned char *` or make your `asc2ebn` function accept `unsigned char *`. That's the reason, why there are three distinct types `char`, `signed char`, `unsigned char`. – mafso Jul 10 '14 at 16:43
  • This did not solve the issue but put me directly on the right track, so thank you. In the end I overloaded the method to take both `unsigned char*` and `char*`. The reason for this is that elsewhere in the code `char*` is adopted and negative indexes are accepted(!?) - I will have to look into this... Thanks to everyone for their help. – MoonKnight Jul 10 '14 at 16:45
  • @killercam: You should only use char for C-style strings and C-style character constants. So `char *output="Hello World";` is fine. When you use chars **by value**, you want signed char or unsigned char but not plain char. Plain char leaves you at the mercy of the compiler default settings. – SJHowe Jul 10 '14 at 17:36
  • Great advice. Thanks. I am very new to C++ and learning every day. Thanks again. – MoonKnight Jul 10 '14 at 19:13
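
The approach suggested in the comments above (accept `unsigned char*`, with an overload for callers that still hold `char*`) might look roughly like this. It is a sketch under assumptions, not the code the asker ended up with: the name `asc2ebnUnsigned` is hypothetical, `UCHAR` is assumed to be `unsigned char`, and the lookup table is passed as a parameter so the example does not depend on the global `ucpAtoe`.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char UCHAR;  // assumed to match the UCHAR used in the question

// Same loop as asc2ebn, but every byte is unsigned, so the value written
// to *to is always in 0..255 and no "& 0xff" mask is needed on the index.
void asc2ebnUnsigned(UCHAR* to, const UCHAR* from, std::size_t count,
                     const UCHAR* table)
{
    assert(to && from && table);
    while (count--)
        *to++ = table[*from++];
}

// Overload for callers that still use plain char buffers. Viewing char
// storage through unsigned char* is permitted by the aliasing rules.
void asc2ebnUnsigned(char* to, const char* from, std::size_t count,
                     const UCHAR* table)
{
    asc2ebnUnsigned(reinterpret_cast<UCHAR*>(to),
                    reinterpret_cast<const UCHAR*>(from), count, table);
}
```

With `ucTmp1` and `ucTmp2` declared as `UCHAR`, a call such as `asc2ebnUnsigned(&ucTmp1, &ucTmp2, 1, ucpAtoe)` picks the first overload, and `ucTmp1` can then be used directly as an index into `m_ucpEBCDCMap`.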