Do char's in C have pre-assigned zero indexed values?

Question

Sorry if my title is a little misleading, I am still new to a lot of this but:

I recently worked on a small cipher project where the user can give the program an argument at the command line, but it must be alphabetical. (Ex: ./file abc)

This argument is then used in a formula to encipher a message of plain text you provide. I got the code to work, thanks to my friend's help, but I'm not 100% sure about a specific part of this formula.

#include <stdio.h>
#include <cs50.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <ctype.h>


int main (int argc, string argv[])
{   //Verify that exactly one argument (the key) was given
    if (argc != 2)
    {
        printf("Please Submit a Valid Argument.\n");
        return 1;
    }
    //Store the given argument (our key) inside a string var 'k' and check if it is alpha
    string k = (argv[1]);
    //Store how long the key is
    int kLen = strlen(k);
    //Tell the user we are checking their key
    printf("Checking key validation...\n");
    //Pause the program for 2 seconds
    sleep(2);
    //Check to make sure the key submitted is alphabetical
    for (int h = 0, strlk = strlen(k); h < strlk; h++)
    {
        if (isalpha(k[h]))
        {
            printf("Character %c is valid\n", k[h]);
            sleep(1);
        }
        else
        {   //Telling the user the key is invalid and returning them to the console
            printf("Key is not alphabetical, please try again!\n");
            return 0;
        }

    }
    //Store the users soon to be enciphered text in a string var 'pt'
    string pt = get_string("Please enter the text to be enciphered: ");
    //A prompt that the encrypted text will display on
    printf("Printing encrypted text: ");
    sleep(2);
    //Encipher Function
    for(int i = 0, j = 0, strl = strlen(pt); i < strl; i++)
    {
        //Get the letter 'key'
        int lk = tolower(k[j % kLen]) - 'a';
        //If the char is uppercase, run the V formula and increment j by 1
        if (isupper(pt[i]))
        {
            printf("%c", 'A' + (pt[i] - 'A' + lk) % 26);
            j++;
        }
        //If the char is lowercase, run the V formula and increment j by 1
        else if (islower(pt[i]))
        {
            printf("%c", 'a' + (pt[i] - 'a' + lk) % 26);
            j++;
        }
        //If the char is a symbol just print said symbol
        else
        {
            printf("%c", pt[i]);
        }
    }
    printf("\n");
    printf("Closing Script...\n");
    return 0;
}

The encipher loop uses 'A' as a char placeholder, but does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)

No. The uppercase A has an ASCII value of 65 decimal, 0x41 hex (see any ASCII chart), not zero. – Ken White, Feb 19 '19 at 23:57
https://stackoverflow.com/questions/1469711/converting-letters-to-numbers-in-c – user3386109, Feb 20 '19 at 00:08
Ciphers typically bypass this issue by transforming an array of bytes to an array of bytes. That the original bytes represent text is out of scope. That you might want to transfer the output as text is out of scope. (Base64 or similar is typically used when that is required.) – Tom Blodget, Feb 20 '19 at 02:15

score 4 · Answer 1 · answered Feb 20 '19 at 00:00

In C, character literals like 'A' are of type int, and represent whatever integer value encodes the character A on your system. On the 99.999...% of systems that use ASCII character encoding, that's the number 65. If you have an old IBM mainframe from the 1970s using EBCDIC, it might be something else. You'll notice that the code is subtracting 'A' to make 0-based values.

This does make the assumption that the letters A-Z occupy 26 consecutive codes. This is true of ASCII (A=65, B=66, etc.), but not of all codes, and not guaranteed by the language.

score 3 · Answer 2 · answered Feb 20 '19 at 00:01

does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)

No. Strictly conforming C code cannot depend on anything about the character encoding other than the digits 0-9 being represented consecutively, even though the common ASCII character set does represent the letters consecutively too.

The only guarantee regarding character sets is per 5.2.1 Character sets, paragraph 3 of the C standard:

... the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous...

Character sets such as EBCDIC don't represent the letters consecutively.

score 1 · Answer 3 · answered Feb 20 '19 at 00:12

char is a numeric type that happens to also often be used to represent visible characters (or special non-visible pseudo-characters). 'A' is a value (with actual type int) that can be converted to a char without overflow or underflow. That is, it's really some number, but you usually don't need to know what number, since you generally use a particular char value either as just a number or as just a character, not both.

But this program is using char values in both ways, so it somewhat does matter what the numeric values corresponding to visible characters are. One way it's very often done, but not always, is using the ASCII values which are numbered 0 to 127, or some other scheme which uses those values plus more values outside that range. So for example, if the computer uses one of those schemes, then 'A'==65, and 'A'+1==66, which is 'B'.

This program is assuming that all the lowercase Latin-alphabet letters have numeric values in consecutive order from 'a' to 'z', and all the uppercase Latin-alphabet letters have numeric values in consecutive order from 'A' to 'Z', without caring exactly what those values are. This is true of ASCII, so it will work on many kinds of machines. But there's no guarantee it will always be true!

C does guarantee the ten digit characters from '0' to '9' are in consecutive order, which means that if n is a digit number from zero to nine inclusive, then n + '0' is the character for displaying that digit, and if c is such a digit character, then c - '0' is the number from zero to nine it represents. But that's the only guarantee the C language makes about the values of characters.

For one counter-example, see EBCDIC, which is not in much use now, but was used on some older computers, and C supports it. Its alphabetic characters are arranged in clumps of consecutive letters, but not with all 26 letters of each case all together. So the program would give incorrect results running on such a computer.

score 0 · Answer 4 · answered Feb 20 '19 at 00:08

Sequentiality is only one aspect of concern.

Proper use of isalpha(ch) is another, not quite implemented properly in OP's code.

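isalpha(ch) expects ch to be a value in the range of unsigned char, or EOF. With k[h], a char, that value could be negative, which is undefined behavior. Ensure a non-negative value with a cast (the outer parentheses are what the if syntax requires, so the code does not rely on isalpha being a macro):

// if  isalpha(k[h])
if (isalpha((unsigned char) k[h]))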

chux - Reinstate Monica

@Broman OP is concerned about a niche issue, the sequentiality of `A-Z`, something that is of nil concern given the character encodings of 2019, yet is missing a more likely failure (UB) with the `is...()` functions. – chux - Reinstate Monica Feb 20 '19 at 01:08
