Converting Letters to Numbers in C

Question

I'm trying to write code that converts letters into numbers, for example A ==> 0, B ==> 1, C ==> 2, and so on. I'm thinking of writing 26 if statements. I'm wondering if there's a better way to do this...

Thank you!

Just a quick note to all those "num = letter - 'A'" crowd. The C99 standard requires that the digit characters ('0'-'9') are consecutive but *not* the letter characters: "In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.". EBCDIC (with its weird disjointed alphabet) is perfectly valid. That means @ChrisLutz has the only correct answer to date, despite his misgivings about it :-) — paxdiablo, Sep 24 '09 at 05:05
ISO should have mandated ASCII (or at least sequential letters) but I suspect IBM had a big part to play in keeping their mainframe C compilers conformant. — paxdiablo, Sep 24 '09 at 05:06
Raise your hand if you have, do, or will ever develop for an EBCDIC machine. — Crashworks, Sep 24 '09 at 05:10
Raising my hand (sheepishly) :-) You'd be surprised how many mainframes there are out there. — paxdiablo, Sep 24 '09 at 05:13
@Pax - I'm more than willing to stand by my misgivings, even in the face of the entire EBCDIC world. And I agree that ASCII should have been mandated. — Chris Lutz, Sep 24 '09 at 05:13
In any case, it doesn't *matter* how many people do it. The standard does not require consecutive letters so implementors are free to do what they wish. People who code to the ASCII standard are seriously limiting their potential market to only about 99.999% of the computers out there :-) — paxdiablo, Sep 24 '09 at 05:15
ASCII certainly should *not* have been mandated, any more than a particular floating point format should have been. If you want to write x86 assembly, write x86 assembly. — caf, Sep 24 '09 at 05:40
If this is really school homework, you should care if your teacher worries or even knows about the C99 standard issues. Otherwise, he could give you a worse grade just because you don't use the "cleaner" approach (i.e., letter - 'A'), and arguing about C99 standards won't be enough to convince him. — djeidot, Sep 24 '09 at 14:19
@Greg - If we get into those letters we're stepping very quickly outside the bounds of standard C, which we've been arguing rather heatedly about in the comments for quite some time. — Chris Lutz, Sep 24 '09 at 19:03
@paxdiablo Hey, I posted what I think is the real answer to this very old, but never properly answered, question. Would appreciate some support to move the answer closer to the top, or a comment explaining that I'm out of my mind and completely wrong :) Edit: it's the answer with all the yellow in it. — user3386109, Feb 26 '16 at 03:22

score 11 · Answer 1 · answered Sep 24 '09 at 05:45

This is a way that I feel is better than the switch method, and yet is standards compliant (does not assume ASCII):

#include <string.h>
#include <ctype.h>

/* returns -1 if c is not an alphabetic character */
int c_to_n(char c)
{
    int n = -1;
    static const char * const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p = strchr(alphabet, toupper((unsigned char)c));

    if (p && *p)  /* strchr() also matches the terminating '\0'; reject it */
    {
        n = p - alphabet;
    }

    return n;
}

caf

For full standards compliance you might want to cast `p - alphabet` before assigning it. You might use a `ptrdiff_t` or some other technically correct type, but given the range limitations I don't think it's really necessary. Any integral type is guaranteed to be able to hold any of the values we're using here. – Chris Lutz Sep 24 '09 at 18:58
Yes, in this case we can guarantee that `p - alphabet` is in the range 0...25, so it will definitely fit into an `int`. I don't believe a cast there is necessary - the semantics of assigning one integral type to another are quite well defined. – caf Sep 24 '09 at 22:59
I'll give you a vote for that one @caf, since it handles all character sets within the standard. It's also one of the rare times I've seen someone use the const const properly for pointer and pointee :-) Of course, being an old-timer, I would've just done: 'return p ? (int)(p - alphabet) : -1;' instead of all that mucking about with n and if statements. – paxdiablo Nov 05 '09 at 02:41
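The one-line return suggested in the comment above could look roughly like this. It is only a sketch: the function name `letter_index` is hypothetical, and the extra `*p` test is an added safeguard because `strchr()` also matches the string's terminating null character.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical name. Returns 0..25 for a letter in either case,
 * or -1 for anything else. Makes no assumption about ASCII. */
int letter_index(char c)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p = strchr(alphabet, toupper((unsigned char)c));

    /* strchr() also finds the terminating '\0', so reject that match. */
    return (p && *p) ? (int)(p - alphabet) : -1;
}
```

Calling `letter_index('c')` should give 2, while a digit or punctuation character gives -1.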

score 10 · Accepted Answer · answered Sep 24 '09 at 04:12

If you need to deal with upper-case and lower-case then you may want to do something like:

if (letter >= 'A' && letter <= 'Z')
  num = letter - 'A';
else if (letter >= 'a' && letter <= 'z')
  num = letter - 'a';

If you want to display these, you can convert the number into an ASCII character by adding '0' to it (this only works for single-digit values, 0 through 9):

  asciinumber = num + '0';

James Black

Alternatively, use `num = toupper(letter) - 'A'` to convert the letter to uppercase, thus avoiding the conditional. The `toupper()` function is found in the `ctype.h` header. – Chris Lutz Sep 24 '09 at 04:14
We can also note that, in ASCII, lowercase letters differ from uppercase by just `0x20`. – Noon Silk Sep 24 '09 at 04:14
True, but by having a conditional, if you need to differentiate somehow you can, but there are various options, I just wanted to point out that upper and lower-case may be an issue and should be handled. – James Black Sep 24 '09 at 04:16
Indeed. I would put this code in a function, and add an `else num = -1` at the end just for safety, but it doesn't matter. You can check that the return value of `toupper() - 'A'` is within the desired range (0 - 25) just as easily. – Chris Lutz Sep 24 '09 at 04:20
A better way to display the number is just to use `printf()` (or `sprintf()` if you need to work with it as a string). – Chris Lutz Sep 24 '09 at 04:34
Note that the "asciinumber = num + '0';" bit only works for single digits. – Tal Pressman Sep 24 '09 at 04:45
You are correct, I didn't take into account that asciinumber is flawed. – James Black Sep 24 '09 at 04:49
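The suggestions from this comment thread (use `toupper()`, return -1 for non-letters, and check the 0 to 25 range) can be combined into one small function. This is a minimal sketch, not the answerer's code: the name `letter_to_num` is hypothetical, and it assumes a character set such as ASCII in which 'A' through 'Z' are contiguous.

```c
#include <ctype.h>

/* Hypothetical helper. Assumes 'A'..'Z' are contiguous, which holds
 * for ASCII but is not guaranteed by the C standard. */
int letter_to_num(char letter)
{
    int num = toupper((unsigned char)letter) - 'A';

    /* Anything outside 0..25 was not a letter. */
    return (num >= 0 && num <= 25) ? num : -1;
}
```

To display the result, `printf("%d\n", letter_to_num('q'));` handles two-digit values as well, which adding '0' does not.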

user3386109 · Answer 3 · 2016-09-27T17:26:46.353

The C standard does not guarantee that the characters of the alphabet will be numbered sequentially. Hence, portable code cannot assume, for example, that 'B'-'A' is equal to 1.

The relevant section of the C specification is section 5.2.1 which describes the character sets:

3 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet
    ABCDEFGHIJKLM   
    NOPQRSTUVWXYZ
the 26 lowercase letters of the Latin alphabet
    abcdefghijklm
    nopqrstuvwxyz
the 10 decimal digits
    0123456789
the following 29 graphic characters
    !"#%&'()*+,-./: 
    ;<=>?[\]^_{|}~ 
the space character, and control characters representing horizontal tab, vertical tab, and form feed. The representation of each member of the source and execution basic character sets shall fit in a byte. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.

So the specification only guarantees that the digits will have sequential encodings. There is absolutely no restriction on how the alphabetic characters are encoded.
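By contrast, the digit guarantee quoted above means that the subtraction idiom is fully portable for digits. A small sketch, with the hypothetical name `digit_value`:

```c
/* Hypothetical helper. Portable on every conforming implementation,
 * because the standard requires '0'..'9' to be contiguous. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

The same pattern applied to letters (`c - 'A'`) relies on a property of the character set rather than on the language.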

Fortunately, there is an easy and efficient way to convert A to 0, B to 1, etc. Here's the code:

#include <stdlib.h>                 // for strtol

char letter = 'E';                  // could be any upper or lower case letter
char str[2] = { letter };           // make a string out of the letter; str[1] is zero-initialized
int num = strtol( str, NULL, 36 ) - 10;  // convert the letter to a number

The reason this works can be found in the man page for strtol which states:

(In bases above 10, the letter 'A' in either upper or lower case represents 10, 'B' represents 11, and so forth, with 'Z' representing 35.)

So passing 36 to strtol as the base tells strtol to convert 'A' or 'a' to 10, 'B' or 'b' to 11, and so on. All you need to do is subtract 10 to get the final answer.
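One caveat: for input that is not a letter, the bare expression yields a negative number (a digit such as '7' gives -3, and unparseable input gives -10). The sketch below wraps the same `strtol` technique in a function that returns -1 for non-letters; the name `letter_to_index` and the -1 convention are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical wrapper. Returns 0..25 for a letter in either case,
 * or -1 for anything else. */
int letter_to_index(char letter)
{
    char str[2] = { letter, '\0' };   /* one-character string */
    long value;

    /* Filter out digits, whitespace and punctuation up front. */
    if (!isalpha((unsigned char)letter))
        return -1;

    value = strtol(str, NULL, 36);    /* 'A'/'a' -> 10 ... 'Z'/'z' -> 35 */

    /* Guards against locale-specific letters that strtol does not accept. */
    return (value >= 10 && value <= 35) ? (int)(value - 10) : -1;
}
```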

score 5 · Answer 4 · edited Sep 27 '16 at 17:12

Another, far worse (but still better than 26 if statements) alternative is to use switch/case:

switch(letter)
{
case 'A':
case 'a': // don't use this line if you want only capital letters
    num = 0;
    break;
case 'B':
case 'b': // same as above about 'a'
    num = 1;
    break;
/* and so on and so on */
default:
    fprintf(stderr, "WTF?\n");
}

Consider this only if there is absolutely no relationship between the letter and its code. Since there is a clear sequential relationship between the letter and the code in your case, using this is rather silly and going to be awful to maintain, but if you had to encode random characters to random values, this would be the way to avoid writing a zillion if()/else if()/else if()/else statements.

edited Sep 27 '16 at 17:12

Toby Speight

answered Sep 24 '09 at 04:26

Chris Lutz

This is *not* so silly. Despite your comment elsewhere, @Chris, C99 only mandates that the numeric characters are in order. Alphas can be all over the place (such as EBCDIC with its two different areas). This is, in fact, the only correct answer to date. + 1. – paxdiablo Sep 24 '09 at 05:04
Ah. I'm all over the road today. I did know that the digits were in order, I just made a leap about the characters. I really need to read the C standard. I have to say, though, if this is the price of correctness, I'm willing to say "To hell!" with EBCDIC. – Chris Lutz Sep 24 '09 at 05:11
It's good for everyone to know that the order is not guaranteed, but seriously, you have to take your audience into consideration. If this program is going to be used by people running on any 'standard' computer it is safe to use "letter - 'A'" – Ed S. Sep 24 '09 at 06:25
@Ed: *My* audience (visuance?) consists of people who know and follow the standard (and that's standard *without* quotes). Your program wouldn't conform with the standard. That's fine - I understand that the vast majority of C environments use ASCII or ISO646 but I consider it slightly arrogant to state that that's all that matters. ISO left open the possibility for non-contiguous letters for a good reason - do you really think you know better than them? I don't want to get into a p*ssing match, just putting my viewpoint forward - we may just have to agree to disagree. – paxdiablo Sep 24 '09 at 07:43

score 4 · Answer 5 · answered Sep 24 '09 at 04:07

There is a much better way.

In ASCII (www.asciitable.com) you can look up the numerical values of these characters.

'A' is 0x41.

So you can simply subtract 0x41 from them to get the numbers. I don't know C very well, but something like:

int num = 'A' - 0x41;

should work.

Noon Silk

Not sure if you posted that before you saw my edit; I've corrected myself. I always remember it as `41`, I guess I think in hex :P – Noon Silk Sep 24 '09 at 04:11
also: `int num = letter - 'A';` – Nick Dandoulakis Sep 24 '09 at 04:11
I was about to answer similarly but I think there is no need to. Just want to add that you can do this: `int num = aChar - 'A';` – NawaMan Sep 24 '09 at 04:11
It's more common to use `int num = 'A' - 'A'` (replacing the first one with the character or variable in question) _just in case_ we're not using ASCII, though I think that might be guaranteed by the standard. I know that the standard guarantees that 'A' .. 'Z' are consecutive in the character set, though. – Chris Lutz Sep 24 '09 at 04:13
I prefer to use 'A' as it improves readability, otherwise someone has to look up 0x41 and see what it is. :) – James Black Sep 24 '09 at 04:16
@Chris: it doesn't matter if you know you are using ASCII or not. Putting magic numbers (i.e. literal constants) in code is a bad practice and should almost always be avoided. – Jeanne Pindar Sep 24 '09 at 05:08
Jeanne: This is homework, not a real application. Context matters. – Noon Silk Sep 24 '09 at 05:25
Chris: 'A' to 'Z' being consecutive *isn't* guaranteed by the standard (only '0' to '9'). – caf Sep 24 '09 at 05:47
@caf: It *is* guaranteed by the **ASCII** standard. It's just that the **C** standard doesn't guarantee that C strings will be ASCII! – Daniel Pryden Sep 24 '09 at 05:51
@silky - No, it doesn't. Using `0x41` instead of `'A'` is silly. Why don't we all write our numbers and strings directly in binary? Why not calculate our own jumps and pointer arithmetic? @caf - This has been pointed out in various places, but I'm having a rather rough day mentally so it can't hurt to remind me. ;) But yes, I was extrapolating from '0' - '9' being consecutive to 'A' - 'Z', and while not true, it's a fairly safe assumption unless you plan on writing code for mainframes. Doesn't change the fact that `- 'A'` is better than `- 0x41` in almost all situations. – Chris Lutz Sep 24 '09 at 05:54
Chris: I disagree with you and will, but will not continue a useless discussion. – Noon Silk Sep 24 '09 at 05:59
Chris: Yep I saw that, just thought it should be recorded in this comment thread for posterity ;). Daniel: It's pretty clear that "the standard" in the context I was replying to meant the C standard. – caf Sep 24 '09 at 06:14

score 0 · Answer 6 · answered Sep 24 '09 at 04:36

In most programming and scripting languages there is a means to get the "ordinal" value of any character. (Think of it as an offset from the beginning of the character set).

Thus you can usually do something like:

for ch in somestring:
    if lowercase(ch):
        n = ord(ch) - ord ('a')
    elif uppercase(ch):
        n = ord(ch) - ord('A')
    else:
        n = -1  # Sentinel error value
        # (or raise an exception as appropriate to your programming
        #  environment and to the assignment specification)

Of course this wouldn't work on an EBCDIC-based system (and might not work for some other exotic character sets). I suppose a reasonable sanity check would be to test whether this function returns monotonically increasing values in the range 0..25 for the strings "abc...xyz" and "ABC...XYZ".
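In C, the sanity check described above might be sketched as follows. Both function names are hypothetical: `offset_convert` is a C rendering of the ordinal-offset pseudocode, and `mapping_is_sane` verifies that a conversion function maps both alphabets to 0, 1, ..., 25 in order.

```c
/* C rendering of the pseudocode: assumes contiguous letters. */
int offset_convert(char ch)
{
    if (ch >= 'a' && ch <= 'z')
        return ch - 'a';
    if (ch >= 'A' && ch <= 'Z')
        return ch - 'A';
    return -1;                        /* sentinel error value */
}

/* Returns 1 if convert() maps each alphabet to 0..25 in order, else 0. */
int mapping_is_sane(int (*convert)(char))
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int i;

    for (i = 0; i < 26; i++) {
        if (convert(lower[i]) != i || convert(upper[i]) != i)
            return 0;
    }
    return 1;
}
```

On an ASCII system `mapping_is_sane(offset_convert)` should return 1; on a character set with gaps between letters it would return 0, signalling that a lookup-based approach is needed instead.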

A whole different approach would be to create an associative array (dictionary, table, hash) of your letters and their values (one or two simple loops). Then use that. (Most modern programming languages include support for associative arrays.)
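C has no built-in associative array, but a plain array indexed by character code can play the same role. The following is a sketch under that assumption; `letter_table`, `build_letter_table` and `table_lookup` are hypothetical names. It does not depend on the letters being contiguous.

```c
#include <limits.h>

static int letter_table[UCHAR_MAX + 1];  /* one slot per character code */
static int table_ready = 0;

/* Fills the table with two simple loops, as the text suggests. */
static void build_letter_table(void)
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        letter_table[i] = -1;             /* default: not a letter */
    for (i = 0; i < 26; i++) {
        letter_table[(unsigned char)lower[i]] = i;
        letter_table[(unsigned char)upper[i]] = i;
    }
    table_ready = 1;
}

/* Returns 0..25 for a letter in either case, or -1 otherwise. */
int table_lookup(char c)
{
    if (!table_ready)
        build_letter_table();             /* build lazily on first use */
    return letter_table[(unsigned char)c];
}
```

After the first call, each lookup is a single array access.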

Naturally I'm not "doing your homework." You'll have to do that for yourself. I'm simply explaining that those are the obvious approaches that would be used by any professional programmer. (Okay, an assembly language hack might also just mask out one bit for each byte, too).

Most of this information doesn't apply to C. Questions have language tags for a reason. — Chris Lutz, Sep 24 '09 at 04:51
@Chris: I guess we just have to agree to disagree. I find the pseudo code approach very good. If nothing else, it will force students to look up the syntax they have already been told about - and having to look up something you should already know is a damn good way to learn it. Also, for many students not familiar with C-style syntax, C's `for` loops are rather confusing: Only one keyword, but lots of operators and separators, and which of the expressions in between those does which you just have to know. — sbi, Sep 24 '09 at 18:12

score -1 · Answer 7 · edited Sep 24 '09 at 04:48

Since the char data type is treated similarly to an int data type in C and C++, you could go with something like:

char c = 'A';   // just some character

int urValue = c - 65;

If you are worried about case sensitivity:

#include <ctype.h> // if using C++ #include <cctype>
int urValue = toupper((unsigned char)c) - 65;  // cast avoids undefined behavior for negative char values

edited Sep 24 '09 at 04:48

sbi

answered Sep 24 '09 at 04:40

147

score -1 · Answer 8 · answered Sep 24 '09 at 06:19

Aww if you had C++

For a Unicode-aware definition of how to map characters to values:

#include <map>

typedef std::map<wchar_t, int> WCharValueMap;

// Defined before use so that the initializer below can call it.
WCharValueMap fillMap() {
  WCharValueMap result;
  result[L'A']=0;
  result[L'Â']=0;
  result[L'B']=1;
  result[L'C']=2;
  return result;
}

WCharValueMap myConversion = fillMap();

Usage:

int value = myConversion[L'Â'];

score -1 · Answer 9 · answered Dec 01 '15 at 21:04

I wrote this bit of code for a project, and I was wondering how naive this approach was.

The benefit here is that it seems to adhere to the standard, and my guess is that the runtime is approximately O(k), where k is the size of the alphabet.

int ctoi(char c)
{
    int index;
    const char *alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    c = toupper((unsigned char)c);  // requires <ctype.h>

    // avoid doing strlen here to juice some efficiency.
    for(index = 0; index != 26; index++)
    {
        if(c == alphabet[index])
        {
            return index;
        }
    }

    return -1;
}

Or you could reduce that code to a couple of lines, using `strchr()` :-) — paxdiablo, Feb 26 '16 at 03:45

score -1 · Answer 10 · answered Sep 27 '16 at 15:49

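#include <stdio.h>
#include <ctype.h>

int val(char a);

int main(void)
{
    char r;

    if (scanf("%c", &r) != 1)   /* stop if no character could be read */
        return 1;
    printf("\n%d\n", val(r));
    return 0;
}

/* Counts the characters from 'A' up to (but not including) the upper-case
   form of a, so 'A' gives 0, 'B' gives 1, and so on.
   Assumes the letters are contiguous (true for ASCII, not for EBCDIC). */
int val(char a)
{
    int i = 0;
    int target = toupper((unsigned char)a);
    int k;

    for (k = 'A'; k < target; k++)
        i++;
    return i;
}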

Srinithya Vutukuru

Welcome to Stack Overflow! Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Sep 27 '16 at 17:10
