Lower case to upper case without toupper

Question

Can someone tell me why the line

  s[i]=s[i]-'a'+'A';

does the job of converting lower case to upper case? More specifically, I do not understand why 'a' and 'A' get substituted by the corresponding characters from string s.

string s="Print My Name";

for (int i=0; i<s.length(); i++)
{
    if (s[i]>='a' && s[i]<='z')
    {
        s[i]=s[i]-'a'+'A';
    }
}

There are no letters, only numbers. You think that's air you're breathing? — scohe001, Jul 31 '14 at 16:15
Read as `s[i] = s[i] - 97 + 65;` and see the structure of the [ascii table](http://www.asciitable.com/) — DSquare, Jul 31 '14 at 16:16
In the ASCII character set, the alphabetic characters are in sequence, in both upper case and lower case (upper case coming first), so the difference between the upper-case and lower-case codes of any letter is the same constant, which can be calculated by subtracting the first uppercase letter from the first lowercase letter, namely `'A'` and `'a'`. — didierc, Jul 31 '14 at 16:17
Google "ASCII". Look at the chart. How would you generically convert between lower-case and upper-case, without a lookup table? — Hot Licks, Jul 31 '14 at 16:22
Your solution will not work for characters that are not alphabetic, such as space and tabs. — Thomas Matthews, Jul 31 '14 at 16:25
@ThomasMatthews ...hence the if statement before attempting to convert? — scohe001, Jul 31 '14 at 16:26
Thank you for your comments. I know ASCII is a sequence. What confuses me is the following: Let's take i=1, then s[1] is just "r". Why in the line s[1] = s[1] -'a'+'A'; 'a' gets substituted by 'r' and 'A' by 'R'? — user3896430, Jul 31 '14 at 16:34
This does not "do the job". It happens to almost work in a very very very specific case which is almost never what you want to do. Don't use this. — Cat Plus Plus, Jul 31 '14 at 17:31
@user3896430: It doesn't. There is no "substitution". See those `-` and `+`? You're doing _basic arithmetic_. Maths. Addition and subtraction. — Lightness Races in Orbit, Jul 31 '14 at 17:31

score 7 · Answer 1 · answered Jul 31 '14 at 17:36

does the job of converting lower case to upper case?

It doesn't. Try passing something like "naïve" in. The C and C++ Standards do not specify any genuinely useful string manipulation functions, although some implementations extend them to be more useful.

If you want string handling functions that actually work, albeit with an interface less friendly than a primed nuclear warhead, you can look at ICU.
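To make the "naïve" point concrete, here is a minimal sketch. It assumes the string is stored as UTF-8 and that the execution character set is ASCII-compatible; `ascii_upper` and `naive_utf8` are hypothetical names introduced only for this illustration.

```cpp
#include <string>

// Hypothetical helper: the loop from the question, wrapped in a function.
// Assumes an ASCII-compatible execution character set.
std::string ascii_upper(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] >= 'a' && s[i] <= 'z') {
            s[i] = static_cast<char>(s[i] - 'a' + 'A');
        }
    }
    return s;
}

// "naïve" spelled out as UTF-8 bytes: the letter with the diaeresis is the
// two bytes 0xC3 0xAF. Neither byte lies in the range 'a'..'z', so the loop
// skips them and the letter keeps its lower-case form.
const std::string naive_utf8 = "na" "\xC3\xAF" "ve";
```

Under those assumptions, `ascii_upper(naive_utf8)` upper-cases the four plain letters but returns the two bytes of the accented letter unchanged, so the result is not the fully upper-cased word.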

Puppy


OP has the `s[i]>='a' && s[i]<='z'` condition in his code, so the statement "it does not work" clearly does not apply to the code from the question. – Sergey Kalinichenko Jul 31 '14 at 17:50
To expand: toupper/tolower are still broken, because they assume these conversions are 1-to-1 which is not true. – Cat Plus Plus Jul 31 '14 at 17:50
@dasblinkenlight Yes, it does, because the question is about converting case not whether the code compiles/runs/whatever. – Cat Plus Plus Jul 31 '14 at 17:52
@dasblinkenlight: Is that how you test your functions? Pass it one or two inputs, slap yourself on the back and say "yeah bro, it works!"? – Lightness Races in Orbit Jul 31 '14 at 17:56
@LightnessRacesinOrbit There is a condition in OP's code that sets the limits to where the solution applies. The code says "for character codes between lowercase `'a'` to lowercase `'z'`, do this". It does not say "for character codes that correspond to lowercase letters do this". – Sergey Kalinichenko Jul 31 '14 at 18:01
@CatPlusPlus The title of the question is more general than the code in the question. OP's code works for the 26 characters to which it is meant to apply. – Sergey Kalinichenko Jul 31 '14 at 18:02
What ludicrous logic. The function is _obviously_ flawed, the OP _obviously_ has a false restriction on inputs without realising the consequences, and you lot are encouraging this. – Lightness Races in Orbit Jul 31 '14 at 18:20
He does not state that the condition is part of his requirements or limiting the scope, his current code simply happens to have it. Working for the 26 characters to which it is meant to apply is pretty meaningless. It's like solving 3SAT in P-time, for the case where N is less than 2. – Puppy Jul 31 '14 at 18:37
@LightnessRacesinOrbit Don't you find calling restrictions in OP's code "obviously false" a little presumptuous? What does explaining the behavior of OP's code within the limitations of OP's code have to do with "encouraging" anything? – Sergey Kalinichenko Jul 31 '14 at 18:48
@dasblinkenlight: [Happy reading](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). – Lightness Races in Orbit Jul 31 '14 at 18:55

Wojtek Surowka · Answer 2 · 2014-07-31T19:08:07.630

2

The expression

s[i]=s[i]-'a'+'A'

in C++ (and C as well) means

s[i]=s[i]-<code of character a>+<code of character A>

This, together with the assumption that all lowercase letters are consecutive and all uppercase letters are consecutive, makes it work.

Of course, these assumptions normally hold for English letters only.
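The arithmetic can be checked at compile time. The following is only a sketch and assumes an ASCII execution character set, where `'a'` is 97 and `'A'` is 65; `shift_to_upper` is a hypothetical name, not something from the question.

```cpp
// Hypothetical helper: the right-hand side of the expression from the question.
// Only meaningful for c in 'a'..'z' on an ASCII execution character set.
constexpr char shift_to_upper(char c) {
    return static_cast<char>(c - 'a' + 'A');
}

// The character literals are just small integers.
static_assert('a' == 97 && 'A' == 65, "this sketch assumes ASCII");

// For 'r' (code 114): 114 - 97 + 65 == 82, and 82 is the code of 'R'.
static_assert('r' - 'a' + 'A' == 82, "114 - 97 + 65 is 82");
static_assert(shift_to_upper('r') == 'R', "82 is the code of 'R'");
```

Nothing is substituted: `'a'` and `'A'` stay 97 and 65 for every character in the string, and only `s[i]` changes from one iteration to the next.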

edited Jul 31 '14 at 19:08

answered Jul 31 '14 at 16:16


Does that include lowercase Turkish "i"? – Puppy Jul 31 '14 at 17:47

score -1 · Answer 3 · answered Jul 31 '14 at 16:16

The expression:

s[i]-'a'

returns the zero-based position of the character within the alphabet. Adding 'A' then adds that position to upper case 'A', giving the upper-case equivalent of s[i].
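Splitting the expression into those two steps might look like the sketch below. It assumes an ASCII execution character set and an input already known to be in `'a'..'z'`; `letter_index` and `nth_upper` are hypothetical names used only here.

```cpp
// Step 1: zero-based position within the lowercase alphabet ('a' -> 0, 'z' -> 25).
// Precondition (not checked here): c is in 'a'..'z'.
constexpr int letter_index(char c) {
    return c - 'a';
}

// Step 2: the uppercase letter at that position ('A' + 0 -> 'A', 'A' + 25 -> 'Z').
constexpr char nth_upper(int index) {
    return static_cast<char>('A' + index);
}

// 'r' is the 18th letter, so its zero-based position is 17.
static_assert(letter_index('r') == 17, "zero-based position of 'r'");
static_assert(nth_upper(letter_index('r')) == 'R', "17 places after 'A' is 'R'");
```

Outside the range `'a'..'z'` the index is meaningless, which is why the question's code guards the conversion with an `if`.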

Sean


*Returns the zero based position of the character within the alphabet* What is the "zero based position" of `ñ`? – scohe001 Jul 31 '14 at 18:55
Thanks Sean. Basically, 'A'-'a' is just a shift :-) – user3896430 Jul 31 '14 at 19:51
@Josh - try reading the question. The range of characters is restricted to between `a` and `z` before the subtraction takes place, so `ñ` would never be considered. – Sean Aug 01 '14 at 07:47
@Sean easy there, I'm not the one who downvoted, I'm just pointing out that you might want to specify that this'll only hold for about 26 characters – scohe001 Aug 01 '14 at 15:50
