1

Below is my code, which converts uppercase letters to lowercase and vice versa.

#if SOL_2
        char ch;
        char diff = 'A' - 'a';
        //int diff = 'A' - 'a';
        fputs("input your string : ", stdout);

        while ((ch = getchar()) != '\n') {
            if (ch >= 'a' && ch <= 'z') {
                ch += diff;
            }
            else if (ch >= 'A' && ch <= 'Z') {
                ch -= diff;
            }
            else {}

            printf("%c", ch);
        }
#endif

In the code above, I tried int diff = 'A' - 'a' instead of char diff = 'A' - 'a', and the result was the same. So I thought that using char can save memory, since a char is one byte but an int is four bytes. I can't think of any other advantages. I would appreciate it if you could point out any others.

And what is the main reason for using char to store character values? Is it just a matter of memory size?

mnille
sclee1
  • 4
    '*Result was same*' does not necessarily mean that you are correct. – abhishek_naik Jul 28 '16 at 12:43
  • The result was the same because when you store a small value in a larger space (character in an int), the bits remain the same. You just have a bunch of leading 0s if it's in an int. In cases where you perform calculations on variables and you store them as smaller types (like storing an int in a char), you may run the risk of exceeding 8 bits. Even if this is very brief, you run a great risk of compromising data. – MacedonZero Jul 28 '16 at 12:53
  • 2
    Using `char` for arithmetics is **always** a bad idea. – too honest for this site Jul 28 '16 at 13:10
  • "Don't tell me there's not one bit of difference between uppercase and lowercase, because that's exactly what there is." `if (ch >= 'A' && ch <= 'Z' || ch >= 'a' && ch <= 'z') ch ^= 0x60;` – ikegami Jul 28 '16 at 13:12
  • @ikegami: magic numbers and fixed to ASCII - very bad! Does not even work with ISO8859-1 – too honest for this site Jul 28 '16 at 13:14
  • 1
    Also literals such as `'A'` are of type `int` themselves. – Jens Gustedt Jul 28 '16 at 13:16
  • 1
    @ikegami, no the OP implicitly assumed that lower case and upper case letters each are contiguous. Otherwise there is nothing that assumes ASCII. – Jens Gustedt Jul 28 '16 at 13:19
  • @Jens Gustedt, And that's only the case for ASCII. But I'll rephrase... – ikegami Jul 28 '16 at 13:44
  • @Olaf, Neither does the OP's. What I posted works for all characters sets for which the OP's version works. I didn't add any assumptions. True, what I posted shouldn't be used in the modern world, but neither should the OP's. – ikegami Jul 28 '16 at 13:48
  • @ikegami: No. Your code relies on bit 6 being the "uppercase bit". OP uses the difference, which does not presume a specific code. – too honest for this site Jul 28 '16 at 13:48
  • @Olaf, No, you are mistaken. The only character set for which the OP's code works is for ASCII. Both of our programs make the same assumption. – ikegami Jul 28 '16 at 13:49
  • @ikegami: It works for any character set with letters in adjacent, ascending order and with codes less than 128, even if the lower case letters precede the uppercase or are not just differentiated by a single bit. Your code relies on bit 6 ... (I wrote that already) – too honest for this site Jul 28 '16 at 13:53
  • @Olaf, I hear what you're saying, but it's wrong. My solution works everywhere the OP's solution works. If you think I'm wrong, prove it: Name one character set where his works and mine doesn't. – ikegami Jul 28 '16 at 13:54
  • @ikegami: I clearly stated a scenario. Consider one where the uppercase letters immediately follow the lowercase letters. Note there are quite a few non-standardised encodings, e.g. in embedded devices which are designed to conserve memory footprint. – too honest for this site Jul 28 '16 at 13:55
  • @Olaf, Oh yeah? Like what? you just gotta name one to prove me wrong. – ikegami Jul 28 '16 at 13:56
  • @ikegami: Ok: Olaf1 (on my old HC11 board). I used a manual translation preprocessor. Prove me wrong! – too honest for this site Jul 28 '16 at 13:57
  • @Olaf, And the OP's doesn't work on Éric1. Funny that. (or EBCDIC, or UTF-16be, UTF-16le, UTF-32le, UTF-32be, UCS-2le, UCS-2be, UCS-4le, UCS-4be, etc) You're spending so much time creating imaginary problems when you could be pointing out that the OP's code doesn't work with Unicode! It can't even lowercase my name (Éric) – ikegami Jul 28 '16 at 14:02
  • @ikegami Certainly you meant `ch ^= 0x20;` and not `ch ^= 0x60;` in your [comment](http://stackoverflow.com/questions/38636709/what-is-most-advantage-of-using-char-instead-of-int/38639130#comment64657819_38636709). – chux - Reinstate Monica Jul 28 '16 at 14:27
  • @toohonestforthissite Why is using `char` for arithmetic always a bad idea? – HelloGoodbye May 27 '23 at 23:19

4 Answers

8

You should be using int ch and int diff.

  • getchar() returns int, not char. Therefore ch needs to be int. This is so you can tell the difference between end-of-file and character 0xff, both of which would be -1 in a signed byte. (reference)
  • char might be signed or unsigned (see this answer). Therefore, you should use int for comparisons so that you know you have room for negative values (int is signed by default).

To answer your specific question, use char when you know you have byte data and, yes, you'll most likely save some memory. Another reason to use char (or wchar_t or other character types) is to make it clear to the reader of your code that you intend this data to be text and not numeric, if indeed that is the case. Another use case for char is to access individual bytes of a file or other data stream.

cxw
  • @Olaf, Re "*An the signed-ness of char is implementation defined, not always signed*". That's what cxw said. – ikegami Jul 28 '16 at 13:14
  • @Olaf the second bullet says "char _might be signed or unsigned_ (see this answer). Therefore, you should use int for comparisons so that you know you have room for negative values (int is signed by default).". – iRove Jul 28 '16 at 13:21
  • 1
    `EOF` is never `0xff`, because `EOF` must be negative and `0xff` is always the value `255`. When converted to a `char` that by coincidence is an unsigned integer type, it might end up being `255` after that conversion. – Jens Gustedt Jul 28 '16 at 13:24
  • @iRove: My bad, sorry. Thanks for the correction. The false presumption about encoding still remains. There is no specific encoding of signed integers enforced by the standard. However, the first bullet is wrong in that it assumes there actually can be a `char` with value `0xff`. That is only possible if a byte has more than 8 bits or `char` is unsigned. – too honest for this site Jul 28 '16 at 13:35
1

What is the main reason for using char to store character values? Is it just a matter of memory size?

The primary reason to use char rather than int for arrays and sequences of characters is space (and processing speed on machines with wide architectures). If code uses characters limited to an 8-bit range, excessively large data types slow things down.

With single instances of a type, int is often better as that is typically the "native" type that the processor is optimized for.

Yet optimizing for a single char vs int (assuming both work in the application) is usually not a fruitful use of your time. Worry about larger issues and let the compiler optimize the small stuff.


Note that int getchar() returns values in the range of unsigned char, plus EOF. These (typically 257) different values cannot be stored distinctly in a char. Use an int.


C provides isupper(), islower(), toupper() and tolower(); these are the robust way to handle simple character case conversion.

if (isupper(ch)) ch = tolower(ch);

Example usage:

int ch;   
while ((ch = getchar()) != '\n' && ch != EOF) {
  if (isupper(ch)) {
    ch = tolower(ch);
  }
  else if (islower(ch)) {
    ch = toupper(ch);
  }
  printf("%c", ch);
}
fflush(stdout);

With ASCII, EBCDIC and every small character encoding I've encountered, A-Z case conversion can be done by simply toggling a bit. Notice no magic numbers.

ch ^= 'A' ^ 'a';

Example usage:

int ch;   
while ((ch = getchar()) != '\n' && ch != EOF) {
  if (isalpha(ch)) {
    ch ^= 'A' ^ 'a';
  }
  printf("%c", ch);
}
fflush(stdout);
chux - Reinstate Monica
1

Yes, as you pointed out, a value stored in a char is nothing but a 1-byte binary code, i.e. one of 256 possible numbers, and each number maps to a character. (Not every number necessarily represents a distinct character; that depends on which encoding you use.) Have a look at Unicode: don't consider only English, but also the characters of other languages such as Chinese, Hindi and so on. Each character in these languages needs to be represented by a number, and those numbers are standardised by Unicode.

So the point is that a single char covers only a small subset of characters, essentially the English alphabet. When you develop international software that lets the user choose among different display languages, you should use int instead. However, if your scope is only English, char is the best choice: an int consumes more bits, and the unused bits are simply padded with zeros to fill out the width of an int, with no significance of their own.

Suppose you have a text in Chinese opened in an editor like Notepad. If the character encoding is set to ASCII, which has a small character set (English A-Z, a-z, 0-9, space, newline and so on, 128 characters in all), you will see weird characters in the file, just as if it were a binary file. To see the actual content of the file you need to change the encoding to UTF-8, which uses the Unicode character set, and then you can read the text.

K patel
0

Please read sections 6.3.1.8 (Usual arithmetic conversions) and 6.3.1.1 (Boolean, characters, and integers) of the C Standard.

If an int can represent all values of the original type [...] the value is converted to an int;

In

char c1 = 'A', c2 = 'Z';
c2 - c1;                   // expression without side effects

the expression above, both c1 and c2 are converted to int before the subtraction is performed.

pmg