1

I'm trying to code a simple Caesar Cipher in C. I'm creating an encrypt function that receives a string (char *, the text to be encrypted) and an integer (the key).

In the function, I allocate memory for an empty string that will receive the shifted chars. Then, I iterate through every char in the initial string and ask if it is a letter char (A-z) or not. If it is, it shiftes according to key. If it is not, it simply repeats the current char. The problem is: when chars such as !, ?, . or even space appear at the end, it adds some '?' to it. I have put printf statements and my guess is that undefined behavior is occuring, but I can't figure it out myself. I hope someone can help me. Below, it is the code I wrote and the strange results.

char* encrypt(char* entry, int key) {
    int i = 0;
    key = key % 26;
    char * tmp = (char *)malloc(strlen(entry));
    if (!tmp) {
        printf("Error during allocation.\n");
        return entry;
    }
    //memset(tmp, 0, 1); // Tried with and without it.
    char t;
    while ((t = *(entry + i))) {
        printf("Current letter: %c\n",*(entry+i));
        if ((t >= 65 && t <= 90) || (t >= 97 && t <= 122)) { //is letter 
            *(tmp + i) = t + key > 90 ? t + key - 26 : t + key;
        }
        else { //isnt letter
            printf("No letter char appeared. Code = %d\n",t); 
            *(tmp+i) = t;
        }
        printf("tmp letter: %c\n",*(tmp+i));
        printf("current tmp: %s\n----------------\n",tmp);
        i++;
    }
    printf("final tmp = %s\n",tmp);
    entry = tmp;
    free(tmp);
    return entry;
}

Calling the function: encrypt("HELLO! HOW ARE YOU?!", 13);

Expected (final) result: URYYB! UBJ NER LBH?! Actually (final) result: URYYB! UBJ NER LBH?!? (sometimes it adds more '?')

Debugging printf statements:

Current letter: H
tmp letter: U
current tmp: U
----------------
Current letter: E
tmp letter: R
current tmp: UR
----------------
...
----------------
Current letter: !
No letter char appeared. Code = 33
tmp letter: !
current tmp: URYYB!
----------------
Current letter:  
No letter char appeared. Code = 32
tmp letter:  
current tmp: URYYB! ? // <<< It added a strange character to the string
----------------
Current letter: H
tmp letter: U
current tmp: URYYB! U // <<< '?' strange character gone
----------------
Current letter: O
tmp letter: B
current tmp: URYYB! UB
----------------
Current letter: W
tmp letter: J
current tmp: URYYB! UBJ // (I)
----------------
Current letter:  
No letter char appeared. Code = 32
tmp letter:  
current tmp: URYYB! UBJ // This time, space didn't raise a strange char after (I)
----------------
...
----------------
Current letter:  
No letter char appeared. Code = 32
tmp letter:  
current tmp: URYYB! UBJ NER ? // Missed me? I'm back
----------------
Current letter: Y
tmp letter: L
current tmp: URYYB! UBJ NER L // ...And gone again
----------------
Current letter: O
tmp letter: B
current tmp: URYYB! UBJ NER LB
----------------
Current letter: U
tmp letter: H
current tmp: URYYB! UBJ NER LBH???
----------------
Current letter: ?
No letter char appeared. Code = 63
tmp letter: ?
current tmp: URYYB! UBJ NER LBH???
----------------
Current letter: !
No letter char appeared. Code = 33
tmp letter: !
current tmp: URYYB! UBJ NER LBH?!?
----------------
final tmp = URYYB! UBJ NER LBH?!?

Does anyone have an explanation to this happening? GCC info on my sys:

gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Victor Oliveira
  • 352
  • 2
  • 10
  • 1
    Unrelated to this problem, but you’re returning a pointer after you `free` it. You’re not allowed to use the memory it points to after freeing it. – Sami Kuhmonen Apr 12 '19 at 20:23
  • @xing This indeed fixed my issue! Could you give a deeper explanation? Thank you :) – Victor Oliveira Apr 12 '19 at 20:28
  • @SamiKuhmonen I firstly passed ```tmp``` content to the existing ```entry``` string. Just after that I ```free``` it. I believe entry points to wherever tmp is pointing, not to tmp itself. If I didn't ```free``` it, wouldn't it cause memory leak? Thanks! – Victor Oliveira Apr 12 '19 at 20:30
  • 2
    You set entry to point to the same place as tmp and then free it. Both `tmp` and `entry` point to freed memory and you’re not allowed to use them. Free the memory *after* you don’t use it – Sami Kuhmonen Apr 12 '19 at 20:32
  • @SamiKuhmonen True. Checked ```&(*entry)``` and ```&(*tmp)``` and they both have the same address. However, even using ```free``` there, it returned the result and I was able to access it inside my main function. Can you elaborate on the reason? – Victor Oliveira Apr 12 '19 at 20:40
  • 1
    If nothing else uses the memory the contents *may* be accessible and the same. But it is not guaranteed and is undefined behavior. Anything may happen, including the data being there. Or a crash. Or random data. Anything. – Sami Kuhmonen Apr 12 '19 at 20:44
  • @SamiKuhmonen Oh, yeah. Sorry for bothering here in the comments, but I totally forgot about it. The data is there because the address was not filled with something else yet. I forgot this concept. By the way, thank you a lot for the patience and detailed info. Could you explain me why xing's code solved my problem? I thought once I called ```strlen(entry)``` it would already take the space for ```\0``` into account. I understand why I have to manually put the terminator char after the while, but not for the extra byte. – Victor Oliveira Apr 12 '19 at 20:52
  • 1
    It doesn’t account for the null, it only gives the amount of characters in the string. Terminator has to be added always manually since it’s not always needed. For example if you write the string to file you want to know how many characters to write, not how many plus one. – Sami Kuhmonen Apr 12 '19 at 20:53
  • Understood! Thank you again – Victor Oliveira Apr 12 '19 at 21:02
  • @VictorOliveira: You might find [this answer](https://stackoverflow.com/a/6445794) with a metaphor about a drawer in a hotel room enlightening. – Daniel Pryden Apr 12 '19 at 22:19
  • Possible duplicate of [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior) – Daniel Pryden Apr 12 '19 at 22:23

1 Answers1

0

tldr; you are printing strings from memory which was never initialized. Try to use calloc instead of malloc.

I think what is going on in your example is that you are missing zero bytes to terminate your strings. In order to understand what is going on you have to consider these two things:

  1. C strings are terminated by zero bytes (\0). Functions which deal with C strings always expect a zero byte at the end of the string. If there is no such terminator, they will just assume that your string has not ended yet.
  2. malloc does not initialize the allocated memory. This means printf("%s", malloc(10)); might print something or not. This depends on a big number of factors, hence it is normally referred as "undefined behavior".

So coming back to your case: You allocate tmp but never initialize it with zeros. But this is needed for your print statements to work. So use something like calloc(1, strlen(entry)+1). (Please note the +1. You need one more byte for the string terminator (\0))

In order for you to understand what is going on I would suggest you add a memset(tmp, 'X', strlen(entry)); after your malloc-line. And then try to understand the output.


Side note: the following does not copy the string from tmp to entry.

entry = tmp;
free(tmp);
return entry;

You basically just return tmp here which points to a free'd block of memory and will lead to invalid memory access outside the function. What you want to use is memcpy.

Ente
  • 2,301
  • 1
  • 16
  • 34