1

I am using the following method in a program used for simple substitution-based encryption. This method is specifically used for removing duplicate characters in the encryption/decryption key.

The method is functional, as is the rest of the program, and it works for 99% of the keys I've tried. However, when I pass it the key "goodmorning" or any key consisting of the same letters in any order (e.g. "dggimnnooor"), it fails. Further, keys containing more characters than "goodmorning" work, as well as keys with less characters.

I ran the executable through lldb with the same arguments and it works. I've cloned my repository on a machine running CentOS, and it works as is.

But I get no warnings or errors on compile.

//setting the key in main method
char * key;
key = removeDuplicates(argv[2]);

//return 1 if char in word
int targetFound(char * charArr, int num, char target){
  int found = 0;

  if(strchr(charArr,target))
    found = 1;

  return found;
}

//remove duplicate chars
char * removeDuplicates(char * word){
  char * result;
  int len = strlen(word);
  result = malloc (len * sizeof(char));
  if (result == NULL)
    errorHandler(2);

  char ch;
  int i;
  int j;
  for( i = 0, j = 0; i < len; i++){
    ch = word[i];
    if(!targetFound(result, i, ch)){
      result[j] = ch;
      j++;
    }
  }

  return result;
}

Per request: if "feather" was passed in to this function the resulting string would be "feathr".

  • 2
    Show the context, how you call it etc. Make a [mcve]. Explain what happens, in which undesired way. Please. – Yunnosch Feb 16 '18 at 06:58
  • And what if there are no duplicates? How will you fit space for the string terminator in the resulting string? Where do you even *add* the terminator? – Some programmer dude Feb 16 '18 at 06:58
  • 1
    Related: https://stackoverflow.com/q/2221304/694576 https://stackoverflow.com/q/13904149/694576 – alk Feb 16 '18 at 07:02
  • @Yunnosch please see edits. –  Feb 16 '18 at 07:05
  • 1
    Please see the link. – Yunnosch Feb 16 '18 at 07:06
  • @Someprogrammerdude not sure if you read the whole description of what's happening. I explicitly stated that this is working code for 99% of the keys that I pass in, however for some reason when I pass "goodmorning" something doesn't work on my machine, but it will work on others. –  Feb 16 '18 at 07:08
  • You should provide the function `targetFound` too. – Marco Bonelli Feb 16 '18 at 07:09
  • @MarcoBonelli done. –  Feb 16 '18 at 07:12
  • 1
    That's the problem with *undefined behavior*, sometimes, maybe even most of the time, it seems to work fine. And then it just breaks without warning. Not putting a terminator in a string will lead to *undefined behavior*. – Some programmer dude Feb 16 '18 at 07:20
  • @Someprogrammerdude I have since included the terminator in my string, and experience the same thing. –  Feb 16 '18 at 07:24
  • Well first of all you should [learn how to debug your programs](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) properly, including being able to step through the code line by line in a debugger. Then after you have done that and still can't figure it out you need to create a [Minimal, **Complete**, and Verifiable Example](http://stackoverflow.com/help/mcve) and show us. And tell us the expected and actual output for some specified input (or better yet use a fixed string that causes problems). – Some programmer dude Feb 16 '18 at 07:32
  • @Someprogrammerdude If you read the post in it's entirety you would have read that I used lldb to debug and passed in "goodmorning" and it worked. But when I pass in "goodmorning" via command line, it doesn't work. –  Feb 16 '18 at 07:34
  • You do know that `argv[2]` is the *second* argument to the program? `argv[0]` is the program "name", `argv[1]` is the first argument, etc. Do you pass more than one argument to the program? You do remember to check `argc` to see what arguments are available? How*do* you run the program? What arguments do you pass? How do you pass them? Without a MCVE and knowing how you run the program it's really hard to do anything more than to *guess*, and often guess badly. – Some programmer dude Feb 16 '18 at 07:44
  • @Someprogrammerdude apologies for the lack of MVCE. Honestly, like I said having traced the problem to the above, I didn't think that more would be necessary. The program is functional other than the case mentioned. Yes I check `argc`, and I'm well aware of the indexing of `argv`. –  Feb 16 '18 at 07:53
  • @alk Thank you. Passing non-terminated string to `strchr` was the issue. –  Feb 16 '18 at 08:09

2 Answers2

4

As R Sahu already said, you are not terminating your string with a NUL character. Now I'm not going to explain why you need to do this, but you always need to terminate your strings with a NUL character, which is '\0'. If you want to know why, head over here for a good explanation. However this is not the only problem with your code.

The main problem is that the function strchr that you are calling to find out if your result already contains some character expects you to pass a NUL terminated string, but your variable is not NUL terminated, because you keep appending characters to it.

To solve your problem, I would suggest you to use a map instead. Map all the characters you already used and if they aren't in the map add them both to the map and the result. This is simpler (no need to call strchr or any other function), faster (no need to scan all the string every time), and most importantly correct.

Here's a simple solution:

char *removeDuplicates(char *word){
    char *result, *map, ch;
    int i, j;

    map = calloc(256, 1);
    if (map == NULL)
        // Maybe you want some other number here?
        errorHandler(2);

    // Add one char for the NUL terminator:
    result = malloc(strlen(word) + 1);
    if (result == NULL)
        errorHandler(2);

    for(i = 0, j = 0; word[i] != '\0'; i++) {
        ch = word[i];

        // Check if you already saw this character:
        if(map[(size_t)ch] == 0) {
            // If not, add it to the map:
            map[(size_t)ch] = 1;

            // And to your result string:
            result[j] = ch;
            j++;
        }
    }

    // Correctly NUL terminate the new string;
    result[j] = '\0';

    return result;
}

Why does this work on other machines, but not on your machine?

You are being a victim of undefined behavior. Different compilers on different systems treat undefined behavior differently. For example, GCC may decide to not do anything in this particular case and make strchr just keep searching in the memory until it founds a '\0' character, and this is exactly what happens. Your program keeps searching for the NUL terminator and never stops because who knows where a '\0' could be in memory after your string? This is both dangerous and incorrect, because the program is not reading inside the memory reserved for it, so for example, another compiler could decide to stop the search there, and give you a correct result. This however is not something to take for granted, and you should always avoid undefined behavior.

Marco Bonelli
  • 63,369
  • 21
  • 118
  • 128
  • Thank you for being more informative than condescending. –  Feb 16 '18 at 08:00
  • 1
    You can also simply use `0` (zero or octal-constant zero) as the *nul-terminating* character. The ASCII value of `'\0'` is zero, there is just less typing with `0`... – David C. Rankin Feb 16 '18 at 08:08
  • 1
    @DavidC.Rankin yes, less typing, but more confusion. Seeing `0` one might think that `result` is an integer array or something else. Since there's no difference between the two, I prefer clarity over saving three characters in my source code. Same goes for `== NULL` vs `== 0`, etc. – Marco Bonelli Feb 16 '18 at 08:11
1

I see couple of problems in your code:

  1. You are not terminating the output with the null character.
  2. You are not allocating enough memory to hold the null character when there are no duplicate characters in the input.

As a consequence, your program has undefined behavior.

Change

result = malloc (len * sizeof(char));

to

result = malloc (len+1); // No need for sizeof(char)

Add the following before the function returns.

result[j] = '\0';

The other problem, the main one, is that you are using strchr on result, which is not a null terminated string when you call targetFound. That also caused undefined behavior. You need to use:

char * removeDuplicates(char * word){
  char * result;
  int len = strlen(word);
  result = malloc (len+1);
  if (result == NULL)
  {
    errorHandler(2);
  }

  char ch;
  int i;
  int j;

  // Make result an empty string.
  result[0] = '\0';
  for( i = 0, j = 0; i < len; i++){
    ch = word[i];
    if(!targetFound(result, i, ch)){
      result[j] = ch;
      j++;

      // Null terminate again so that next call to targetFound()
      // will work.
      result[j] = '\0';
    }
  }

  return result;
}

A second option is to not use strchr in targetFound. Use num instead and implement the equivalent functionality.

int targetFound(char * charArr, int num, char target)
{
   for ( int i = 0; i < num; ++i )
   {
      if ( charArr[i] == target )
      {
         return 1;
      }
   }
   return 0;
}

That will allow you to avoid assigning the null character to result so many times. You will need to null terminate result only at the end.

char * removeDuplicates(char * word){
  char * result;
  int len = strlen(word);
  result = malloc (len+1);
  if (result == NULL)
  {
    errorHandler(2);
  }

  char ch;
  int i;
  int j;

  for( i = 0, j = 0; i < len; i++){
    ch = word[i];
    if(!targetFound(result, i, ch)){
      result[j] = ch;
      j++;
    }
  }

  result[j] = '\0';
  return result;
}
R Sahu
  • 204,454
  • 14
  • 159
  • 270
  • So, why does it work for all other keys **every single time**? And why does it work "as is" on a machine running CentOS time and time again? –  Feb 16 '18 at 07:13
  • @wanderbread, trying to make sense of undefined behavior is futile. Seemingly sane behavior is included in that too. – R Sahu Feb 16 '18 at 07:15
  • thank you. I initially had mis-read the placement of `result[j]='\0'` –  Feb 16 '18 at 08:05