-1

I've tried different ways to assign a string "cypher" using function "cypher_text()". I pass two strings to funtion cypher_text(), and try to pass a string back. I'm not sure if I'm trying to pass a string or an array of chars back. But I thought they were essentially the same, as in a string is an array of chars.

I can complete the assignment by just printing the values in the function - but I want to understand why I can't return the string from the cypher_text() function back to the main().

#include <stdio.h>
#include <cs50.h>
#include <string.h>
#include <ctype.h>

string cypher_text(string text, string key);

int main(int argc, string argv[])
{
    string cypher;

    if (argc != 2)
    {
        printf("Missing Key\n");
        return 1;
    }
    if (strlen(argv[1]) != 26)
    {
        printf("Invalid Key - must be 26\n");
        return 1;
    }

    for (int i=0,n=strlen(argv[1]); i<n; i++)
    {
        if (isalpha(argv[1][i])==0)
        {
            printf("Invalid key must have only letters\n");
            return 1;
        }
        //for char[0] check if equals to char[1], char[2]....
        for (int j=i+1,m=strlen(argv[1]); j<m; j++)
        {
            if (argv[1][i] == argv[1][j])
            {
                printf("Invalid key duplicates\n");
                return 1;
            }
        }
    }
    
    string plain = get_string("Plain text:");
    
    printf("cypher text: ");
    cypher = cypher_text(plain, argv[1]);
    printf("\n");
    
    //printf("Cypher text:%s\n",cypher);

}


string cypher_text(string text, string key)
{
    int value;
    char cypher[strlen(text)];
    for (int i=0, n=strlen(text);i<n;i++)
    {
        if (isalpha(text[i]))
        {
            if (isupper(text[i]))
            {
                value = (int)text[i]-65;
                cypher[i] = key[value];
                printf("%c",key[value]);
            }
            else
            {
                value = (int)text[i]-97;
                //cypher[i] = key[value];
                printf("%c",(int)key[value]+32);
            }
        }
        else
        {
            //cypher[i] = text[i];
            printf("%c",text[i]);
        }
    }
    
    return cypher;
}
Mitch
  • 553
  • 1
  • 9
  • 24
  • 1
    Does this answer your question? [Returning an array using C](https://stackoverflow.com/questions/11656532/returning-an-array-using-c) – Eugene Sh. Oct 12 '21 at 18:39
  • https://en.wikipedia.org/wiki/Dangling_pointer – Hans Passant Oct 12 '21 at 18:43
  • @EugeneSh. this doesn't explain why I can't just pass a string, which is supposed to be an array of chars. Like in certain situations you can pass a string and this problem doesn't arise. Why is this an issue in this situation? I would prefer to just treat it as a string, just like argv[1] can be treated as a string. – Mitch Oct 12 '21 at 19:34
  • @Mitch *I would prefer to just treat it as a string, just like argv[1] can be treated as a string* And I'd prefer to treat my car as a helicopter every time I get caught in traffic. But it's not a helicopter. And your `cypher` ***array*** is just that - an array. And it ceases to exist as soon as your `cypher_text()` function returns. And you are finding out ***why*** CS50's `string` is a confusing abomination based on obfuscation - it's nothing more than a `typedef` for `char *`, and because of that, people taking CS50 never learn fundamental concepts of C such as pointers and arrays. – Andrew Henle Oct 13 '21 at 23:19

2 Answers2

2

I want to understand why I can't return the string from the cypher_text() function back to the main().

Short answer - you cannot return an array from a function in C. What actually gets returned is a pointer to the first element of the array; unfortunately, once the function exits, that array ceases to exist and that pointer is no longer valid.

C's treatment of arrays is different from most languages'. When the compiler sees an expression of type "N-element array of T", that expression will be converted (or "decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element unless:

  • the array expression is the operand of the sizeof, _Alignof, or unary & operators; or
  • the expression is a string literal being used to initialize a character array in a declaration.

There is a reason for this - it's not an arbitrary thing just to drive people crazy. It allows C to mimic the same array subscripting behavior as its precursor language B. Unfortunately, it means arrays lose their "array-ness" under most circumstances and what you're actually working with is a pointer. When you write

return cypher;

in your function, what the compiler sees is

return &cypher[0];

The string type in the CS50 library is a lie; it's just a typedef (alias) for the type char * (pointer to char). C doesn't have an actual string datatype as such. In C, a string is just a sequence of character values including a 0-valued terminator. Strings (including string literals like "hello") are stored in arrays of character type (char for "normal" character encodings like ASCII, EBCDIC, and UTF-8, and wchar_t for "wide" encodings like UTF-16). But character arrays can also store sequences that aren't strings.

So, when you need to create a new string and return it to the caller, you have three options:

Have the caller pass the target array (along with its size!!!) as a parameter to the function; again, what the function receives is a pointer to the first element:

void cypher_text(string text, string key, string cypher, size_t cypher_size) // or char * cypher, since that's what's really happening
{
  ...
  cypher[i] = ...;
  ...
}

int main( void )
{
  ...
  char cypher[TEXT_LENGTH+1]; // +1 for 0 terminator
  ...
  cypher_text( plain, argv[i], cypher, sizeof cypher );
  ...
}

This is my preferred option, as the caller is responsible for managing the memory of the cypher array. Life's just simpler if one entity keeps track of the lifetime.

Alternately, you could have cypher_text dynamically allocate the buffer to store the cypher text - dynamically allocated memory stays allocated until you explicitly free it:

string cypher_text( string plain, string key )
{
  string cypher = calloc( strlen( text + 1 ), sizeof *cypher );
  if ( cypher )
  {
    // compute cypher and assign text here
  }
  return cypher;
}

This is what the CS50 get_string() function is doing for you behind the scenes - it's using calloc (or malloc, not sure at the moment) to allocate the memory in which the string is stored. Just make sure to free the array when you're done with it. Note that you could combine this approach with the first method - have main use calloc to allocate the memory for cypher and pass that pointer to cypher_text() - that way you don't have to declare cypher as a fixed-size array as I did in the first example.

Or, you could use a global array. I do not recommend this and will not provide an example because it's a bad idea, but I'm just including it for completeness. Sometimes using a global isn't the wrong answer, but this isn't one of those times.

Welcome to C. It's ... not intuitive. The CS50 library tries to hide the nastier parts of the language from you, but in the process grossly misrepresents how it actually works, and I think that doesn't do you any favors.

Edit

From the comments:

So in CS50 when I create a string name (which is not actually a string but a typedef) in main(), that string is actually a typedef array of chars with dynamically allocated size.

The declaration

string name;

is identical to

char *name;

You're declaring name as a pointer, not an array. All name stores is the address of a char object somewhere else in memory. That object may be the first element of an array containing a string; it may be the first element of an array containing a sequence that isn't a string (doesn't have a zero-valued terminator); it may be a single char object that isn't part of a larger array.

The get_string function dynamically allocates the array and returns the pointer to it, which gets saved to name.

Assume the following declarations:

char text[] = "foo"; // array size is taken from the size of the literal;
char *ptr = text;

ptr stores the address of the first element of the text array. When you look at the objects in memory, you have something like this:

       Item         Address   00   01   02   03
       ----         -------   --   --   --   --
       text  0x7ffeed855a08   66   6f   6f   00    foo.

        ptr  0x7ffeed855a00   08   5a   85   ed    .Z..
             0x7ffeed855a04   fe   7f   00   00    ....

       *ptr  0x7ffeed855a08   66   6f   6f   00    foo.

In picture form:

      +---+
 ptr: |   | -----------+
      +---+            |
       ...             |
      +---+            |
text: |'f'| text[0] <--+
      +---+
      |'o'| text[1]
      +---+
      |'o'| text[2]
      +---+
      | 0 | text[3]
      +---+

The first element of the text array (text[0]) starts at address 0x7ffeed855a08. The variable ptr lives starting at address 0x7ffeed855a00. The value stored in ptr is the address of text[0] - 0x7ffeed855a08 (x86 is little-endian, so you need to read multi-byte objects from right to left). All of the following relationships are now true:

 ptr == &text[0]         // char * == char *
*ptr ==  text[0] == 'f'  // char   == char   == char

Now, ptr could also store the address of a dynamically allocated buffer:

ptr = calloc( 4, sizeof *ptr );

which gives us:

       Item         Address   00   01   02   03
       ----         -------   --   --   --   --
       text  0x7ffeed855a08   66   6f   6f   00    foo.

        ptr  0x7ffeed855a00   30   2a   c0   df    0*..
             0x7ffeed855a04   ce   7f   00   00    ....

       *ptr  0x7fcedfc02a30   00   00   00   00    ....

Again, graphically:

      +---+
      | 0 | <--------+ 
      +---+          |    calloc zeroes out the contents of the
      | 0 |          |    newly-allocated buffer; malloc and
      +---+          |    realloc do not.  
      | 0 |          |
      +---+          |
      | 0 |          |
      +---+          |  
       ...           |
      +---+          |
 ptr: |   | ---------+
      +---+            
       ...             
      +---+            
text: |'f'| text[0] 
      +---+
      |'o'| text[1]
      +---+
      |'o'| text[2]
      +---+
      | 0 | text[3]
      +---+

Essentially this is what get_string() is doing for you behind the scenes - the thing CS50 calls a string is really just a pointer to the first element of a dynamically allocated buffer.

In C, the character sequence {'f', 'o', 'o', 0} is a string. The string type defined by CS50 is not.

Now when I pass that string name from main() to function(name), how does the size of string name also get passed to the function(name)?

It doesn't. When you pass an array expression as an argument to a function, all the function receives is a pointer to the first element of that array. A pointer points to a single object; there's no metadata to tell you whether that object is part of a larger sequence or how long that sequence would be. You either have to pass the array size as a separate parameter, or you have to use some kind of sentinel value to mark the end of the array's contents.

The C standard library functions rely on the presence of that zero terminator to mark the end of the string; they don't know the physical size of the array storing the string.

Arrays in C don't grow or shrink automatically - if you try to write "supercalifragilisticexpealidocious" to text (either by using a library function like strcpy or copying each element individually), you'll write past the end of the array and overwrite something else. C doesn't require any kind of bounds checking on array accesses, and reading or writing past the end of an array can have unpredictable results - your code may crash outright, or you may corrupt data, or it may branch off into a random subroutine, or it may work without any apparent issues. This is why it's important to pass the array size as a separate parameter to any function that needs to update the array's contents.

A weakness of the C standard library (and the C programming philosophy in general) is the assumption that the programmer is in the best position to know whether a bounds or range check is required and is smart enough to write it if it is, because strcpy and strcat and a host of other standard library functions won't do it for you (yes, you can use strncpy and strncat instead, but their behavior is subtly different - they won't properly terminate the target string under some circumstances). It's not an accident so many C-based systems are vulnerable to buffer overflow attacks.

John Bode
  • 119,563
  • 19
  • 122
  • 198
  • So in CS50 when I create a ```string name``` (which is not actually a string but a typedef) in ```main()```, that string is actually a typedef array of chars with dynamically allocated size. Now when I pass that ```string name``` from ```main()``` to ```function(name)```, how does the size of ```string name``` also get passed to the ```function(name)```? Or does it not need to because ```calloc``` has already been used on it behind the scene? – Mitch Oct 13 '21 at 13:06
  • @Mitch: see my edit. – John Bode Oct 13 '21 at 14:31
1

You have two problems with the cypher_text function.

The first is that

char cypher[strlen(text)];

does not make room for the string null-terminator. You need to allocate space for the terminator, as well as adding it to yhr buffer.

The second problem is that it's a local variable, whose life-time ends when the function ends. So the pointer you return from the function will immediately be invalid.

Two solutions to the second problem:

  1. Pass a pointer to an allocated buffer of the needed length (including the null-terminator), and use it instead;

  2. Or allocate memory dynamically, which you return a pointer to. This requires you to free the memory once you're done with it otherwise you will have a memory leak.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • Ah right the null-terminator makes sense. Also why can't I just create a "cypher" as a string instead of a char and pass chars to cypher[i]? – Mitch Oct 12 '21 at 19:23
  • @Mitch I'm not exactly sure what you mean, but you have to remember that CS50 hides the actual types of it strings behind the `string` type-alias. It's really `char *`. A pointer. And all pointers must point somewhere valid before being used. – Some programmer dude Oct 12 '21 at 19:36
  • so in other problems we could pass and return strings to a function. I was then told that strings are essentially arrays of characters, like argv[1] is a string, and I can do strlen() on it. In that case why can I not just pass two strings to my function cypher_text() and return a string. I'm just taking an input string, and changing it. Why do I have to make it a char array at any point? Why can it not just stay a string and avoid the pointer array issues altogether? – Mitch Oct 12 '21 at 19:41
  • 1
    You may find [CS50x - PSET2- Substitution](https://stackoverflow.com/a/65910667/3422102) helpful. It seems to be the same pset. – David C. Rankin Oct 12 '21 at 19:53