4

This is a fairly long question, so please take the time to read it before answering.

My question is: how can we take a string as input in C?

We normally ask the user to provide the number of characters, say n, and then simply declare char str[n]. That works fine.

But we often declare a fixed size instead, like char str[100]. If the user then enters a string of length 20, the remaining 80 bytes are wasted, which we'd rather avoid. Is it OK to declare it like that?

And if the user enters a string of length 120, only 100 characters can be stored in our character array, which we don't want either.

So, basically, we don't know what the user may input; the length of the string is the user's choice.

In the above cases, we take input using scanf or gets, like scanf("%s", str), scanf("%[^\n]%*c", str), scanf("%[^\n]s",str), gets(str), etc.

With scanf, if the array has room for 5 characters and we enter 6, the 6th character doesn't get stored.

With gets, if the array has room for 5 characters and we enter 6, the 6th character gets stored in the next byte, but it isn't displayed when we print the string. Entering 6 characters also gives a message like 'stack smashing detected'. We don't know what other data is there, so it may get overwritten.

Are the cases described above right or wrong? Could you please help me?

There is another way to declare a string and take input: pointers. We can dynamically allocate memory with malloc, calloc, or realloc, and deallocate it with free once we are finished with the string.

We may declare char* str = (char*)malloc(size*sizeof(char)) and take input with scanf("%[^\n]s",str). But here, too, we need to provide the size. What if we don't know the size? What if the user provides input longer than the size?

We may also declare char* str = (char*)malloc(sizeof(char)). Suppose we then enter a string of length 5. The string gets stored on the heap in consecutive bytes, but we have only allocated 1 byte, so storing the remaining 4 bytes is basically an illegal memory access. We can't do that, can we?

The two cases above are the same; is this right or wrong? Could you please help me?

I'm in zugzwang, to use a chess term. Could you please help me? What ways are there to declare a string and take input without specifying the size? Can we dynamically allocate without specifying the size? What are all the ways to declare a string?

Oka
Goutham18
  • 1
    How about you create a buffer for a string input that has maximum size for your use case, say `char buf[4096]`. Then, accept input to `buf`, then `strlen(buf)` and use that value to dynamically allocate the actual string. – PHD May 29 '21 at 02:45
  • Already answered Here https://stackoverflow.com/questions/20918341/arbitrary-length-string-in-c/20918363 – user3766054 May 29 '21 at 02:51
  • Typical implementations actually read up to the size of the array and discard the rest of the input as garbage. If you want to be able to store input of any size, you can implement a function that allocates memory dynamically and keeps reading the input, realloc-ing to expand the memory for the string as much as necessary. Eventually, though, you will have to set a limit and discard what couldn't be read, because memory isn't infinite. – isrnick May 29 '21 at 02:59
  • Hey PHD, I'll do that, but wouldn't the buffer itself take space on the stack? After the function ends, that space gets cleared, is this right? Is there any way to explicitly delete the memory, buf[4096]? – Goutham18 May 29 '21 at 03:14
  • Hey user3766054, i'll check that – Goutham18 May 29 '21 at 03:15
  • Hey isrnick, it seems good to me. the condition may be that when the user hits enter (new line feed, ascii is 10), then stop. is this right? – Goutham18 May 29 '21 at 03:17
  • Side note: As always, [*never* use `gets`](https://stackoverflow.com/questions/1694036/why-is-the-gets-function-so-dangerous-that-it-should-not-be-used) and [do not cast the return of `malloc` in C](https://stackoverflow.com/a/605858). – Oka May 29 '21 at 03:18
  • Hey Oka, yes gets() is dangerous to work. We cast return of malloc to our required datatype, normally malloc returns void pointer. is this right? – Goutham18 May 29 '21 at 03:20
  • No, you never cast the return of malloc, void* is automatically converted to whatever kind of pointer you need to assign it to in C. – isrnick May 29 '21 at 03:23
  • @Goutham18 Please read the links for more information. In C `void *` can be safely and automatically promoted to any other pointer type. The cast is unnecessary. – Oka May 29 '21 at 03:24
  • Hey isrnick, yes it is true, i meant to say that – Goutham18 May 29 '21 at 03:25
  • @isrnick yes its true, i meant to say that – Goutham18 May 29 '21 at 03:26
  • @Goutham18 It is not possible to delete memory on the stack. The scope depends on how you define it (automatic or static). – PHD May 29 '21 at 03:40
  • If for some reason memory in the stack is a concern (why would it be a problem anyway? that frame will be lost once you exit the read function) you can manually allocate the buffer on heap. – Miguel Sandoval May 29 '21 at 04:22
  • [Related thoughts from a similar answer](https://stackoverflow.com/a/67747597/3422102). The bottom line on `char str[100]` and wasted 80 chars or so is you have a 4M stack on Linux and 1M on windows (in most cases) so the 80 wasted characters is `0.0076%` (`7.6e-5` fraction) of your stack space on windows or `0.0019%` of your stack space on Linux. I'd rather have 10,000 characters too many than one too few any day.... – David C. Rankin May 29 '21 at 05:12
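
A minimal sketch of the approach PHD suggests in the comments above: read into a generously sized temporary buffer, then allocate exactly as much heap memory as the line needs. The helper name `read_line_fixed` and the `INPUT_MAX` limit are assumptions made for illustration, and `fgets` is used here because it never writes past the buffer it is given.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INPUT_MAX 4096 /* assumed upper bound for one line, as in the comment */

/* Hypothetical helper: reads one line (at most INPUT_MAX - 1 characters)
 * from stream into a temporary stack buffer, then returns a heap copy that
 * is exactly as large as the text. Returns NULL on EOF, read error, or
 * allocation failure. The caller must free() the result. */
char *read_line_fixed(FILE *stream)
{
    char buf[INPUT_MAX];

    assert(stream != NULL);

    if (fgets(buf, sizeof buf, stream) == NULL)
        return NULL;

    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline, if any */

    size_t len = strlen(buf);
    char *str = malloc(len + 1);    /* +1 for the terminating '\0' */
    if (str == NULL)
        return NULL;

    memcpy(str, buf, len + 1);      /* copy the text and its terminator */
    return str;
}
```

Calling `char *s = read_line_fixed(stdin);` and later `free(s);` would be the typical use. Input longer than the limit is not lost; it simply stays in the stream for the next read.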

3 Answers

1

From the manual, getline(3) is what you're looking for.

   #include <stdio.h>

   ssize_t getline(char **restrict lineptr, size_t *restrict n,
                   FILE *restrict stream);

A little bit of text from it:

getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is null-terminated and includes the newline character, if one was found.

If *lineptr is set to NULL and *n is set 0 before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed.

Alternatively, before calling getline(), *lineptr can contain a pointer to a malloc(3)-allocated buffer *n bytes in size. If the buffer is not large enough to hold the line, getline() resizes it with realloc(3), updating *lineptr and *n as necessary.

In either case, on a successful call, *lineptr and *n will be updated to reflect the buffer address and allocated size respectively.

So, getline will malloc or even realloc the buffer you provide it. With that in mind, you could write a program like this:

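/* getline.c
 *
 * Reads one line of any length from stdin and reports the buffer size.
 */
#define _POSIX_C_SOURCE 200809L /* getline() is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = NULL; /* NULL and 0 ask getline() to allocate the buffer */
    size_t n = 0;   /* getline() expects a size_t *, not a ssize_t * */

    fprintf(stderr, "Line: ");
    if (getline(&s, &n, stdin) == -1) { /* EOF or read error */
        free(s); /* the buffer must be freed even if getline() failed */
        return EXIT_FAILURE;
    }

    printf("Size: %zu\n", n); /* allocated size, not the string length */
    //printf("String: %s", s);

    /* @isrnick comment */
    free(s);

    return 0;
}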

And then test it with something like this:

$ make getline
$ python -c "print('A' * 2000000)" | ./getline
Size: 2097664
$

It prints the size of the allocated buffer, not the length of the string. Since pressing ENTER ends the input with \n, and getline reads up to and including that newline, it works well for interactive input.
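
The return value of getline is often more useful than `n`: it is the number of characters actually read, including the newline but not the terminating null byte. Below is a small sketch, under the assumption that a POSIX system is available, of a helper that uses it to strip the newline and report the real length. The name `read_line_stripped` is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L /* expose getline() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical helper: reads one line with getline(), removes the trailing
 * newline if there is one, and stores the resulting length in *len_out.
 * Returns a malloc'd string the caller must free(), or NULL on EOF/error. */
char *read_line_stripped(FILE *stream, size_t *len_out)
{
    char *line = NULL;
    size_t cap = 0; /* capacity of the buffer, managed by getline() */

    assert(stream != NULL && len_out != NULL);

    ssize_t nread = getline(&line, &cap, stream);
    if (nread == -1) {
        free(line); /* getline() may have allocated even on failure */
        return NULL;
    }

    if (nread > 0 && line[nread - 1] == '\n')
        line[--nread] = '\0'; /* overwrite the newline with a terminator */

    *len_out = (size_t)nread;
    return line;
}
```

Here `cap` plays the role of `n` in the program above (the buffer size), while `*len_out` is the length of the text itself.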


Rudimentary, generic `cat` program:
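/* gcat.c
 */

#define _POSIX_C_SOURCE 200809L /* getline() is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *s = NULL; /* must start as NULL/0 so getline() allocates */
    size_t n = 0;
    FILE *fp = stdin;

    if (argc > 1) {
        if (!(fp = fopen(argv[1], "r"))) {
            perror("fopen");
            return EXIT_FAILURE;
        }
    }

    /* the same buffer is reused, and grown when needed, for every line */
    while (getline(&s, &n, fp) != -1)
        printf("%s", s);

    /* @isrnick comment */
    free(s);
    if (fp != stdin)
        fclose(fp);

    return 0;
}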

And you can call it with either of these:

$ cat gcat.c | ./gcat

Or...

$ ./gcat gcat.c
Enzo Ferber
  • 1
    Note: `getline` is not a standard C function and may not be available by default. – isrnick May 29 '21 at 03:36
  • 1
    Dynamically allocated memory should be freed. – isrnick May 29 '21 at 03:36
  • 1
    @isrnick Agreed, but If you're right at `return` from `main`, like the example above, the operating system (Linux, at least) will take care of that for you. – Enzo Ferber May 29 '21 at 04:37
  • 1
    @isrnick Edited the post to `free` the buffer after all was done. – Enzo Ferber May 29 '21 at 04:46
  • Yes, the OS will typically free the memory when the process ends, but still, it is better to force yourself to make the program free it directly, and to never rely on the OS to free it, even if just to create the habit of always freeing dynamically allocated memory so as to not forget to do it when it is actually necessary. – isrnick May 29 '21 at 04:59
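
Since getline is not part of standard C, as noted in the comments above, here is a rough sketch of a portable alternative that uses only `fgetc`, `malloc`, and `realloc`, doubling the buffer whenever it fills up. The helper name `read_line_portable` and the starting capacity of 16 are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper using only standard C: reads characters until '\n'
 * or EOF, growing the buffer with realloc() as needed. The newline is not
 * stored. Returns NULL if EOF is hit before any character is read or if
 * allocation fails. The caller must free() the result. */
char *read_line_portable(FILE *stream)
{
    size_t cap = 16; /* current capacity in bytes (assumed starting size) */
    size_t len = 0;  /* number of characters stored so far */
    int c;

    assert(stream != NULL);

    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = fgetc(stream)) != EOF && c != '\n') {
        if (len + 1 >= cap) { /* keep one byte free for the final '\0' */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) { /* the old block is still valid: release it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) { /* nothing left to read */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    return buf;
}
```

Assigning the result of `realloc` to a temporary pointer first avoids leaking the original block if the reallocation fails.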
1

Theory

One solution could be to create linked structs of buffers.

This way, each time a buffer runs out of space you can simply allocate more memory for another buffer, and link them together. This linked list of buffers can keep growing until the input is done.

Once the input has finished, you allocate one big chunk of consecutive memory for the string and then walk through the list of linked buffers and copy the data to the final string.

Finally, the allocated memory for the linked buffers is freed.

Practical example

Reading arbitrary-length strings can be as simple as this:

    int main(void)
    {
        char *string = readLine(); //read arbitrary-length string
        printf("%s", string); //print string
        free(string); //don't forget to free the string!
        return 0;
    }

So let's write the readLine() function ourselves.

  1. Create a linked buffer struct:
    #include <stdio.h>
    #include <stdlib.h>

    #define LINKEDBUFFER_SIZE 256
    
    struct SLinkedBuffer
    {
        char buffer[LINKEDBUFFER_SIZE];
        int idx;                    /* number of chars used in buffer */
        struct SLinkedBuffer *next; /* next buffer in the chain, or NULL */
    };

    typedef struct SLinkedBuffer LinkedBuffer;
    
    /* Allocate an empty, unlinked buffer; exits on allocation failure. */
    LinkedBuffer *newLinkedBuffer(void)
    {
        LinkedBuffer *result = malloc(sizeof(LinkedBuffer));
        if (result == NULL)
        {
            fprintf(stderr, "Error while allocating memory!\n");
            exit(1);
        }
        result->idx = 0;
        result->next = NULL;
        return result;
    }
  2. Create a read function making use of our just-defined linked buffers:
    char *readLine()
    {
        char *result = NULL;
        size_t stringSize = 0;
        
        /* Read into linked buffers */
        LinkedBuffer *baseLinkedBuffer = newLinkedBuffer();
        LinkedBuffer *currentLinkedBuffer = baseLinkedBuffer;
        int currentChar;
        while ((currentChar = fgetc(stdin)) != EOF && currentChar != '\n')
        {
            if (currentLinkedBuffer->idx >= LINKEDBUFFER_SIZE)
            {
                currentLinkedBuffer->next = newLinkedBuffer();
                currentLinkedBuffer = currentLinkedBuffer->next;
            }
            currentLinkedBuffer->buffer[currentLinkedBuffer->idx++] = currentChar;
            stringSize++;
        }
        
        /* Copy to a consecutive string */
        size_t stringIndex = 0; /* same type as stringSize */
        result = malloc(stringSize + 1); /* +1 for the terminating '\0' */
        if (result == NULL)
        {
            fprintf(stderr, "Error while allocating memory!\n");
            exit(1);
        }
        currentLinkedBuffer = baseLinkedBuffer;
        while (currentLinkedBuffer != NULL)
        {
            for (int i = 0; i < currentLinkedBuffer->idx; i++)
                result[stringIndex++] = currentLinkedBuffer->buffer[i];
            currentLinkedBuffer = currentLinkedBuffer->next;
        }
        result[stringIndex++] = '\0';
        
        /* Free linked buffers memory */
        while (baseLinkedBuffer != NULL)
        {
            currentLinkedBuffer = baseLinkedBuffer->next;
            free(baseLinkedBuffer);
            baseLinkedBuffer = currentLinkedBuffer;
        }
        
        return result;
    }

And now we can simply use the readLine() function to read any string as shown in the main function!

0

This code will help you read a string without specifying its length in advance:

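#define _POSIX_C_SOURCE 200809L /* getline() is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

int main(void)
{
    char *line = NULL; /* NULL and 0 let getline() allocate the buffer */
    size_t len = 0;
    ssize_t read;

    read = getline(&line, &len, stdin);
    if (read == -1) { /* EOF or read error */
        free(line);
        return EXIT_FAILURE;
    }
    printf("%s", line);
    printf("%zu", strlen(line)); /* strlen() returns size_t, so use %zu */
    free(line);
    return 0;
}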