
I am attempting to capture input from the user via scanf:

char numStrings[5000];   
printf("Enter string of numbers:\n\n"); 
scanf("%s", numStrings);

However, the input string is 5000 characters long, and the translation limit in C99 is 4095 characters. Do I need to instruct the user to break their input in half, or is there a better workaround that I haven't thought of?

iam12thman
  • Where did you read that there is a maximum string size? There is a maximum *stack* size, and it's OS-dependent. – Iharob Al Asimi May 24 '15 at 02:44
  • @iharob is correct, I believe, but as a side note: if your array has 5000 elements, you can only read a string of up to 4999 characters, because you need room for the null terminator ('\0') at the end. So I would change that to `scanf("%4999s", numStrings);` to avoid a buffer overflow. – JackV May 24 '15 at 02:47
  • I saw the string size limit here http://bytes.com/topic/c/answers/786961-size-limits-string-literals as well as a few other places – iam12thman May 24 '15 at 02:52
  • @iam12thman that's the "translation limits", i.e. the size of a literal that the compiler must be able to handle, not a limit on object size. "The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits - 4095 characters in a character string literal or wide string literal (after concatenation)" – phuclv May 24 '15 at 03:03
  • Do you suppose it might be a good idea to eliminate this statement from your question, now that you've discovered that it's invalid? "The max capacity of a string in c99 is 4095 characters." – autistic May 24 '15 at 04:23
  • @undefined behaviour - Will you verify that the question makes sense now? It's unfortunate that I got downvoted after spending around two hours looking through Stack Overflow and Google for an answer, and now that I know the question to ask, I'm screwed. – iam12thman May 27 '15 at 18:12
  • @Lưu Vĩnh Phúc - Thank you. Even though you didn't provide a workaround, this totally enlightened me. – iam12thman May 27 '15 at 18:21
  • @iam12thman With the statement "The max capacity of a string in c99 is 4095 characters", your question makes no sense because it's not true. Without that statement (e.g. if you remove it), your question makes sense, though it may not mean what you originally meant it to mean (which was only a problem because you believed that statement applied to *strings* rather than *string literals*). – autistic May 27 '15 at 23:05

3 Answers

You can input a string a lot larger than that. For an array like this, the real limit is the stack size, which is typically at least 1 MB on common operating systems and 8 MB by default on Linux. Since 1 MB is 1024 KB, you could, for example, try 512 KB, which is 524288 bytes:

char string[524288];
scanf("%524287s", string);

This will most likely be OK. If it's still too small, use malloc() instead.
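
If a fixed array is still too small, a minimal sketch of the malloc() route might look like the following. It assumes the buffer size is chosen at run time and builds the matching field width with snprintf(), so the width always stays one below the allocation; the helper name read_big_word is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one whitespace-delimited word of at most
   size - 1 characters into a heap buffer of `size` bytes.
   Returns the buffer (the caller must free() it), or NULL on failure. */
char *read_big_word(FILE *in, size_t size) {
    char fmt[32];
    char *buf;

    if (size < 2) {
        return NULL; /* need room for at least one char plus '\0' */
    }
    buf = malloc(size);
    if (buf == NULL) {
        return NULL;
    }
    /* Build a format such as "%1048575s" so the field width
       leaves room for the terminating '\0'. */
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);
    if (fscanf(in, fmt, buf) != 1) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

A caller might use read_big_word(stdin, 1024 * 1024) for a 1 MB buffer and free() the result when done. The buffer must still be large enough for the longest expected word, so for truly unbounded input a buffer that grows with realloc() is the more general approach.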

Iharob Al Asimi
  • So the numerical entry between % and s (%524287s) is essentially forewarning the program that an input of that size is coming? When I enter the 5000-character string, the program gets killed. Note: I am using a virtual machine. – iam12thman May 24 '15 at 02:56
  • @iam12thman the number is the "maximum field width, that is, the maximum number of characters that the function is allowed to consume when doing the conversion specified by the current conversion specification" http://en.cppreference.com/w/cpp/io/c/fscanf – phuclv May 24 '15 at 03:07
  • It's (generally) a bad idea to use too much stack memory, and using `malloc` instead makes even more memory available: up to the gigabyte range with a reasonable amount of system memory, and even more if the OS allows the use of external memory for a single object. Enjoy typing such an amount into your prompt! – Jongware May 24 '15 at 21:04
  • I'm surprised nobody has mentioned that such objects needn't be stored entirely *on the stack*, and that *the stack* and *the heap* are likely *the same hardware components* anyway. The technical names are *automatic storage duration*, *static storage duration*, *allocated storage duration* and *thread storage duration*. We should use those terms more often, and terms like *the stack* and *the heap* less often. It's generally a bad idea to use too much of *any* memory, though if the user twists our arm we'll be forced to, so I think this answer is acceptable. – autistic Feb 23 '16 at 04:07
  • Finally, I like to choose storage durations the same way I choose variable types; if I want textual operations, then an array of `char` is probably suitable, whereas integer or floating-point operations obviously require a different choice. Similarly, if I require that a string can potentially grow without bound, I'll *always* be choosing `realloc`... and for most other situations, with a bit of tasteful refactoring, automatic storage duration and allocated storage duration are virtually interchangeable. – autistic Feb 23 '16 at 04:11

No, you do not need to instruct the user to split the input if it goes over a set length. The limit is on string literals, not strings. See the answer in this Stack Overflow thread for more information. If you don't know what a reasonable maximum length is, I would recommend using getline(), or getdelim() if the delimiter you want to use is not a line break. Note that both are POSIX functions rather than part of standard C.
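
A minimal sketch of the getline() approach, assuming a POSIX system (getline() is declared in <stdio.h> there, and ssize_t comes from <sys/types.h>). The helper name read_line is hypothetical; getline() itself allocates and enlarges the buffer as needed, so no maximum length has to be chosen up front.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads one line of any length from `in` using
   POSIX getline(), which grows the buffer with realloc as needed.
   Strips the trailing newline. Returns NULL at EOF or on error;
   the caller must free() the result. */
char *read_line(FILE *in) {
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, in);

    if (len == -1) {
        free(line); /* getline may have allocated even on failure */
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
    }
    return line;
}
```

For the question's case, char *digits = read_line(stdin); would accept the 5000-character line (or a much longer one), and the caller would free(digits) afterwards.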

RandomName
  • Thank you and sorry for the mistake in my question. – iam12thman May 24 '15 at 03:02
  • This isn't an answer to the question; it should definitely be a comment, though. – autistic May 24 '15 at 03:48
  • FWIW, the question is "Do I need to instruct the user to break their input in half or is there a better work around that I cannot think of?"... and this response isn't an answer to it in any form. Cross out the assertion regarding 4095 being a limit and the question still makes sense, yeh? ... but this answer doesn't. – autistic May 24 '15 at 04:19
  • @undefinedbehaviour Thank you for the advice. Also, I originally posted my response as an answer rather than a comment because I didn't have enough reputation. – RandomName May 24 '15 at 04:40

Do I need to instruct the user to break their input in half or is there a better work around that I cannot think of?

As far as the code you've given goes, if the input word is longer than 4999 bytes, you can expect a buffer overflow. So yes, it would be wise to let someone (e.g. the user, or whoever maintains this code next) know that this is the maximum length. It's nice that you can at least truncate the input using code like this: scanf("%4999s" "%*[^ \n]", numStrings); Here %4999s stores at most 4999 characters (leaving room for the '\0'), and the %*[^ \n] directive performs the truncation by reading and discarding the rest of the word.
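
To make the truncation visible, here is a small sketch that uses a width of 4 instead of 4999 and reads from a FILE * rather than stdin, so that it can be exercised with tmpfile(). The helper name read_truncated_word is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one word into buf (at most 4 chars + '\0')
   and silently discards the rest of that word.
   Returns 1 if a word was read, otherwise 0. */
int read_truncated_word(FILE *in, char buf[5]) {
    /* %4s stores up to 4 non-whitespace characters.
       %*[^ \n] then consumes, without storing, any characters up to the
       next space or newline. If nothing is left to discard, that directive
       simply fails to match, which ends the call; the return value still
       counts the one successful %4s assignment. */
    return fscanf(in, "%4s%*[^ \n]", buf) == 1;
}
```

Note that the scanset [^ \n] stops only at spaces and newlines: if a word is followed by a tab, the discarded part runs past the tab into the following text, so %*[^ \t\n] may be a better fit if tabs can appear in the input.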

It'd be nicer still if you could let the user know at the moment they overflow the buffer, but scanf doesn't make that an easy task. Even nicer (for the user, I mean) would be to use dynamic allocation.
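
One way to tell the user at that moment is to read with the field width and then peek at the next character: if it is neither whitespace nor EOF, the width cut the word short. This is a sketch under that assumption; the helper name read_word_checked and its return codes are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads a word of at most 4999 characters into buf
   (which must hold 5000 bytes). Returns 1 on success, -1 if the word was
   too long (the excess is left unread in the stream), or 0 on EOF/error. */
int read_word_checked(FILE *in, char buf[5000]) {
    int next;

    if (fscanf(in, "%4999s", buf) != 1) {
        return 0;
    }
    /* Peek at the character right after the stored text. */
    next = fgetc(in);
    if (next == EOF) {
        return 1; /* the word ended at end of input */
    }
    ungetc(next, in); /* put it back for the next read */
    return isspace(next) ? 1 : -1;
}
```

A caller could print an error such as "input longer than 4999 characters" when it gets -1, and then either discard the remainder (for example with %*[^ \n]) or switch to a dynamically allocated buffer.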

Ahh, the problem of dynamically sized input. If it can be avoided, then avoid it. One common method to avoid this problem is to require input in the form of argv, rather than stdin... but that's not always possible, useful or feasible.
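
Taking the input from argv sidesteps the buffer-size question inside the program, because the argument strings are already stored when main() starts. A minimal sketch follows, with a hypothetical helper name digits_from_args; note that the operating system still limits the total length of a command line, so this is not unlimited either.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for main(int argc, char *argv[]): the digits come
   from argv[1], whose storage already exists when main() is called, so
   no buffer size has to be chosen here.
   Returns the length of the argument, or -1 if it is missing. */
long digits_from_args(int argc, char *argv[], const char **out) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s DIGITS\n", argc > 0 ? argv[0] : "prog");
        return -1;
    }
    *out = argv[1];
    return (long)strlen(argv[1]);
}
```

Inside main(int argc, char *argv[]) this would be called as digits_from_args(argc, argv, &digits), with the program run as, say, ./prog 1234567890.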

scanf doesn't make this problem particularly easy to solve; it would be much easier if the functionality of %s were available through an interface similar to fgets.

Without further ado, here's an adaptation of the code I wrote in this answer, modified to read (and simultaneously allocate) whitespace-delimited words, in a procedure similar to that behind %s, rather than lines, as fgets does. Feel free to read that answer if you would like to know more about the inspiration behind it.

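#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one whitespace-delimited word from f into a buffer that grows as
   needed, much like "%s" but without a fixed maximum length.
   Returns NULL if allocation fails; the caller must free() the result.
   At end of input, an empty string is returned. */
char *get_dynamic_word(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL;
    int c;

    /* Skip leading whitespace, as %s does. */
    do {
        c = fgetc(f);
    } while (c != EOF && isspace(c));

    for (;;) {
        /* Grow the buffer (to 1, 3, 7, 15, ... bytes) whenever bytes_read
           reaches 0, 1, 3, 7, ..., so there is always room for one more byte. */
        if ((bytes_read & (bytes_read + 1)) == 0) {
            char *temp = realloc(bytes, bytes_read * 2 + 1);
            if (temp == NULL) {
                free(bytes);
                return NULL;
            }
            bytes = temp;
        }

        /* Whitespace or EOF ends the word. */
        if (c == EOF || isspace(c)) {
            bytes[bytes_read] = '\0';
            break;
        }
        bytes[bytes_read++] = (char)c;
        c = fgetc(f);
    }

    /* Push the delimiter back so it stays in the stream, as with %s,
       without reading (and possibly blocking on) any further input. */
    if (c != EOF) {
        ungetc(c, f);
    }
    return bytes;
}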
autistic