
We have a test application in C that reads a string with scanf and uses that string for further processing.

So far everything worked fine, but we now need to input more than 4100 bytes, and scanf doesn't read more than 4095 bytes from stdin. The simplest form of the problematic code is as follows:

#include <stdio.h>
#include <string.h>

int main()
{
    char input_array[5000];
    int len;
    printf("Enter key: ");
    scanf("%s",input_array);
    len = strlen(input_array);

    printf("Message: %s\n",input_array);
    printf("Message Len: %d\n",len);
    return 0;
}

I think this happens because scanf can read at most two lines and one line is 2k in size. (Correct me if I am wrong.)

This code works if we read the characters from a file, but that way we would need to change other test code too :(

Currently we copy and paste 4200 characters into the terminal to give input to scanf.

Now my questions are:
Is there a way to instruct scanf to read more than 2 lines?
Is there any other function we can use that doesn't have this limitation?

ART
  • `scanf` stops at the first white-space, `'\n'`, `'\t'` and `' '` are considered white-spaces, you can use `fgets` and strip the trailing newline. – David Ranieri Sep 17 '16 at 11:54
  • C11 draft standard n1570: *7.21.6.2 The fscanf function 12 The conversion specifiers and their meanings are: [...] s Matches a sequence of non-white-space characters.[...]* – EOF Sep 17 '16 at 11:54
  • @AlterMann we removed the all new lines from the string. I am using dd to generate big string. e.g. `dd if=<(yes hi|tr -d '\n') bs=4100 count=1` – ART Sep 17 '16 at 11:57
  • I tried `gets` and `fgets` and they also have same problem :( – ART Sep 17 '16 at 12:06
  • @user3121023 but how to overcome that limit ? – ART Sep 17 '16 at 12:15
  • @AnkurTank Uhm... have you checked the size of the output of `dd`? If I do what you wrote, I get a length of 4096. If I change the count to 2 I get twice that amount (the reason for that does not fit in a comment). With the additional incrementing of the buffer as tofro described it works (but segfaults at the end). – deamentiaemundi Sep 17 '16 at 12:19
  • @user3121023 but that changes only the buffer size of stdin in the standard library. When you read data from your terminal, it is copied from the terminal's 4096 byte buffer, so it won't help. – Raman Sep 17 '16 at 12:27
  • Try `stty cbreak` before running the program. If it works, it's because of canonical input mode. – Raman Sep 17 '16 at 12:31
  • @ARBY `stty cbreak` works. What does it do? Can it be done from C code? – ART Sep 17 '16 at 12:35
  • Yes, it can be. I'm reading a bit about it. Then I'll write an answer as to how it can be done from C code. – Raman Sep 17 '16 at 12:41
  • @AlterMann Detail about `scanf("%s"...`: it does not stop at the first white-space when there are leading white-spaces. Those are quietly read and not saved. After some non-white-space is saved, scanning stops when encountering white-space. – chux - Reinstate Monica Sep 17 '16 at 12:43
  • @chux, you are right – David Ranieri Sep 17 '16 at 12:45
  • "Is there a way to instruct scanf to read more than 2 line?" --> Yes, but that is not the crux of the problem. IAC, recommend using `fgets()` rather than `scanf()`. – chux - Reinstate Monica Sep 17 '16 at 12:45
  • @AnkurTank If you want to manipulate the terminal programmatically: http://tldp.org/HOWTO/Serial-Programming-HOWTO/x115.html Protip: don't, at least not for your problem. – deamentiaemundi Sep 17 '16 at 12:48
  • @deamentiaemundi for me output of dd prints 4200 characters. I echoed it and counted it using `wc -c`. – ART Sep 17 '16 at 12:56
  • @AnkurTank Oh, that's interesting! (No, not for your problem, sorry, just for me) – deamentiaemundi Sep 17 '16 at 13:02
  • Are you providing input by copying and pasting into a terminal? That's the only way I can make sense of the solution, but if that is the case you should edit your question to make it clear. Otherwise it will be highly misleading to future readers. – rici Sep 17 '16 at 17:57
  • And what is wrong with pipes??? `yes hi | tr -d '\n' | dd bs=4100 count=1 | wc -c` for example. (Or replace `wc -c` with the executable you want to test.) – rici Sep 17 '16 at 18:00
  • @rici yes we are providing input by copying and pasting into terminal because its not generated by `dd`. Sure let me update the question for the benefit of future readers. – ART Sep 19 '16 at 06:10
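
Several comments above suggest `fgets` with the trailing newline stripped instead of `scanf("%s")`. A minimal sketch of that idea follows; the helper name `read_line` is hypothetical and not from the thread. As noted in the comments, this does not lift the terminal's own line limit when input is pasted in canonical mode; it only avoids the white-space splitting and the unbounded write of `%s`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (at most size-1
   characters) and removes the trailing newline, if there is one.
   Returns the resulting length, or -1 on end of input or error. */
long read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* cut the string at the first newline */
    return (long)strlen(buf);
}
```

In the question's program this would be called as `read_line(stdin, input_array, sizeof input_array)` in place of the `scanf` call.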

2 Answers


It is because your terminal input is buffered in the kernel's I/O queue.

Input and output queues of a terminal device implement a form of buffering within the kernel independent of the buffering implemented by I/O streams.

The terminal input queue is also sometimes referred to as its typeahead buffer. It holds the characters that have been received from the terminal but not yet read by any process.

The size of the input queue is described by the MAX_INPUT and _POSIX_MAX_INPUT parameters.

By default, your terminal is in canonical mode.

In canonical mode, all input stays in the queue until a newline character is received, so the terminal input queue can fill up when you type a very long line.
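
To see what limit a system reports, the MAX_INPUT value can be queried at run time with the POSIX call `fpathconf` and the `_PC_MAX_INPUT` selector. The sketch below is only illustrative, and the function names are made up for this example. Note that POSIX defines MAX_INPUT as a guaranteed minimum, so the number printed may be smaller than what the kernel actually buffers.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns the MAX_INPUT value reported for fd,
   or -1 if no limit is reported or the query fails. */
long terminal_max_input(int fd)
{
    return fpathconf(fd, _PC_MAX_INPUT);
}

/* Prints the value reported for standard input. */
void print_terminal_max_input(void)
{
    long limit = terminal_max_input(STDIN_FILENO);

    if (limit < 0)
        printf("no MAX_INPUT limit reported for stdin\n");
    else
        printf("MAX_INPUT for stdin: %ld bytes\n", limit);
}
```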

Now to answer your questions:

Is there a way to instruct scanf to read more than 2 lines?

That 2 line concept is wrong. Anyway, you can't make scanf read more than 4096 bytes if the maximum size of the terminal's I/O queue is set to 4096 bytes.

Is there any other function we can use that doesn't have this limitation?

No, you can't, not even with another function. It's not a limitation of scanf.


EDIT: Found a rather standard way of doing it

We can change the terminal's input mode from canonical to non-canonical.

To change the input mode we have to use the low-level terminal interface.

We can do the task as follows:

#include <stdio.h>
#include <string.h>
#include <termios.h> 
#include <unistd.h>

int clear_icanon(void)
{
  struct termios settings;
  int result;
  result = tcgetattr (STDIN_FILENO, &settings);
  if (result < 0)
    {
      perror ("error in tcgetattr");
      return 0;
    }

  settings.c_lflag &= ~ICANON;

  result = tcsetattr (STDIN_FILENO, TCSANOW, &settings);
  if (result < 0)
    {
      perror ("error in tcsetattr");
      return 0;
    }
  return 1;
}

int main()
{
    char input_array[5000];
    int len;

    // Change the terminal's input mode from canonical to non-canonical.
    if (!clear_icanon())
        return 1;

    printf("Enter key: ");
    // Limit the conversion to the buffer size (4999 characters plus the NUL).
    if (scanf("%4999s", input_array) != 1)
        return 1;
    len = (int)strlen(input_array);

    printf("Message: %s\n", input_array);
    printf("Message Len: %d\n", len);
    return 0;
}

In case you want to know how to do it from the terminal:

$ stty -icanon (changes the input mode to non-canonical)
$ stty icanon (changes it back to canonical)
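
The C program above leaves the terminal in non-canonical mode after it exits, which is why the `stty icanon` command is handy. A possible variation, sketched here under the assumption that restoring the old mode is wanted, saves the original settings first and puts them back afterwards. The function names are hypothetical; `tcgetattr`, `tcsetattr`, `ICANON`, `VMIN` and `VTIME` are the standard termios names.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: saves the current settings of fd in *saved, then
   turns off canonical mode. Returns 0 on success, -1 on failure (for
   example when fd is not a terminal). */
int enter_noncanonical(int fd, struct termios *saved)
{
    struct termios settings;

    if (tcgetattr(fd, saved) < 0)
        return -1;

    settings = *saved;
    settings.c_lflag &= ~ICANON;  /* deliver input without waiting for a newline */
    settings.c_cc[VMIN] = 1;      /* read() blocks until at least one byte arrives */
    settings.c_cc[VTIME] = 0;     /* no inter-byte timeout */

    return tcsetattr(fd, TCSANOW, &settings) < 0 ? -1 : 0;
}

/* Hypothetical helper: puts back the settings captured by enter_noncanonical(). */
int restore_terminal(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSANOW, saved) < 0 ? -1 : 0;
}
```

A caller would run `enter_noncanonical(STDIN_FILENO, &saved)` before the `scanf` call and `restore_terminal(STDIN_FILENO, &saved)` once the input has been read.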

Earlier answer (this technique is older):

I don't know whether it is the best way, but it can be done by changing the terminal mode from cooked (the default) to cbreak or raw mode.

When the terminal is in cbreak mode, it works with single characters at a time, rather than waiting for a whole line and then feeding the line in all at once.

Either run stty cbreak in the terminal before executing the program,

or

use cbreak mode programmatically:

First install the curses package by running

$ sudo apt-get install libncurses5-dev

Next edit the program as follows:

#include <stdio.h>
#include <string.h>
#include <curses.h> 

int main()
{
    initscr(); //start curses mode      
    cbreak(); //change the terminal mode to cbreak. Can also use raw();
    endwin(); //end curses mode

    char input_array[5000];
    int len;
    printf("Enter key:");
    scanf("%s",input_array);
    len = strlen(input_array);

    printf("Message:%s\n",input_array);
    printf("Message Len:%d\n",len);
    return 0;
}

Now compile with the -lcurses option

$ gcc 1.c -lcurses
Raman

As ARBY correctly stated, the actual problem is the discrepancy between the buffer sizes of libc and the terminal. If you accept that limitation, you are OK.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
  char input_array[5000];
  size_t len;
  int res;

  printf("BUFSIZ = %d\n", BUFSIZ);

  while ((res = scanf("%4095s", input_array)) == 1) {
    len = strlen(input_array);
    printf("Message Len:%zu\n", len);
  }
  return 0;
}

Output:

$ dd if=<(yes hi|tr -d '\n') bs=4200 count=2 of=longline
$ gcc-4.9 -O3 -g3  -W -Wall -Wextra  -std=c11 checkbuf.c -o checkbuf
$ ./checkbuf < longline
BUFSIZ = 8192
Message Len:4095
Message Len:4095
Message Len:106

EDIT: One (not recommended) way to concatenate the results involves a wee bit of pointer-juggling:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
  char input_array[10000];
  char *ptoarr;
  size_t len;
  int res;

  printf("BUFSIZ = %d\n", BUFSIZ);

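  ptoarr = input_array;
  input_array[0] = '\0'; // keeps the final strlen() valid if nothing is read

  // Stop once another 4095-character chunk plus its NUL might not fit.
  while ((size_t)(ptoarr - input_array) + 4096 <= sizeof input_array &&
         (res = scanf("%4095s", ptoarr)) == 1) {
    len = strlen(ptoarr);
    printf("Message Len:%zu\n", len);
    ptoarr += len; // the next chunk overwrites the NUL of this one
  }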
  len = strlen(input_array);
  printf("Message Len:%zu\n", len);
  return 0;
}

But, as I said, not recommended.

deamentiaemundi
  • how can we break the while loop ? for me its infinite while loop :( – ART Sep 17 '16 at 12:49
  • @AnkurTank Did you use exactly what I gave you? The code, the construction of the input file with `dd`, the compiler stage (ok, that should not make a large difference with one of the current versions of GCC) and the call? What is your output? – deamentiaemundi Sep 17 '16 at 12:54
  • ohh I didn't use compiler command what you gave. Let me try with that. – ART Sep 17 '16 at 12:57
  • If I give input as you suggested it works, however input_array loses the previous input and contains 106 characters at the end. We need to take input from the user, and input_array must hold the whole string after scanf. – ART Sep 17 '16 at 13:02
  • @AnkurTank yes, you have to gather it into a buffer yourself. You can use some pointer-juggling and reuse your single buffer or make two: one of size `4096` as a buffer for `scanf` and one with sufficient size for the whole input, and concatenate what `scanf` delivers into that second buffer. – deamentiaemundi Sep 17 '16 at 13:05
  • Hmm, that can be done. How would you break the loop when taking input from the user? – ART Sep 17 '16 at 13:07
  • @AnkurTank If you don't know the size of the input you *need* the two buffers approach: one for `scanf` and one to concatenate the input, regularly resized with `realloc`. Once the return of `scanf` is not 1 (in your case) it is either an error or the end of the input. But if you want to read user input you have to be a bit more careful with `scanf`, see chux's comment under your post for some details. And be aware that `scanf` might be the wrong tool for your purpose! – deamentiaemundi Sep 17 '16 at 13:17
  • I had tried fgets and gets and they have same problem too. :( – ART Sep 17 '16 at 13:25
  • @AnkurTank and for the same reasons with the same solution. Sorry if my writing was a cause of misunderstanding. – deamentiaemundi Sep 17 '16 at 13:42
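
The two-buffer approach described in the comments above (a fixed chunk buffer plus a result buffer grown with `realloc`) could look roughly like the sketch below. It swaps `scanf` for `fgets`, so it collects one whole line rather than one white-space-delimited token, and the function name `read_long_line` is made up for this example. It only removes the fixed array size from the program; with pasted terminal input in canonical mode the kernel's line limit discussed in the other answer still applies.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of arbitrary length from `in`.
   Returns a malloc'd string without the trailing newline (the caller
   frees it), or NULL at end of input with no data or if allocation fails. */
char *read_long_line(FILE *in)
{
    char chunk[4096];      /* fixed-size buffer handed to fgets */
    char *line = NULL;     /* growing buffer holding the whole line */
    size_t used = 0;       /* characters stored in line so far */

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        char *grown = realloc(line, used + n + 1);

        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        memcpy(line + used, chunk, n + 1);   /* copy the chunk and its NUL */
        used += n;

        if (n > 0 && line[used - 1] == '\n') {
            line[used - 1] = '\0';           /* strip the newline: line is complete */
            break;
        }
    }
    return line;
}
```

The loop ends either when a chunk ends in a newline or when `fgets` returns NULL, which answers the question of how to break out of it.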