7

I am trying to read a string whose length is on the order of 10^5. I get an incorrect string once its size grows beyond 4096. I am using the following code:

string a;
cin>>a;

This didn't work, so I tried reading character by character with the following code:

unsigned char c;
vector<unsigned char> a;
while(count>0){
 c = getchar();
 a.push_back(c);
 count--;
}

I have done the necessary escaping for using getchar, but this also had the 4096-byte problem. Can someone suggest a workaround or point to the correct way of reading it?

Baruntar
  • What C runtime are you using? It seems strange that `cin` wouldn't cope with "any size string" - but I must admit I haven't tried. – Mats Petersson Apr 05 '14 at 20:09
  • Are you reading from the terminal? The terminal input buffer has limited capacity. – Brian Bi Apr 05 '14 at 20:10
  • Are you copy-pasting into a terminal? It could be the terminal or shell program's fault. – Potatoswatter Apr 05 '14 at 20:10
  • 10^5 order? Damn. Is it something like 'type your favorite Shakespeare work and press enter'? Joking aside, what is your `stdin`? It's normally the console, which normally doesn't even have a buffer of that size, so you can't read that much from it. – Kahler Apr 05 '14 at 20:11

3 Answers

5

This happens because your terminal input is buffered in the kernel's I/O queue.

Input and output queues of a terminal device implement a form of buffering within the kernel independent of the buffering implemented by I/O streams.

The terminal input queue is also sometimes referred to as its typeahead buffer. It holds the characters that have been received from the terminal but not yet read by any process.

The size of the input queue is described by the MAX_INPUT and _POSIX_MAX_INPUT parameters.

By default, your terminal is in canonical mode.

In canonical mode, all input stays in the queue until a newline character is received, so the terminal input queue can fill up when you type a very long line.


We can change the terminal's input mode from canonical to non-canonical.

You can do it from terminal:

$ stty -icanon    # change the input mode to non-canonical
$ ./a.out         # run your program
$ stty icanon     # change it back to canonical

Or you can do it programmatically.

To change the input mode programmatically, we have to use the low-level terminal interface.

So you can do something like:

#include <cstdio>
#include <iostream>
#include <string>
#include <termios.h>
#include <unistd.h>

// Switches standard input from canonical to non-canonical mode.
// Returns 1 on success and 0 on failure.
int clear_icanon(void)
{
    struct termios settings;

    // Fetch the current terminal settings.
    if (tcgetattr(STDIN_FILENO, &settings) < 0)
    {
        std::perror("error in tcgetattr");
        return 0;
    }

    // Clear the ICANON flag so input is no longer held until a newline.
    settings.c_lflag &= ~ICANON;

    // Apply the modified settings immediately.
    if (tcsetattr(STDIN_FILENO, TCSANOW, &settings) < 0)
    {
        std::perror("error in tcsetattr");
        return 0;
    }
    return 1;
}

int main()
{
    clear_icanon(); // Changes the terminal from canonical to non-canonical mode.

    std::string a;

    std::cin >> a;

    std::cout << a.length() << std::endl;
}
Raman
4

Using this test program based on what you posted:

#include <iostream>
#include <string>


int main()
{
    std::string a;

    std::cin >> a;

    std::cout << a.length() << std::endl;
}

I can do:

./a.out < fact100000.txt

and get the output:

456574

However, if I copy and paste from an editor to the console, it stops at 4095. I expect that's a limit somewhere in the console's copy-and-paste handling. The easy solution is of course not to use copy and paste, but to redirect from a file. On other systems, the restriction to 4 KB of input may of course reside somewhere else. (Note that, at least on my system, I can happily copy and paste the 450 KB factorial result into another editor window, so on my system it's simply the console buffer that is the problem.)

Mats Petersson
  • I will try your solution and let you know – Baruntar Apr 05 '14 at 20:33
  • Given your comment on the other answer, and the fact that I'm also using Fedora (although a now ancient Fedora 16), it's likely that you simply can't type more than 4K into the terminal input. Remember that terminal input is "cooked", in other words, it's processed before being given to the actual program, so there HAS to be a limit to the amount of input you can give. – Mats Petersson Apr 05 '14 at 20:37
  • Windows / VS shows a maximum of 4094 for a long word, and even when there are spaces and you have to read all the separate tokens, the results are similar. – Jan Sep 22 '21 at 12:15
2

This is much more likely to be a platform/OS problem than a C++ problem. What OS are you using, and what method are you using to get the string fed to stdin? It's pretty common for command-line arguments to be capped at a certain size.

In particular, given that you've tried reading one character at a time, and it still didn't work, this seems like a problem with getting the string to the program, rather than a C++ issue.

Mark Bessey
  • I am using Linux, Fedora 20 64. I am not using any command line arguments. I am simply typing my input into the Bash terminal after running my executable, no redirection. It's not a string problem, as in the code snippet I used a vector. I will try to post what I entered and what I got in my string and vectors later. – Baruntar Apr 05 '14 at 20:24