C++ IO optimization and string handling for stdin

Question

I am writing a program for a competition that uses "standard stdin procedures" using C++. the first line that is taken in indicates how many lines (x) there are to be expected as input from then on. These lines of input can be strings, integers, or some combination of the two, and every line contains exactly two elements separated by a space. At the moment, I am taking in each line one at a time (handling the info before asking for the next line) in a manner similar to this:

  string initial;
  getline (cin,initial);
  istringstream stringStream (initial);
  vector<string> parsedString;
  vector<int> data;
  char splitToken = ' ';
  while ( !stringStream.eof() )
  {
    string subString;
    getline( stringStream, subString, splitToken);
    parsedString.push_back( subString );
  }

  for (int i = 0; i <parsedString.size(); i++)
  {
    string temp = parsedString[i];
    int intTemp = atoi(temp.c_str());
    data.push_back(intTemp);
  }
  unsigned int n = data[0];
  unsigned int m = data[1];

In this specific case I know the incoming data will be two integers, but that is not always the case. I was wondering if there was some way to make my code faster, either by changing my approach (perhaps taking all the input lines at once after knowing how many to expect) or by using better built in C++ functions to split the incoming lines at the space into the two elements that compose them.

Thanks

Facebook hacker cup time again :-) Aren't *you* supposed to solve this? (I think I won't point out any `eof()` related issues coming weekend!) — Kerrek SB, Jan 24 '13 at 00:07
You may find this link useful: http://stackoverflow.com/questions/1042110/using-scanf-in-c-programs-is-faster-than-using-cin — sdkljhdf hda, Jan 24 '13 at 00:11
`data.push_back(atoi(parsedString[i].c_str()));` is an all in one special. — ChiefTwoPencils, Jan 24 '13 at 00:52

score 1 · Answer 1 · answered Jan 24 '13 at 00:10

1

Usually, the idiom for reading input in C++ is more along these lines:

std::ios_base::sync_with_stdio(false); //tell iostreams to be fast
int number_of_lines;
std::cin >> number_of_lines;
if (!std::cin) *ERROR*;

for( ; number_of_lines>0; --number_of_lines) {
    unsigned n, m;
    std::cin >> n >> m;
    if (!std::cin) *ERROR*;
    //process n and m here
}

You say "These lines of input can be strings, integers, or some combination of the two", but how will you know which are which? The above code assumes everything is a number, because if you don't know the type, the input is useless.

answered Jan 24 '13 at 00:10

Mooing Duck

64,318
19
100
158

I will know on a case by case basis, thats not really important for my oveall question. Thanks for the answer! My one question is, is there anything faster than cin? – user1998665 Jan 24 '13 at 00:35
@user1998665: yes, `scanf` can be used in a nearly similar way if speed is absolutely needed, sometimes it's ever so slightly faster: http://ideone.com/e8PWn2 – Mooing Duck Jan 24 '13 at 00:41
@ildjarn: As per your comment on the other answer, I put that in my answer 35 minutes ago. Did I do it wrong? (wouldn't be surprising) – Mooing Duck Jan 24 '13 at 00:51
1

@user1998665: be super careful with scanf though, one typo and it will destroy everything with no warning whatsoever. `cin` is totally safe, and gives warnings if you make mistakes. – Mooing Duck Jan 24 '13 at 00:52
@MooingDuck : No, I just failed to read your code, my fault. :-] – ildjarn Jan 24 '13 at 00:54

zaufi · Answer 2 · 2013-01-24T00:38:01.173

ok, if you'd like to talk about other approach, here is another one: the big (performance) problem of almost any code which works heavy with strings, that there is a lot of copy operations... a naive approach (used by beginners) is to make a lot of temporary strings when find/parse/extract smth from strings or memory buffers...

alternative way (when the lifetime of source buffer allows) is to find, parse or extract data from buffer w/o copy any particular part of the source string. to do so, one may use boost::range library and boost::iterator_range class (in conjunction with boost string_algo). That way makes possible to do usual find/parse tasks and avoid copying of partial strings.

One example from my experience: some time ago (for performance testing) my colleague wrote 3 versions of a same config parser (config file was about few megabytes):

using std::string::find
using boost::tokenizer
and boost::iterator_range

results was surprising:

with -O0 optimization std::string::find wins (about 3 times over boost::tokenizer and about 5 times over iterator_range based parser)
but at -O3 everything has changed: boost::iterator_range was the absolute winner! (about 12 times over std::string::find!)

results are quite predictable: highly templated code of boost::string_algo and friends were highly inilined as well...

it is why I personaly like to use boost::iterator_range to parse strings...

so I'd recommend to read everything (as much as you can) into one std::string, std::vector<char> (better one, cuz it have memory contiguity guarantee, so you may read() from stream directly into a container) or std::deque<char>, and then use boost::string_algo and boost::iterator_range to parse it trying to avoid any (useless) copying from source string to temporary locations... also try to use boost::lexical_cast to convert numbers (or your own ranges aware numeric converter)

one more advise: try to avoid memory (often) allocations -- it is really heavy operation. use rserve() member for containers when you have a clue about possible size of data yu'd like to store. and for final, try to play with custom (tricky) allocators instead of (usually stupid) default one.

note that in C++11 std::string have memory contiguity guarantee as std::vector.

Good sir, you are eons ahead of me in terms of programming abilities. I applaud your dedication. *Standing Ovation* — user1998665, Jan 24 '13 at 00:47

score 0 · Answer 3 · answered Jan 24 '13 at 00:11

On many systems (including GCC on Linux), C++ I/O streams are very slow. If speed is your primary concern, use <stdio.h>, e.g. scanf() and (even faster:) getchar(), and do the parsing manually, like in the following state machine:

int c;
int state = 0;
while ((c = getchar()) >= 0) {
  switch (c) {
   case ' ': case '\t':
    ...;
    ... if (state == ...) { ...; state = ...; }
    break;
   case '\n':
    ...;
    break;
   default:
    ...;
  }
}

Copy the data as few times as possible. (For example, in the question string temp = ... does an unnecessary copy, which can be eliminated by using a reference: string &temp = ....) If possible, use the state to figure out the final place of the next character c, and put it right there, and don't copy or move it in the future.

I'd wager that if you called `std::ios_base::sync_with_stdio(false)` first you'd find that standard IO streams perform just fine (for most purposes). — ildjarn, Jan 24 '13 at 00:14

C++ IO optimization and string handling for stdin

3 Answers3