
I currently use a loop with `scanf("%d", &value)`, but I need it to go faster. There can be as many as 2 000 000 values. Is there any way to speed this up? I read about `strtok` and `strtol`, but I do not know how to use them or whether they would even achieve the speed-up I need.

Simon
  • You should use `strtol` not because it's faster, but because, unlike `scanf`, it will tell you when you hit numeric overflow or invalid input. (What does your program do when you feed it `123cheesesandwich`? It crashes, doesn't it? See.) – zwol May 26 '14 at 01:25
  • I don't really get the hostility here. It sounds like he's asking how to use `strtol`, which is valid because it requires buffering the file manually to some degree. (`strtok` is totally unrelated; you don't need that.) – Potatoswatter May 26 '14 at 01:38
  • @Zack why would that make it crash? – M.M May 26 '14 at 06:18
  • scanf is pretty much as fast as you can get for parsing ints from strings. 2M values should be read in a fraction of a sec. Please post some code for more details. – vz0 May 26 '14 at 06:53
  • @MattMcNabb It's not guaranteed to crash, but the way `scanf` works, you can easily get stuck in an infinite loop or worse on invalid input. – zwol May 26 '14 at 13:48
  • @Zack you only get stuck in an infinite loop if you wrote an infinite loop in your code. scanf's behaviour on 123cheesesandwich is well-defined. – M.M May 26 '14 at 21:09
  • @Simon Request to accept any of the answers to close the question. – Anmol Singh Jaggi Mar 25 '16 at 11:40

2 Answers


If you want only speed and no error checking, you can write your own function that reads the input with `getchar()` and parses it as an integer.

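#include <stdio.h>

/* Reads the next non-negative integer from stdin, without error checking.
   Returns 1 on success, 0 if end of input is reached first. */
int fast_input(int* int_input)
{
    int next_char = getchar();   // int, not char, so EOF can be detected
    *int_input = 0;
    while( next_char != EOF && (next_char < '0' || next_char > '9') ) // Skip non-digits
        next_char = getchar();
    if( next_char == EOF )
        return 0;                // previously this loop spun forever at EOF
    while( next_char >= '0' && next_char <= '9' )
    {
        // (x<<1) + (x<<3) == x*10; an optimizing compiler typically emits the same code for either
        (*int_input) = ((*int_input)<<1) + ((*int_input)<<3) + next_char - '0';
        next_char = getchar();
    }
    return 1;
}

int main(void)
{
    int x;
    if( fast_input(&x) )
        printf("%d\n",x);
    return 0;
}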
Anmol Singh Jaggi
  • If there is no worry about locking and the platform is POSIX-compliant, one can use `getchar_unlocked`. Magic numbers like 47, 48, 57, 58 [look bad](http://stackoverflow.com/questions/47882/what-is-a-magic-number-and-why-is-it-bad). Better to replace them with '0', '9', etc. Also, the solution would work only for ASCII input. – Mohit Jain May 26 '14 at 07:07
  • Also, making `fast_input` inline can give you some speed if the compiler honours the inline request. – Mohit Jain May 26 '14 at 07:09
  • @MohitJain and what is going to magically make the IO faster when inlined? – sehe May 26 '14 at 07:14
  • @sehe It won't make IO faster, but benchmark results may turn better as the function call cost might be saved. Moreover there is no harm in inlining such utility. – Mohit Jain May 26 '14 at 07:38
  • @MohitJain Firstly, "better" is very unclear (did the OP say he wants to optimize for CPU/Power usage?). In _principle_ there is harm in blindly applying micro-optimizations. In this particular case, though I will agree. For the simple reason that there will likely not be a difference at all because the compiler will inline that function on your behalf, and may simply ignore your `inline` suggestion just the same. It can even do this across translation units (most compilers are capable of LTO these days) – sehe May 26 '14 at 07:47
  • Using 48 is harder to recognize than '0'. – phuclv May 26 '14 at 07:56
  • Edited to replace ASCII codes with their character equivalents. – Anmol Singh Jaggi May 26 '14 at 13:13

In my experience, memory-mapped access is much faster for reading a large amount of data from a file.

This can be achieved by

   #include <sys/mman.h>
   void *mmap(void *addr, size_t length, int prot, int flags,
              int fd, off_t offset);
   int munmap(void *addr, size_t length);

... on *Nix and some combination of

 CreateFileMapping
 OpenFileMapping
 MapViewOfFile
 MapViewOfFileEx
 UnmapViewOfFile
 FlushViewOfFile
 CloseHandle

... on Windows (refer to the link here).

Basically you want something like:

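/* needs <ctype.h>, <fcntl.h>, <stdlib.h>, <sys/mman.h>, <sys/stat.h>, <unistd.h> */
int fd = open( "filename" , O_RDONLY );
struct stat st;
fstat( fd , &st );                    // map the real file size, not a guessed maximum
size_t len = (size_t)st.st_size;
char* ptr = mmap( NULL , len , PROT_READ , MAP_PRIVATE , fd , 0 /* offset */ );
if ( ptr == MAP_FAILED ) return 1;    // e.g. bad fd or empty file; assumes this is inside main
// NOW READ AS IF ptr IS THE HEAD OF SOME STRING, except it is NOT NUL-terminated:
// this assumes the file ends with whitespace (e.g. '\n') so strtol stops inside the mapping
char* thisp = ptr , * end = ptr + len;
while ( thisp < end ){
      while ( thisp < end && isspace( (unsigned char)*thisp ) ) thisp++;   // skip separators
      if ( thisp == end ) break;
      char* next;
      long some_int_you_want = strtol( thisp , &next , 10 );
      if ( next == thisp ) break;     // not a number: stop instead of looping forever
      thisp = next;
}
munmap( ptr , len );
close( fd );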

I'm not completely sure the code above is correct, but it shows the right idea.

phoeagon
  • Though memory mapping [doesn't always generate the highest speeds](http://stackoverflow.com/questions/17925051/fast-textfile-reading-in-c/17925143#17925143), also see [How to parse space-separated floats in C++ quickly?](http://stackoverflow.com/questions/17465061/how-to-parse-space-separated-floats-in-c-quickly/17479702#17479702) – sehe May 26 '14 at 07:51