1

I have a task to print all the prime numbers between 1 and 1000000 in class and the fastest 10 programs get extra marks. The main problem is the time it takes for the prime numbers to be printed to the console.

Basically using the Sieve of Eratosthenes I produce an array with only boolean values in it. The boolean value Numbers[i] is true if i+2 is a prime number.

for(i = 0; i <= n - 2; ++i)
   if (Numbers[i]) // True if the number is prime
        printf("%d\n", i+2);

Printf seems to be really slow as the program can generate the list of primes in about 0.035 s but then takes a further 11 seconds to print the list. Is there anyway I can speed this up, thanks.

Fady
  • 87
  • 1
  • 10
  • 2
    I don't think because if you don't use printf compiler just wipe your calculation because you don't use it. Try to write in a file. – Stargateur Nov 03 '17 at 01:27
  • 3
    stdout is line buffered by default, so every time you hit the new line you will encounter the delay. Try doing `fprintf` to a file instead and see if that's faster. – bruceg Nov 03 '17 at 01:43
  • 1
    also... check out this answer: https://stackoverflow.com/a/11558934/1212725 – bruceg Nov 03 '17 at 01:47
  • 3
    @bruceg: `stdout` is line-buffered *if* it's going to an interactive device. Redirecting the program's output to a file is likely to be faster. Output to the console is probably going to be limited by the speed of the console. – Keith Thompson Nov 03 '17 at 02:07
  • @KeithThompson That's true. But I was guessing that the prof will run it to stdout when awarding the `extra marks`. But, who knows. – bruceg Nov 03 '17 at 02:11
  • @bruceg unlikely, since they will have to check that the primes are correct, which means they need to save the output somewhere (or at least pipe it into `cmp`). – rici Nov 03 '17 at 02:15
  • Printing all numbers (not just primes) from 1 to 1000000 to the Terminal on a modern Mac takes less than one second. Unless you are running on an antiquated system, it is not the `printf` taking that much time. As @Stargateur says, the difference in time is likely due to the compiler optimizing away the entire program when you do not print any output, and the 11 seconds is the actual time your program takes to calculate the primes. – Eric Postpischil Nov 03 '17 at 02:21
  • Try this extreme: `sprintf()` each prime to a _large_ buffer (where the previous left off), effectively concatenating all the strings - and then `fwrite()`. – chux - Reinstate Monica Nov 03 '17 at 02:27
  • 2
    (a) [Show all of your code.](https://stackoverflow.com/help/mcve) (b) Try printing only the last prime found. This will prevent the compiler from optimizing away your code, as it does when there is no `printf` in the program. I predict you will find the program is still slow even when it has only this one `printf` in it, and that proves it is the prime-computation that is taking the time, not the `printf`. – Eric Postpischil Nov 03 '17 at 02:31
  • I just wrote and ran a simple sieve as described by the OP except I printed each prime 100 times. It ran in 11.52 seconds on a MacBook Pro with a 2.9 GHz Intel Core i7, printing to the Terminal with scrolling. Unless the OP is running on a system 100 times slower or is printing to an antique terminal, it is not the `printf` taking the time. It is the computation of primes. – Eric Postpischil Nov 03 '17 at 02:42
  • At first place you could optimize the loop to run half the number of times. Use i+=2 in the loop. Then you could go onto debugging the responsiveness of `print` statement. Try concatenating all the numbers in a string to start with. See if that makes any difference. – Rajeev Ranjan Nov 03 '17 at 02:44
  • 1
    My program takes 35ms to calculate the primes and 431ms to print them in Windows. – Retired Ninja Nov 03 '17 at 03:03
  • converting and printing 78498 native integers should not last more than a couple of milliseconds, as everyone here suggests, so please post the whole code here (or let the folk at https://codereview.stackexchange.com/ take a look first, which I would recommend) or risk the question to be closed as not repeatable. – deamentiaemundi Nov 03 '17 at 03:35
  • I've never seen a programming contest that reads output on screen. All inputs and outputs are done in files (or at least redirect stdin/stdout to files) – phuclv Nov 03 '17 at 04:51
  • 1
    I just ran a test program that prints the primes up to 1000000 using `printf` calls of the form `int i2 = 2; printf("%d\n", i2);` (the compiler didn't optimize away the `%d` conversion). It ran in 0.016s redirecting output to a file, 1.084s sending output to an xterm window. That's on a 3.4 GHz x86_64 running Ubuntu. – Keith Thompson Nov 03 '17 at 17:50
  • We definitely have to print the output. I tried my program on a linux Ubuntu which is what the prof will use to test the speeds and it takes 0.388 seconds with printing so I'm thinking its something to do with windows device somehow??? (3.5 GHz AMD 8 core processor) – Fady Nov 04 '17 at 15:45
  • @Fady Now, *that's* interesting! You should add this information to the top of your question and maybe even change the title to something like e.g.: "Why is `printf` under Windows slower by two magnitudes?" and add a `Windows` tag. I am not a Windows specialist myself but there are many here and that kind of behavior is curious to say the least. – deamentiaemundi Nov 04 '17 at 18:46

2 Answers2

2

Since by default console output is line buffered, which is the reason of the increased time. You can use the setvbuf function to allow printing to console/stdout only in chunks rather than for each iteration.

E.g.

char buffer[256];
setvbuf(stdout, buffer, _IOFBF, sizeof(buffer));

You can alter the size of buffer according to your needs.

IOFBF option is for full buffering i.e. output will be printed once the buffer is full.

See setvbuf for more details

tryingToLearn
  • 10,691
  • 12
  • 80
  • 114
  • 1
    I predict the `printf` will not be the cause of the execution time, in which case this answer will be voted down. Likely it will be discovered OP’s program is inefficient. – Eric Postpischil Nov 03 '17 at 02:23
  • Going by the numbers OP mentioned, the program can generate the list of primes in about 0.035 s but then takes a further 11 seconds to print the list. Also IMHO, in case of long outputs, time is consumed in scrolling(by the system) – tryingToLearn Nov 03 '17 at 02:26
  • 1
    The reason the program runs quickly without `printf` but slowly with it has been explained: The compiler optimizes away the program. – Eric Postpischil Nov 03 '17 at 02:28
  • @EricPostpischil Thanks. I missed that. – tryingToLearn Nov 03 '17 at 02:33
  • 1
    @EricPostpischil Yes but the title of the question is "How to speed up printf in C" which in itself is an important and crystal-clear question, so lots of people come here for an answer to *that* question. – personal_cloud Jul 24 '20 at 07:42
2

Beneath is a slightly unoptimized implementation (although I skipped the intermediate list and print directly) of what I think you were supposed to do. Running that program on an AMD A8-6600K with a small load (mainly a Youtube music-video for some personal entertainment) results in

real    0m1.211s
user    0m0.047s
sys     0m0.122s

averaged over a couple of runs. So the problem lies in your implementation of the sieve or you are hiding some essential facts about your hardware.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <limits.h>
#include <string.h>

/* I call it a general bitset. Others might call it an abomination. YMMV. */
#   define ERAT_BITS (sizeof(uint32_t)*CHAR_BIT)
#   define GET_BIT(s,n)  ((*(s+(n/ERAT_BITS)) &   ( 1<<( n % ERAT_BITS ))) != 0)
#   define SET_BIT(s,n)   (*(s+(n/ERAT_BITS)) |=  ( 1<<( n % ERAT_BITS )))
#   define CLEAR_BIT(s,n) (*(s+(n/ERAT_BITS)) &= ~( 1<<( n % ERAT_BITS )))
#   define TOG_BIT(s,n)   (*(s+(n/ERAT_BITS)) ^=  ( 1<<( n % ERAT_BITS )))
/* size is the size in bits, the overall size might be bigger */
typedef struct mp_bitset_t {
    uint32_t size;
    uint32_t *content;
} mp_bitset_t;
#   define mp_bitset_alloc(bst, n) \
  do {\
      (bst)->content=malloc(( n /(sizeof(uint32_t)) + 1 ));\
      if ((bst)->content == NULL) {\
          fprintf(stderr, "memory allocation for bitset failed");\
          exit(EXIT_FAILURE);\
        }\
      (bst)->size = n;\
  } while (0)
#   define mp_bitset_size(bst)  ((bst)->size)
#   define mp_bitset_setall(bst) memset((bst)->content,~(uint32_t)(0),\
   (bst->size /(sizeof(uint32_t) ) +1 ))
#   define mp_bitset_clearall(bst) memset((bst)->content,0,\
   (bst->size /(sizeof(uint32_t) ) +1 ))
#   define mp_bitset_clear(bst,n) CLEAR_BIT((bst)->content, n)
#   define mp_bitset_set(bst,n)     SET_BIT((bst)->content, n)
#   define mp_bitset_get(bst,n)     GET_BIT((bst)->content, n)
#   define mp_bitset_free(bst) \
  do {\
     free((bst)->content);\
     free(bst);\
  } while (0)

uint32_t mp_bitset_nextset(mp_bitset_t * bst, uint32_t n);
uint32_t mp_bitset_prevset(mp_bitset_t * bst, uint32_t n);
void mp_eratosthenes(mp_bitset_t * bst);


/* It's called Hallek's method but it has many inventors*/
static uint32_t isqrt(uint32_t n)
{
   uint32_t s, rem, root;
   if (n < 1)
      return 0;
   /* This is actually the highest square but it goes
    * downward from this, quite fast */
   s = 1 << 30;
   rem = n;
   root = 0;
   while (s > 0) {
      if (rem >= (s | root)) {
         rem -= (s | root);
         root >>= 1;
         root |= s;
      } else {
         root >>= 1;
      }
      s >>= 2;
   }
   return root;
}

uint32_t mp_bitset_nextset(mp_bitset_t *bst, uint32_t n)
{
   while ((n < mp_bitset_size(bst)) && (!mp_bitset_get(bst, n))) {
      n++;
   }
   return n;
}

/*
 * Standard method, quite antique now, but good enough for the handful
 * of primes needed here.
 */
void mp_eratosthenes(mp_bitset_t *bst)
{
   uint32_t n, k, r, j;

   mp_bitset_setall(bst);
   mp_bitset_clear(bst, 0);
   mp_bitset_clear(bst, 1);

   n = mp_bitset_size(bst);
   r = isqrt(n);
   for (k = 4; k < n; k += 2)
      mp_bitset_clear(bst, k);
   k = 0;
   while ((k = mp_bitset_nextset(bst, k + 1)) < n) {
      if (k > r) {
         break;
      }
      for (j = k * k; j < n; j += k * 2) {
         mp_bitset_clear(bst, j);
      }
   }
}

#define UPPER_LIMIT 1000000 /* one million */

int main(void) {
  mp_bitset_t *bst;
  uint32_t n, k, j;

  bst = malloc(sizeof(mp_bitset_t));
  if(bst == NULL) {
    fprintf(stderr, "failed to allocate %zu bytes\n",sizeof(mp_bitset_t));
    exit(EXIT_FAILURE);
  }
  mp_bitset_alloc(bst, UPPER_LIMIT);

  mp_bitset_setall(bst);
  mp_bitset_clear(bst, 0);      // 0 is not prime b.d.
  mp_bitset_clear(bst, 1);      // 1 is not prime b.d.

  n = mp_bitset_size(bst);
  for (k = 4; k < n; k += 2) {
    mp_bitset_clear(bst, k);
  }
  k = 0;

  while ((k = mp_bitset_nextset(bst, k + 1)) < n) {
    printf("%" PRIu32 "\n", k);
    for (j = k * k; j < n; j += k * 2) {
      mp_bitset_clear(bst, j);
    }
  }
  mp_bitset_free(bst);

  return EXIT_SUCCESS;
}

Compiled with

gcc-4.9 -O3 -g3 -W -Wall -Wextra -Wuninitialized -Wstrict-aliasing -pedantic  -std=c11 tests.c -o tests

(GCC is gcc-4.9.real (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4)

deamentiaemundi
  • 5,502
  • 2
  • 12
  • 20
  • What's with the `do{...}while(0)` blocks? Looks like code noise to me. – jwdonahue Nov 03 '17 at 07:10
  • 2
    You really shouldn't be doing the OP's homework assignment for them. – jwdonahue Nov 03 '17 at 07:22
  • 1
    @jwdonahue: The `do { ... } while(0)` idiom is how you write a macro that can be used in any context where a statement can appear. See the [comp.lang.c FAQ](http://www.c-faq.com/), question 10.4. – Keith Thompson Nov 03 '17 at 17:40
  • @jwdonahue I'm pretty sure *that* code is not suitable as a solution for the homework of a beginner and, yes, it was done with that goal in mind. It was meant as a proof that it's not `printf`but OP's implementation of the sieve. There are already 2 close votes as I write this. I think I'll give OP a couple of hours to come up with their code (Earth is a sphere) and add my close vote otherwise if still necessary. – deamentiaemundi Nov 03 '17 at 19:40
  • 1
    @jwdonahue [`do { … } while (0)` — what is it good for?](https://stackoverflow.com/q/257418/995714) – phuclv Nov 04 '17 at 12:12