I created a C program that writes another C program as its output.

The objective is to test aspects of performance in monolithic programs.

The first test was configured with 10 000 iterations, and the resulting program compiled and ran. The second test, with 100 000 iterations, has been compiling for 3030 minutes so far on Ubuntu 12.04 x86_64, on an i7 3770 with 16 GB RAM (and 16 GB of swap).

I know that parsing complexity is somewhere between O(n**2) and O(n**3), but this is taking far too long. With 10 times the input, the worst case should be roughly 10**3 = 1000 times the compile time.

It is consuming 35.2% of memory and still increasing.

My questions are:

  • Does GCC have a limit on the number of variables per module, or on the size of a module?

  • Is this a bug?

The original program generator is:

#include <stdio.h>
#define MAX_INTERACTION 100000

int main(int argc, char **argv)
{
    FILE *fp = fopen("source.c", "w");
    if (fp == NULL)          /* bail out if the output file cannot be created */
    {
        perror("source.c");
        return 1;
    }

    fprintf(fp, "#include <stdio.h> \n \n \n");
    fprintf(fp, "int main(int argc, char **argv) \n");
    fprintf(fp, "{ \n");

    // local variables and exchange variables
    for (int i = 0; i < MAX_INTERACTION; ++i)
    {
        // passed variable, return label, local variable
        fprintf(fp, " int pv%d , rl%d, loc%d ; \n", i, i, i);
    }

    fprintf(fp, " int pvd =0 ;\n \n \n");

    // code blocks
    for (int i = 0; i < MAX_INTERACTION; ++i)
    {
        fprintf(fp, " block%d : \n", i);
        fprintf(fp, " loc%d = pv%d +1 ; \n", i, i);
        fprintf(fp, " goto  rl%d; \n", i);
    }

    // call blocks
    for (int i = 1; i < MAX_INTERACTION + 1; ++i)
    {
        fprintf(fp, " pvd = pv%d ;\n", i - 1);
        fprintf(fp, " goto block%d; \n", i - 1);
        fprintf(fp, " rl%d: \n", i - 1);
    }

    fprintf(fp, "printf( \"Concluido \\n \"); \n");
    fprintf(fp, "}\n");

    fclose(fp);
    return 0;
}
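
For reference, here is what the generated source.c looks like with MAX_INTERACTION reduced to 2, reconstructed by hand from the fprintf calls above with the whitespace tidied. Execution falls through the declarations into block0, bounces to rl0, on to block1 and rl1, and then reaches the printf; the first pvd = pv0 / goto block0 pair is never reached.

#include <stdio.h>

int main(int argc, char **argv)
{
    int pv0, rl0, loc0;
    int pv1, rl1, loc1;
    int pvd = 0;

block0:
    loc0 = pv0 + 1;
    goto rl0;
block1:
    loc1 = pv1 + 1;
    goto rl1;

    pvd = pv0;      /* dead code: nothing jumps or falls through to here */
    goto block0;
rl0:
    pvd = pv1;
    goto block1;
rl1:
    printf("Concluido \n ");
}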
Ivan Velho
  • So how long did the first program take to compile? That should give you some idea of how long the second one should take. What command are you using to compile? Any optimization flags? – dave Aug 15 '12 at 01:11
  • You should make the number of iterations into a parameter of the generator, so you don't have to recompile in order to change the size of the generated program. – Jonathan Leffler Aug 15 '12 at 01:20
  • Indeed. I would follow @JonathanLeffler's suggestion and then generate programs at size 5000, 10000, 15000, 20000, 25000... and time the compiles; you'll get an idea of the functional form. If it hits the wall at a certain size, you'll have learned something -- or you might just see it smoothly scaling up to the stratosphere. – Ernest Friedman-Hill Aug 15 '12 at 01:22
  • When I compile the generated code, I get a lot of warnings about `source.c:9: warning: unused variable ‘rl3’`. So, the code generated leaves something to be desired. The arguments to the generated `main()` are not used; you should print `int main(void)` instead. – Jonathan Leffler Aug 15 '12 at 01:33
  • The first program compiled in seconds. I didn't time it, but it was quick. – Ivan Velho Aug 15 '12 at 01:37
  • The command used was gcc -c, since this is just a first test. Yes, several variables are unused for now, but they will be used when the performance test is complete. I will try with fewer iterations. – Ivan Velho Aug 15 '12 at 01:39
  • The complete command was gcc -c -std=c99 – Ivan Velho Aug 15 '12 at 01:45
  • 10000 -> 1:27 minutes; 15000 -> 6 minutes. – Ivan Velho Aug 15 '12 at 02:01

1 Answer

I did some timing on a MacBook Pro with 8 GB main memory (2.3 GHz Intel Core i7).

I modified the generator to take a parameter indicating the program size.
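
The modified generator is not shown here; a minimal sketch of one way to do it, assuming the iteration count is read from argv[1] (defaulting to 10000 when no argument is given) and generating int main(void) as suggested in the comments, could look like this:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Size of the generated program; argv[1] overrides the default. */
    long n = (argc > 1) ? strtol(argv[1], NULL, 10) : 10000;

    FILE *fp = fopen("source.c", "w");
    if (fp == NULL)
    {
        perror("source.c");
        return 1;
    }

    fprintf(fp, "#include <stdio.h>\n\n");
    fprintf(fp, "int main(void)\n{\n");

    /* Declarations, code blocks, and call blocks exactly as in the
       question, but bounded by the run-time value n. */
    for (long i = 0; i < n; ++i)
        fprintf(fp, " int pv%ld , rl%ld, loc%ld ; \n", i, i, i);
    fprintf(fp, " int pvd =0 ;\n\n");

    for (long i = 0; i < n; ++i)
    {
        fprintf(fp, " block%ld : \n", i);
        fprintf(fp, " loc%ld = pv%ld +1 ; \n", i, i);
        fprintf(fp, " goto  rl%ld; \n", i);
    }

    for (long i = 0; i < n; ++i)
    {
        fprintf(fp, " pvd = pv%ld ;\n", i);
        fprintf(fp, " goto block%ld; \n", i);
        fprintf(fp, " rl%ld: \n", i);
    }

    fprintf(fp, "printf( \"Concluido \\n \"); \n}\n");
    fclose(fp);
    return 0;
}

With that change in place, I ran it repeatedly: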

$ for size in 10 100 1000 2000 3000 4000 5000 10000 20000 30000
> do
>     ./generator $size
>     echo $size
>     time make -s source 2>/dev/null
>     sleep 1
> done
10

real    0m0.526s
user    0m0.030s
sys     0m0.029s
100

real    0m0.084s
user    0m0.031s
sys     0m0.018s
1000

real    0m0.333s
user    0m0.235s
sys     0m0.044s
2000

real    0m0.392s
user    0m0.318s
sys     0m0.046s
3000

real    0m0.786s
user    0m0.661s
sys     0m0.070s
4000

real    0m0.657s
user    0m0.599s
sys     0m0.053s
5000

real    0m0.978s
user    0m0.893s
sys     0m0.069s
10000

real    0m3.063s
user    0m2.770s
sys     0m0.149s
20000

real    0m8.109s
user    0m7.315s
sys     0m0.274s
30000

real    0m21.410s
user    0m19.553s
sys     0m0.483s

$

Clearly, at the small sizes, there is overhead in simply getting the compiler running (especially on the first run!), reading and writing files, and so on. Redoing the test in multiples of 10,000 up to 100,000, I got the results in the table below. The slowdown is appreciable, especially between 20,000 and 30,000. I also saw fairly significant variability in the times for any given size. And making the code configurable, then running incrementally larger sizes when you run into a problem, is only sensible.

                    Compared to 10K
Size       Time     Size Ratio   Time Ratio  Log Time Ratio
 10K        3.7      1.00          1.00       0.000
 20K        8.1      2.00          2.19       0.784
 30K       25.1      3.00          6.78       1.915
 40K       45.2      4.00         12.22       2.503
 50K       76.7      5.00         20.73       3.032
 60K      110.5      6.00         29.96       3.397
 70K      176.0      7.00         47.57       3.862
 80K      212.0      8.00         57.30       4.048
 90K      292.3      9.00         79.00       4.369
100K      363.3     10.00         98.19       4.587
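
One way to read the last column (the natural log of the time ratio): the effective exponent over a range is log(time ratio) / log(size ratio). At 100K, for example, that is 4.587 / ln 10 ≈ 4.587 / 2.303 ≈ 2.0, and at 30K it is 1.915 / ln 3 ≈ 1.7, so over this range the compile time grows roughly as the square of the program size rather than the cube.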

Your mileage will vary, of course.

For reference, the GCC I'm using is:

i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00)

The compilation command line is:

/usr/bin/gcc -O3 -g -std=c99 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
       -Wold-style-definition source.c -o source 

A home-built GCC 4.7.1 took 34.6 s for 10K, and 150.9 s for 20K.

Jonathan Leffler
  • It seems that GCC 4.7.1 works better with large modules and that the problem was the compiler behavior. I will install 4.7.1. – Ivan Velho Aug 15 '12 at 02:20
  • GCC 4.7.1 took 10-20 times as long as the LLVM compiler for the two points I measured. The LLVM compiler is swifter. I didn't compare the code, though -- no program sizes. Separately, I think your generated code uses some uninitialized variables, so it doesn't produce a deterministic result. There also seems to be some dead code in the middle, between the `block` labels and the `rl` labels. I'm not getting the compiler complaining about either of those issues, though. – Jonathan Leffler Aug 15 '12 at 02:34
  • After installing 4.7.1, I will clean up the code and start the real tests. This was a first try. – Ivan Velho Aug 15 '12 at 02:41
  • With -O3 and a GCC 4.7.1 optimized for corei7-avx, the compile time for 100K was 2:07, but the code was reduced by the compiler (I inspected it using the -S option). I need the code exactly as it was written in C, that is, a direct translation to assembler. A workaround is to write the .s file directly. – Ivan Velho Aug 15 '12 at 14:11
  • @IvanVelho: maybe you should investigate what happens without `-O3`. To the extent there is an answer to the calculation in the program, I believe that the optimizing compiler could effectively reduce the generated code to a single assignment if the variables are formally initialized, eliminating almost all the variables. I guess it depends on whether you're looking at how slabs of arcane machine code behave, or at how C compilers optimize slabs of arcane C code. – Jonathan Leffler Aug 15 '12 at 14:32
  • Jonathan: I investigated this. The only compilation that leaves the branches unoptimized is one where -O is suppressed, which is exactly what I did at first. I need the code translated as is. Another option is to create variable operations and operands that force the compiler to preserve the branches even with -O (see the sketch after these comments). – Ivan Velho Aug 15 '12 at 22:28
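
The thread does not spell out how to force the compiler to keep the generated branches. As an illustration only (not something proposed above), one common trick is to make the generated variables volatile, so that every read and write of them must be performed even at -O2/-O3; a minimal sketch of the changed declaration loop in the generator:

// Hypothetical change to the declaration loop: volatile variables force
// the compiler to emit every load and store, which keeps the per-block
// work (and hence most of the blocks) in the generated assembly.
for (int i = 0; i < MAX_INTERACTION; ++i)
{
    // passed variable, return label, local variable
    fprintf(fp, " volatile int pv%d , rl%d, loc%d ; \n", i, i, i);
}

The compiler may still straighten the gotos themselves, and truly unreachable code (such as the first pvd = pv0 / goto block0 pair) can still be dropped, since no execution path reaches it.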