3

To stack overflow members, I am coding a simulator that requires multiple loops (sometimes over 1,000,000) that each involves heavy calculations. Thus, saving 10 millisecond on a loop can lead to saving over 160 extra minutes of calculations. So I am fighting for every code optimization I can get. Previously I had imported all my system parameters from a file into vectors. After awhile I realized that:

  1. Calling a constant int/double/... is faster than calling a regular int/double/...
  2. Using arrays is faster than using vectors
  3. I did not need the full functionality of std::string and could make a simple knockoff version of it to use

Originally I naively thought I could just convert std::vector<double> to std::vector<const double>. As I've been also trying to make this cross-platform with linux I found that the visual studio express compiler ignores the fact that const can't be declared like that (and linux just throws a ton of errors).

I decided that I would use constant arrays in the structs I was initializing; however, could not figure out how to initialize it properly. (Most/all help forums I came across said to declare static const int* and initialize it as a global, which I don't think works for what I need). I finally developed this final solution; however, I believe I committed a 'taboo' as I ended up using a pointer to declare the constant. It compiles and runs with no errors and thus I was wondering, is it wrong to initialize a constant array using pointers? And if so, how else would you do it? (I understand it might be slower to get all data this way, but in the part where time matters it SHOULD be faster to call... I think)

#include <stdio.h>
#include <malloc.h>
#include <conio.h>

#if defined(_WIN32) || defined(_WIN64)
#define gsize _msize
#else
#define gsize malloc_usable_size
#endif

struct testy{
    const int *i;
    const double *d;
    testy(){
        i=(const int *)malloc(0);
        d=(const double *)malloc(0);
    }
};

inline const int* getDat(const int* dst, int* src){
    dst=(const int*)realloc((void *)dst,gsize((void *)src)); // Allocate enough memory to hold the data
    return src;
}

inline const double* getDat(const double* dst, double* src){
    dst=(const double*)realloc((void *)dst,gsize((void *)src)); // Allocate enough memory to hold the data
    return src;
}

int main(){
    testy data;
    int *tmp_i = (int *)malloc(0);
    double *tmp_d = (double *)malloc(0);

    for(int i=0;i<3;i++){ // load empty array with what i want
        tmp_i=(int*)realloc((void *)tmp_i,gsize((void *)tmp_i)+1*sizeof(int)); // Increase size by one
        tmp_i[i]=i;
    }
    for(int i=0;i<3;i++){ // load empty array with what i want
        tmp_d=(double*)realloc((void *)tmp_d,gsize((void *)tmp_d)+1*sizeof(double)); // Increase size by one
        tmp_d[i]=i;
    }

    data.i=getDat(data.i,tmp_i);
    data.d=getDat(data.d,tmp_d);

    printf("\nIntegers\n");
    for(int i=0;i<(gsize((void *)data.i)/sizeof(int));i++)
        printf("%d\t",data.i[i]);

    printf("\nDoubles\n");
    for(int i=0;i<(gsize((void *)data.d)/sizeof(double));i++)
        printf("%lg\t",data.d[i]);

    free(tmp_i); free(tmp_d);
    _getch();
    return 0;
}

I removed the knockoff string and its uses from this code as there were some issues with declaring the array of it and I didn't want to ask more than one question here.

hherbol
  • 91
  • 12
  • If u want speed you should probably also avoid using heap memory – aaronman Jul 24 '13 at 01:55
  • There's nothing wrong with using heap memory - it's the allocation and deallocation thereof that is typically slower than molasses. Ideally, get all the memory you need from the system allocator before you start iterating. – Casey Jul 24 '13 at 01:59
  • If I don't care about the speed during initialization, but the speed of accessing data, does dynamic allocation still matter? I thought that once finished allocating everything (and assuming I don't reallocate later on, which I shouldn't), a dynamically initialized variable behaved as if it were a regularly initialized variable. – hherbol Jul 24 '13 at 02:00
  • @Casey actually heap can still be [slower](http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap) , however I was more making a joke that the OP has already assumed vector is too slow for his purposes while using another feature that SO constantly criticizes – aaronman Jul 24 '13 at 02:03
  • I'm curious about that. Why does SO criticize manual allocation so much? Don't vectors essentially reallocate memory to add/remove from it? And don't strings do the same when they are assigned? – hherbol Jul 24 '13 at 02:10
  • @aaronman I found nothing in the linked question that demonstrates an inherent difference in the speed of accessing "stack" and "heap" memory. Feel free to exhibit a microbenchmark that demonstrates a difference in the speed of long-term accesses to pre-allocated stack and heap space. – Casey Jul 24 '13 at 03:00
  • @Casey "The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or free. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor's cache, making it very fast" – aaronman Jul 24 '13 at 03:01
  • @aaronman None of which has bearing on long-term access to pre-allocated memory. – Casey Jul 24 '13 at 03:02
  • @Casey I thought the last line related, honestly I wasn't looking for a big argument here but I almost always respond to peoples comments – aaronman Jul 24 '13 at 03:06

1 Answers1

3
  1. Remove realloc. That is expensive and also fragments memory
  2. const will not make any difference - you have to access memory at the end of the day and the const bit just ensures you do not alter things that you shouldn't
  3. Place things like (gsize((void *)data.d)/sizeof(double) outside the for loop if the value is immutable. Prevents a function call each time around the loop
  4. Make the memory cache work. i.e. access stuff in memory sequentially. The data shipped from memory to the processor can be pre-fetched easily.
  5. As to doing any complex operations (that you have not shown) use a little algebra to simplify the equations. If possible use integers over doubles if possible, This depends on the problem space.
Ed Heal
  • 59,252
  • 17
  • 87
  • 127
  • 1
    So far I agree with 2->5; however, in 1 if I were to remove `realloc` then what do I do to resize the array? I need to make the program dynamically create arrays that can hold all the data from the files. And the files will be changed overtime (adding/removing values). – hherbol Jul 24 '13 at 02:09
  • The compiler can use `const` as a hint for more aggressive optimizations. – jxh Jul 24 '13 at 02:12
  • @hherbol - Have an upper bound? Reduce the number of `realloc`s If using C++ `vector` implement this. As you say you may have a good upper limit to start off with in the first place. Could even just use a stack with a large array (as used in safety critical systems that prevents programmers from using heap allocation). – Ed Heal Jul 24 '13 at 02:13
  • @jxh - I assume that you are thinking the use of `const` as opposed to `#define` – Ed Heal Jul 24 '13 at 02:14
  • @Ed Heal - I'm worried about using a stack with a large array. I'll be simulating particles at a delta distance of angstroms over several microns. I want to save as much memory as possible for later on in the code. But that makes sense on declaring an upper bound `realloc` first and only reallocating if I'm exceeding that. Thank you so much! Now I have to clean up the code and ask another question later on the knockoff string structure I'm planning on using. – hherbol Jul 24 '13 at 02:18
  • @Ed Heal - Oh Wait! I have one last question in regards to your 2nd point. Does that mean it's fine with me initializing the constants by pointers to their memory? Or are you saying the speed benefit doesn't exist? – hherbol Jul 24 '13 at 02:21
  • The `const` bit is for the compilers benefit and yours in programming. The compiler converts your code to machine code. There only exists in terms on the OS a read/write section (stack/heap) and just a read only bit. Either way the processor is accessing memory but protection is involved. This costs nothing. But if you access memory sequentially then the processor can ask the main memory for a good chuck of memory into the processors cache. It can also ask from another chunk whilst processing that chunk. Also try to avoid context switches. – Ed Heal Jul 24 '13 at 02:28
  • Just to see if I got everything right (and likely over-simplifying things to make sense for myself): (1) "access memory sequentially" - ensure that I store all data I need at a time sequentially in memory so that I can grab it all out in a single loop (also, so that it can easily and quickly be retrieved) (2) "Avoid context switches" - After having stored all data I need in memory at location a and location b, try to stick to a first then use b. – hherbol Jul 24 '13 at 02:34
  • @hherbol - Access memory sequentially - i.e. access memory location 0, then 1, ... Avoid context switches - i.e. limit the number of OS calls. If writing/reading from disc (for example) do it in big chuncks. – Ed Heal Jul 24 '13 at 02:41