I am trying to improve the performance of my program (running on an ARC platform, compiled with arc-gcc; that said, I am NOT expecting a platform-specific answer).

I want to know which of the following methods is more optimal and why.

#include <stdlib.h> /* rand */
#include <string.h> /* memcpy, memset */

typedef struct _MY_STRUCT
{
    int my_height;
    int my_weight;
    char my_data_buffer[1024];
}MY_STRUCT;

int some_function(MY_STRUCT *px_my_struct)
{
    /*Many operations with the structure members done here*/
    return 0;
}

void poorly_performing_function_method_1()
{
    while(1)
    {
        MY_STRUCT x_struct_instance = {0}; /*x_struct_instance is automatic variable under WHILE LOOP SCOPE*/
        x_struct_instance.my_height = rand();
        x_struct_instance.my_weight = rand();
        if(x_struct_instance.my_weight > 100)
        {
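            /* Copy only the literal's own bytes: copying sizeof(my_data_buffer) bytes would read past the end of the literal. */
            static const char example_data[] = "this is just an example string, there could be some binary data here.";
            memcpy(x_struct_instance.my_data_buffer,example_data,sizeof(example_data));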
        }
        some_function(&x_struct_instance);

        /******************************************************/
        /* No need for memset as it is initialized before use.*/
        /* memset(&x_struct_instance,0,sizeof(x_struct_instance));*/
        /******************************************************/
    }
}

void poorly_performing_function_method_2()
{
    MY_STRUCT x_struct_instance = {0}; /*x_struct_instance is automatic variable under FUNCTION SCOPE*/
    while(1)
    {
        x_struct_instance.my_height = rand();
        x_struct_instance.my_weight = rand();
        if(x_struct_instance.my_weight > 100)
        {
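            /* Copy only the literal's own bytes: copying sizeof(my_data_buffer) bytes would read past the end of the literal. */
            static const char example_data[] = "this is just an example string, there could be some binary data here.";
            memcpy(x_struct_instance.my_data_buffer,example_data,sizeof(example_data));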
        }
        some_function(&x_struct_instance);
        memset(&x_struct_instance,0,sizeof(x_struct_instance));
    }
}

In the above code, will poorly_performing_function_method_1() perform better or will poorly_performing_function_method_2() perform better? Why?

A few things to think about:

  • In method #1, can deallocating and reallocating the structure's memory on every iteration add overhead?
  • In method #1, does any optimization happen during initialization, similar to calloc (optimistic memory allocation, handing out zero-filled pages)?

I want to clarify that my question is more about WHICH method is more optimal and less about HOW to make this code more optimal. This code is just an example.

About making the above code more optimal, @Skizz has given the right answer.

CCoder
  • I'm not familiar with ARC - is there anything platform-specific that would prevent you from benchmarking it? – us2012 Oct 01 '13 at 15:21
  • @us2012 Updated the question. No I am not looking at platform specific answer. Just mentioned the platform for the sake of completion. – CCoder Oct 01 '13 at 15:22
  • The function that uses memset() will perform worse, because memset is a costly memory operation – Sohil Omer Oct 01 '13 at 15:23
  • @CCoder: have you **profiled it**? – nneonneo Oct 01 '13 at 15:23
  • @nneonneo Yes. I have profiled it using oprofile and I see memset coming up in the list for method 2. But I am not sure if method 1 is giving a better overall performance as I have a lot of other code running during profiling. – CCoder Oct 01 '13 at 15:27
  • I think @Skizz has it exactly right: to clear the string, you only need to write a '\0' in byte 0, not overwrite all 1024 bytes. Given how little else this function does, this alone should multiply the speed of the function by a couple orders of magnitude (writing ~9 bytes per iteration instead of ~1032 bytes). Depending on cache pressure, it could help even more than that by reducing cache usage. – Jerry Coffin Oct 01 '13 at 15:33

1 Answer

Generally, not doing something is going to be faster than doing something.

In your code, you're clearing a structure and then initialising it with data. That is two memory writes, and the second just overwrites the first.

Try this:

void function_to_try()
{
  MY_STRUCT x_struct_instance;
  while(1)
  {
    x_struct_instance.my_height = rand();
    x_struct_instance.my_weight = rand();
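    x_struct_instance.my_data_buffer[0]='\0'; /* an empty string only needs its first byte cleared */
    if(x_struct_instance.my_weight > 100)
    {
        /* Pass the array itself, not its address. Note that strlcpy is a BSD extension, not standard C. */
        strlcpy(x_struct_instance.my_data_buffer,"Fatty",sizeof(x_struct_instance.my_data_buffer));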
    }
    some_function(&x_struct_instance);
  }
}

Update

To answer the question of which is more optimal: I would suggest method #1, but the difference is probably marginal and depends on the compiler and other factors. My reasoning is that no allocation or deallocation is going on: the data is on the stack, and the function prologue generated by the compiler allocates a stack frame big enough for the whole function, so it never needs resizing. In any case, allocating on the stack is just moving the stack pointer, so it's not a big overhead.
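One way to check the "no allocation per iteration" reasoning on a given compiler is to record the address of the loop-scoped structure on each pass. The sketch below is only an illustration: `my_struct_t` and `count_address_changes` are hypothetical names, not part of the question's code. C does not promise that the address stays the same, so a result of zero is what one would typically expect from a fixed stack frame, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the question's MY_STRUCT. */
typedef struct {
    int my_height;
    int my_weight;
    char my_data_buffer[1024];
} my_struct_t;

/* Counts how often the address of the loop-scoped struct differs from
   the address seen in the previous iteration. */
size_t count_address_changes(size_t iterations)
{
    uintptr_t previous = 0;
    size_t changes = 0;
    size_t i;

    for (i = 0; i < iterations; ++i) {
        my_struct_t instance = {0};  /* automatic variable in loop scope */
        /* Convert the address to an integer while the object is alive. */
        uintptr_t current = (uintptr_t)(void *)&instance;

        if (i > 0 && current != previous) {
            ++changes;
        }
        previous = current;
    }
    return changes;
}
```

Calling `count_address_changes(1000)` and printing the result shows whether the compiler in use reuses one stack slot for every iteration.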

Also, memset is a general purpose method for setting memory and might have extra logic in it that copes with edge conditions such as unaligned memory. The compiler can implement an initialiser more intelligently than a general purpose algorithm (at least, one would hope so).
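Whether the initialiser really beats memset is best settled by measuring on the target. The following is a minimal timing sketch, assuming the standard `clock()` has enough resolution for the iteration count chosen. The names `my_struct_t`, `consume`, `run_init_in_loop`, `run_memset_at_end` and `seconds_for` are made up for this illustration, and `consume` merely stands in for `some_function` so the compiler cannot discard the structure.

```c
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the question's MY_STRUCT. */
typedef struct {
    int my_height;
    int my_weight;
    char my_data_buffer[1024];
} my_struct_t;

/* Stand-in for some_function(): reads through a volatile pointer so the
   struct contents have to exist in memory. */
static int consume(const my_struct_t *s)
{
    const volatile my_struct_t *v = s;
    return v->my_height + v->my_weight + v->my_data_buffer[0];
}

/* Method 1: zero-initialise a loop-scoped struct on every iteration. */
long run_init_in_loop(long iterations)
{
    long sum = 0;
    long i;
    for (i = 0; i < iterations; ++i) {
        my_struct_t s = {0};
        s.my_height = (int)(i & 0xff);
        s.my_weight = 1;
        sum += consume(&s);
    }
    return sum;
}

/* Method 2: one function-scoped struct, cleared with memset after use. */
long run_memset_at_end(long iterations)
{
    long sum = 0;
    long i;
    my_struct_t s = {0};
    for (i = 0; i < iterations; ++i) {
        s.my_height = (int)(i & 0xff);
        s.my_weight = 1;
        sum += consume(&s);
        memset(&s, 0, sizeof s);
    }
    return sum;
}

/* Returns the CPU time, in seconds, spent running one variant. */
double seconds_for(long (*variant)(long), long iterations)
{
    clock_t start = clock();
    volatile long sink = variant(iterations); /* keep the result alive */
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing `seconds_for(run_init_in_loop, n)` with `seconds_for(run_memset_at_end, n)` for a large `n`, at the optimisation level actually used in production, gives a more trustworthy answer than reasoning alone.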

Skizz
  • Sorry about not being clear in the question. the structure I have written is just an example structure. This might not work if I am having a buffer to store some binary data inside the structure. – CCoder Oct 01 '13 at 15:36
  • @CCoder: the basic idea is sound, don't write to memory (clear it) and then write to it again (set data). If you have a binary blob of indeterminate size which is stored in a fixed size buffer, either add a number of valid bytes field or pad the buffer with zeros (or whatever), don't clear the whole buffer and then write over the cleared data. – Skizz Oct 01 '13 at 15:40
  • Ok. That is fine. My question is more about WHICH method is more optimal and less about HOW to make this code more optimal. – CCoder Oct 01 '13 at 15:44
  • As @Skizz said: if you have a pointer to 10MB of data, you aren't going to memset that every time (security issues aside). You just reset the value that indicates how much data is in the buffer. With a char string this is achieved by writing a null terminator to [0]. Break exit case aside, both of these are going to be nearly the same on any modern compiler, because the only difference between your two code paths is where the memset is done. The first one will memset at the start of the loop, and the second one will memset at the end. You might get a sub esp, xxx in case 2 each iteration. – djgandy Oct 01 '13 at 15:54
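The "number of valid bytes" idea from the comments above could look roughly like this for binary data. It is a sketch under assumptions: the `used` field, `BLOB_CAPACITY`, `blob_struct_t`, `blob_reset` and `blob_set` are all hypothetical additions, not part of the original structure. It also assumes every reader of the buffer honours `used` and never looks past it.

```c
#include <stddef.h>
#include <string.h>

#define BLOB_CAPACITY 1024

/* Hypothetical variant of MY_STRUCT with an explicit length field. */
typedef struct {
    int my_height;
    int my_weight;
    size_t used;                               /* valid bytes in the buffer */
    unsigned char my_data_buffer[BLOB_CAPACITY];
} blob_struct_t;

/* "Clearing" the buffer is a single store; stale bytes are left in place. */
void blob_reset(blob_struct_t *b)
{
    b->used = 0;
}

/* Copies len bytes into the buffer. Returns 0 on success, or -1 (leaving
   the struct untouched) if the data does not fit. */
int blob_set(blob_struct_t *b, const void *data, size_t len)
{
    if (len > BLOB_CAPACITY) {
        return -1;
    }
    if (len > 0) {
        memcpy(b->my_data_buffer, data, len);
    }
    b->used = len;
    return 0;
}
```

With this layout each loop iteration writes only the bytes it actually needs plus one `size_t`, instead of touching all 1024 bytes twice. The trade-off is that old data lingers in the buffer, which matters if the contents are sensitive.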