-3

What's a fast way of assigning a double to 8 bytes inside a byte array?

I have a byte array that is about 4 KB in size, and I am trying to take 8 bytes out of it and copy them into a double. I am trying to avoid memmove and memcpy for speed reasons, as assigning variables is much faster. I am working in the embedded world; any other fast implementations are appreciated.

void foo(double *pdest)
{
   // Try 1: I am using 1 element in the array, it won't work
   *pdest = (double)p->stk[stkpos];

   // Try 2: I am attempting to lose the single-element limitation
   *pdest = (double)((double*)&p->stk[stkpos]);
}

Neither solution has worked for me, and I am not sure how I can achieve this.

Matt
  • 8
    Are you *sure* that assigning is faster than `memcpy`? – dbush Aug 16 '18 at 13:11
  • I looked at the assembly inside memcpy and it looks quite big. I would like to see the assembly inside this approach and compare. – Matt Aug 16 '18 at 13:12
  • 3
    Are you aware of problems with *strict aliasing violation* and *invalid alignment* when attempting pointer dereference through wrong type? – user694733 Aug 16 '18 at 13:22
  • The mem* family of functions is faster than assigning to the variable. Is it ARM assembly you checked? The simple and straightforward answer is to use memcpy or memmove. – danglingpointer Aug 16 '18 at 13:23
  • 1
    Number one optimization here is otherwise to get rid of `double`. What is your target? Does it even have a FPU? – Lundin Aug 16 '18 at 13:23
  • 5
    The only *standard-conforming* ways to do this involve copying the individual bytes of the representation of the `double`. `memcpy` is one of them. A single assignment such as you propose is not -- the standard explicitly declares it undefined. If it happens to work reliably in your particular implementation then you might not care about the undefinedness. In that case, however, whatever behavior you see is implementation-specific. – John Bollinger Aug 16 '18 at 13:24
  • 2
    Some (most?) compilers can optimize `memcpy`, and substitute it with optimized version or even replace it with simple assignment, if conditions are right. If it cannot do the substitution, then it's possible simple assignment would not work either. – user694733 Aug 16 '18 at 13:26
  • 1
    The only difference is that memcpy comes with internal tricks, so that it doesn't cause misaligned access. – Lundin Aug 16 '18 at 13:28
  • This is ARM assembly. Memmove is WAY too slow for me, I'm trying to optimize microseconds. – Matt Aug 16 '18 at 13:29
  • These are doubles because they're analog channels; they have to be double. – Matt Aug 16 '18 at 13:30
  • 2
    @Matt "they're analog channels they have to be double" is a common beginner misunderstanding. ADCs return an integer of 8, 10 or 12 bits. You can do fixed point arithmetic just fine even if the end result should have a decimal comma somewhere. So, is it a Cortex M4 or bigger? That is, does it even have a FPU? Otherwise the use of memcpy is the least of your performance problems. – Lundin Aug 16 '18 at 13:34
  • Related see [Safely punning char* to double in C](https://stackoverflow.com/q/222266/608639), [What's a proper way of type-punning a float to an int and vice-versa?](https://stackoverflow.com/q/17789928/608639) and friends. – jww Aug 16 '18 at 13:40
  • What @Lundin says; but note that the FPU on a Cortex-M4 only supports single precision, so there is still an overhead. – Clifford Aug 16 '18 at 17:57

3 Answers

6

If your compiler isn't horribly broken, copying through the assignment operator should be more or less identical to memcpy. So what you are attempting is nothing but "premature optimization".

You can't read the bytes through a converted pointer, as in *pdest = *(double*)&p->stk[stkpos];, because that invokes undefined behavior. See "What is the strict aliasing rule?". memcpy, however, doesn't come with that bug.

Solution: use memcpy.
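As a minimal sketch of that advice, the function from the question could be written as below. The struct layout, the `STK_SIZE` constant and the name `pop_double` are assumptions made for illustration; the question only shows `p->stk` and `stkpos`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STK_SIZE 4096  /* assumed; the question says "about 4k bytes" */

/* Hypothetical layout: only the byte array from the question is modeled. */
struct stack {
    unsigned char stk[STK_SIZE];
};

/* Copy sizeof(double) bytes (8 on typical targets) starting at
   stk[stkpos] into *pdest, without dereferencing a double pointer
   that aliases the byte array. */
void pop_double(const struct stack *p, size_t stkpos, double *pdest)
{
    assert(stkpos + sizeof *pdest <= STK_SIZE);  /* stay inside the array */
    memcpy(pdest, &p->stk[stkpos], sizeof *pdest);
}
```

With optimization enabled, many compilers can expand a small fixed-size memcpy like this inline instead of calling the library routine, but that depends on the compiler, target and flags, so it is worth checking the disassembly.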

Lundin
  • Mmm, alright. I will keep it to memcpy. Thanks! The code just looks pretty big. – Matt Aug 16 '18 at 13:28
  • 1
    The compiler doesn't have to be horribly broken for copying via the assignment operator to fail. Such an assignment might fail in a perfectly reasonable implementation if, for example, it results in a misaligned memory access. – John Bollinger Aug 16 '18 at 13:29
  • 1
    @Matt What looks "pretty big?" The disassembled library code, the disassembled binary executable or the actual executed instructions? – Lundin Aug 16 '18 at 13:30
  • @JohnBollinger Hence "more or less" :) The alignment safety of memcpy is indeed the difference, as I actually mentioned in comments to the question. – Lundin Aug 16 '18 at 13:31
  • @Matt large code is not necessarily slow. – Jabberwocky Aug 16 '18 at 13:31
  • The actual executed instructions. What I am doing is a pop instruction, I was able to create the push instruction with assignments and not memcpy. My IDE has a clock cycle counter and according to that when I cycle through my version of memcpy (assigning variables) it takes around 44 clock cycles. A memcpy takes about 80 clock cycles. – Matt Aug 16 '18 at 13:33
  • @Matt What compiler and optimizer settings? – Lundin Aug 16 '18 at 13:35
  • GCC and I don't see any optimizing settings. Using CrossStudio. – Matt Aug 16 '18 at 13:36
  • @Matt Ok nice, that one I know. In the project window, right click your project, pick "edit properties". Under "code generation", find "optimization level". Pick "Level 3", equivalent to `gcc -O3`. Re-build and check the disassembly again. – Lundin Aug 16 '18 at 13:39
  • It just pushed my function inline but same clock cycles count and assembly looks about the same. I didn't memorize it before. – Matt Aug 16 '18 at 13:44
  • @Matt Then maybe the problem is elsewhere. For example, the compiler could be invoking a whole software floating point library. Which MCU part is it? – Lundin Aug 16 '18 at 13:45
  • @Matt You don't know which MCU you are programming? How is that even possible? – Lundin Aug 16 '18 at 14:16
5

Assuming p->stk is an array of bytes:

This doesn't work:

*pdest = (double)p->stk[stkpos];

Because you're only reading a single byte and assigning that value to a double.
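To make the difference concrete, here is a small sketch contrasting the two reads. The function names are hypothetical and the buffer is assumed to hold at least sizeof(double) bytes.

```c
#include <assert.h>
#include <string.h>

/* What the first attempt computes: the numeric value of ONE byte
   (0..255 for an 8-bit unsigned char), converted to double. */
double first_byte_as_double(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return (double)bytes[0];
}

/* What was intended: all sizeof(double) bytes taken together as the
   object representation of a double. */
double bytes_as_double(const unsigned char *bytes)
{
    double d;
    assert(bytes != NULL);
    memcpy(&d, bytes, sizeof d);
    return d;
}
```

If the buffer holds the bytes of 1.5, `bytes_as_double` returns 1.5, while `first_byte_as_double` returns whatever small integer happens to be stored in the first byte.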

This doesn't work:

*pdest = (double)((double*)&p->stk[stkpos]);

Because of the outer cast. The first part, casting &p->stk[stkpos] to a double *, will in fact give you a pointer to a double that starts at the address of p->stk[stkpos], but then you cast the value of that pointer, i.e. a memory address (not what it points to), to a double, which doesn't make sense (C does not allow converting a pointer to a floating type).

What you were probably trying to do was this:

*pdest = *((double*)&p->stk[stkpos]);

This takes the address of p->stk[stkpos], treats it as a pointer to a double, and dereferences the address to read a double.

There's still a problem with this, however. Such a conversion violates strict aliasing. And even if strict aliasing were disabled, you might end up with a misaligned memory access.

The proper way to do this is to use memcpy. That is guaranteed to work as expected.
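The alignment point can be illustrated with a pair of helpers that store and load a double at an arbitrary byte offset. This is only a sketch; the names `push_double` and `load_double` and their parameters are hypothetical and not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store a double at any byte offset; memcpy has no alignment
   requirement on either pointer. */
void push_double(unsigned char *stk, size_t stk_size, size_t pos, double value)
{
    assert(pos + sizeof value <= stk_size);  /* stay inside the buffer */
    memcpy(&stk[pos], &value, sizeof value);
}

/* Load a double back from any byte offset, aligned or not. */
double load_double(const unsigned char *stk, size_t stk_size, size_t pos)
{
    double value;
    assert(pos + sizeof value <= stk_size);
    memcpy(&value, &stk[pos], sizeof value);
    return value;
}
```

An offset such as 3 is not a multiple of the alignment of double on typical targets, so dereferencing a `double *` there could fault on some cores, whereas the memcpy version behaves the same at every offset.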

dbush
-3

I think your casting is off. You need to indicate that you are referencing a pointer to a double, then dereference that.

*pdest = *((double *)(&p->stk[stkpos]));

should work

kberson