Let's start with this:
I have a 16-byte block of memory and I need to copy only the bytes at even indices into an 8-byte block of memory.
My current code does something like this:
unsigned int source_size = 16, destination_size = 8, i;
unsigned char * source = new unsigned char[source_size];
unsigned char * destination = new unsigned char[destination_size];
// fill source
for( i = 0; i < source_size; ++i)
{
    source[i] = 0xf + i;
}
// source :
// 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e
// copy
for( i = 0; i < destination_size; ++i)
{
    destination[i] = source[i * 2];
}
// destination :
// 0f 11 13 15 17 19 1b 1d
This is just an example: I would like to know whether there is a better way to do this that also works when I need every 3rd or every 4th byte, not just the even ones.
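To make the "every Nth byte" case concrete, here is a minimal sketch of the loop generalized to an arbitrary stride. The name `copy_strided` and its parameters are hypothetical, made up for illustration, not taken from any library. It is still a plain loop, so it only shows the behavior I am after, not an optimized version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from any library): copies every `stride`-th byte
// of `src`, starting at index `offset`, into `dst`.
// Stops when either the source or the destination is exhausted and
// returns the number of bytes written.
inline std::size_t copy_strided(const unsigned char* src, std::size_t src_size,
                                unsigned char* dst, std::size_t dst_size,
                                std::size_t stride, std::size_t offset = 0)
{
    assert(stride > 0);  // a stride of 0 would never advance
    std::size_t written = 0;
    for (std::size_t i = offset; i < src_size && written < dst_size; i += stride)
    {
        dst[written++] = src[i];
    }
    return written;
}
```

With the 16-byte source above, `copy_strided(source, 16, destination, 8, 2)` reproduces the even-byte copy, and a stride of 3 or 4 picks every 3rd or 4th byte instead.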
I know I can achieve this with a loop, but I need to optimize it. I don't really know how to use SSE, so I don't know whether it can be used in this case, but some kind of memcpy-style magic would be great.
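For reference, this is roughly what I imagine an SSE version of the even-byte case could look like. It is only a sketch, assuming an x86 target with SSE2 and a little-endian layout; the function name `copy_even_bytes_16` and the `HAVE_SSE2_SKETCH` macro are hypothetical. It masks off the high byte of each 16-bit lane and then packs the eight lanes down to eight bytes, and falls back to the plain loop when SSE2 is not available.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1  // hypothetical macro, used only in this sketch
#endif

// Hypothetical helper: copies the 8 even-indexed bytes of a 16-byte block
// at `src` into the 8-byte block at `dst`.
inline void copy_even_bytes_16(const unsigned char* src, unsigned char* dst)
{
#ifdef HAVE_SSE2_SKETCH
    // Unaligned load of all 16 source bytes.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    // Keep the low byte of each 16-bit lane; on little-endian x86 that is
    // the byte at the even index.
    __m128i even = _mm_and_si128(v, _mm_set1_epi16(0x00ff));
    // Narrow 16-bit lanes to 8-bit with unsigned saturation. Every lane is
    // already in 0..255, so the values pass through unchanged.
    __m128i packed = _mm_packus_epi16(even, even);
    // Store the low 8 bytes of the result.
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), packed);
#else
    // Scalar fallback for targets without SSE2.
    for (std::size_t i = 0; i < 8; ++i)
    {
        dst[i] = src[i * 2];
    }
#endif
}
```

Replacing the mask with `_mm_srli_epi16(v, 8)` would select the odd-indexed bytes instead. Strides of 3 or 4 do not map onto this mask-and-pack trick as directly; as far as I understand, a byte shuffle such as `_mm_shuffle_epi8` (SSSE3) would be the likely tool there, but that is an assumption on my part.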
I also thought about using a macro to get rid of the loop, since the sizes of the source and the destination are both constant, but that doesn't look like it would gain much.
Maybe it helps you think outside the box if I say that this is for extracting the YCbCr bytes from a YUYV pixel format. I should also emphasize that I'm doing this to get rid of libswscale.
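To show what the extraction means in practice, here is a small scalar sketch, assuming the usual packed YUYV (4:2:2) byte order of `Y0 Cb Y1 Cr` for each pair of pixels. The function name `split_yuyv` and the three output planes are hypothetical choices for illustration; this is not how libswscale does it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: splits packed YUYV data (byte order Y0 Cb Y1 Cr per
// pixel pair) into separate Y, Cb and Cr planes.
// `y` must hold `pixel_count` bytes; `cb` and `cr` must each hold
// `pixel_count / 2` bytes.
inline void split_yuyv(const unsigned char* yuyv, std::size_t pixel_count,
                       unsigned char* y, unsigned char* cb, unsigned char* cr)
{
    assert(pixel_count % 2 == 0);  // YUYV stores pixels in pairs
    for (std::size_t pair = 0; pair < pixel_count / 2; ++pair)
    {
        const unsigned char* group = yuyv + pair * 4;  // 4 bytes per pixel pair
        y[2 * pair]     = group[0];  // Y0
        cb[pair]        = group[1];  // Cb, shared by both pixels
        y[2 * pair + 1] = group[2];  // Y1
        cr[pair]        = group[3];  // Cr, shared by both pixels
    }
}
```

Seen this way, the Y plane is the even-byte copy from my example, while Cb and Cr are every-4th-byte copies starting at offsets 1 and 3.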