
When I have a memory buffer in OpenCL, I know that I need to align the data in it to a 16-byte boundary.

But what do I do if my whole data (not the data structure, the actual data collection) is, for example, only 15 bytes in size?
Should I put it into a 16-byte buffer or a 15-byte one?
Is there a performance difference?

Tara
  • So, you are running a kernel with just 15 bytes of input? That's a bad example. If you are working with X*16+Y bytes, i.e. a size that is not a multiple of 16, that is more plausible. – DarkZeros Dec 18 '13 at 10:13
  • Yes, the data is bigger than 15 bytes, but not a multiple of 16. – Tara Dec 18 '13 at 12:15

1 Answer


If you are going to use many of these 15-byte data structures and you plan on using local memory, I suggest keeping the 15-byte struct and loading a multiple of 16 of them at a time. I think "many" would be at least several KB worth of data per work-group. The reason is that when you sacrifice an extra byte for every 15, you add about 6.7% (1/15) more transfer overhead. Leaving the size at 15 could also help avoid bank conflicts when writing data back to memory (both local and global memory).

More info about bank conflicts.

mfa
  • An array of 15-byte structures will probably not be aligned and will have costly access times. I recommend keeping it at 16 (especially since the natural alignment of most devices, including GPUs, is 16 bytes). Of course, it's a performance/memory trade-off. I also fail to see how the size of a structure has anything to do with bank conflicts; if anything, it makes them worse due to unaligned loads/stores. – Thomas Mar 07 '13 at 02:53
  • Let me clarify: in my example there are no 15-byte data structures (I never said anything about structures). The WHOLE data set is 15 bytes. It doesn't necessarily have to be exactly 15 bytes; it could be any size that is not a multiple of 16. – Tara Mar 07 '13 at 07:35