I would like to send data from an STM32 (Cortex-M4) device via its I2S peripheral using DMA in 24-bit mode, MSB first. However, the I2S data register is only 16 bits wide; according to the datasheet, you have to send the upper halfword first, then the lower one. This is a problem when using DMA, which always sends the lower halfword first.

What is an efficient way in C or ARM assembler to swap the two halfwords?

Tobi
  • What is your current inefficient way? – Jongware Jan 31 '18 at 12:14
  • `w = (w<<8)|(w>>8)`, and trust your compiler. – iBug Jan 31 '18 at 12:14
  • Can you preprocess your data with REV / REV16 instructions? Are you sure your DMA doesn't support endianness swapping? – user694733 Jan 31 '18 at 12:27
  • You probably meant to say: `w = (w<<16)|(w>>16)`? – Tobi Jan 31 '18 at 12:27
  • IMO, "word" = 16 bits, so "halfword" = byte (32 bits is "dword"). Am I wrong? – iBug Jan 31 '18 at 12:34
  • @iBug You are right when using Intel terms. However, OP is using RISC terms where a word is 32 bits and a halfword is 16 bits. – fuz Jan 31 '18 at 12:36
  • word = 32 bits for arm, mips, etc. this was tagged as arm not x86. – old_timer Jan 31 '18 at 13:03
  • isn't `word` on ARM signed? Then C `>>` is UB, so depending on your compiler you may want to mask the result of the right shift first, before merging it with the left shift result: `w = (w<<16)|((w>>16)&0xFFFF);` just to keep the source portable across platforms and compilers (although to keep it easily portable I would rather use the `uint32_t` type instead of `word`, which makes the right shift well defined, so you don't need to mask it). (now I checked fuz's answer, all is there, perfect) – Ped7g Jan 31 '18 at 13:50

1 Answer

Write the common idiom:

unsigned w;
w = w << 16 | w >> 16;

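An optimizing compiler generally translates this to a single ror instruction (a rotate by 16 bits). Note that rev16 is not the right instruction here: it swaps the bytes within each halfword rather than the two halfwords. It is reasonable to expect the compiler to perform this optimization. Make sure that w is unsigned so that the right shift is a logical one and no sign bits are shifted in.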

If the compiler does not optimize this, it's still just two instructions (a shift and an or with a shifted operand), so not much performance is lost.
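As a minimal sketch of how this idiom might be applied to a whole buffer before handing it to the DMA: the helper names `swap_halfwords` and `swap_buffer_halfwords` are hypothetical, and the code assumes each sample occupies one 32-bit word in memory. It uses `uint32_t` so the right shift is well defined regardless of platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the upper and lower 16-bit halves of a 32-bit word.
 * With an unsigned type both shifts are well defined; compilers
 * for Cortex-M typically reduce this to a rotate by 16. */
static inline uint32_t swap_halfwords(uint32_t w)
{
    return (w << 16) | (w >> 16);
}

/* Hypothetical helper: swap the halfwords of every word in a
 * sample buffer in place, so that a DMA sending the lower
 * halfword first actually emits the original upper halfword. */
static void swap_buffer_halfwords(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        buf[i] = swap_halfwords(buf[i]);
    }
}
```

Calling `swap_buffer_halfwords(buf, n)` once per buffer, before starting the transfer, keeps the swap out of the interrupt path; whether that cost is acceptable depends on the buffer size and sample rate.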

fuz