tl;dr: Consider the bigger picture before applying supposed micro-optimizations like this one.
Looking at Robert's example code, my first thought was
++*( (a==b) ? &x : &y);
However, I was on my mobile phone and could not check the disassembly myself.
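For reference, here is a minimal host-side sketch of what the one-liner is meant to do, next to the branching form it replaces. The function names, the `char`/`int` types and the exact shape of the if/else are my assumptions for illustration, not the code from the question.

```cpp
// Branching form: increment x when a == b, otherwise increment y.
// (Assumed shape of the if/else under discussion; names are illustrative.)
void bump_branch(char a, char b, int &x, int &y) {
    if (a == b) {
        ++x;
    } else {
        ++y;
    }
}

// Pointer-select form: the conditional operator picks one of two
// addresses, and a single increment is applied through the result.
void bump_select(char a, char b, int &x, int &y) {
    ++*((a == b) ? &x : &y);
}
```

Both functions leave `x` and `y` in the same state for the same inputs. They differ only in whether the choice is expressed as control flow or as the selection of an address.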
Robert was kind enough to insert it into his test kernel and post the SASS diff of this idea against the original if/else code from the question:
$ cuobjdump -sass t1513.o >out3.sass
$ diff out1.sass out3.sass
13,44c13,52
< /* 0x2230427042004307 */
< /*0008*/ MOV R1, c[0x0][0x44]; /* 0x2800400110005de4 */
< /*0010*/ MOV R4, c[0x0][0x150]; /* 0x2800400540011de4 */
< /*0018*/ MOV R5, c[0x0][0x154]; /* 0x2800400550015de4 */
< /*0020*/ MOV R2, c[0x0][0x148]; /* 0x2800400520009de4 */
< /*0028*/ MOV R3, c[0x0][0x14c]; /* 0x280040053000dde4 */
< /*0030*/ LD.E R6, [R4]; /* 0x8400000000419c85 */
< /*0038*/ LDC.U8 R7, c[0x0][0x141]; /* 0x1400000507f1dc06 */
< /* 0x2272028042824047 */
< /*0048*/ LD.E R0, [R2]; /* 0x8400000000201c85 */
< /*0050*/ LDC.U8 R8, c[0x0][0x140]; /* 0x1400000503f21c06 */
< /*0058*/ I2I.S16.S8 R7, R7; /* 0x1c0000001c11de84 */
< /*0060*/ I2I.S16.S8 R8, R8; /* 0x1c00000020121e84 */
< /*0068*/ LOP32I.AND R7, R7, 0xff; /* 0x38000003fc71dc02 */
< /*0070*/ LOP32I.AND R8, R8, 0xff; /* 0x38000003fc821c02 */
< /*0078*/ BFE R7, R7, 0x1000; /* 0x7000c0400071dc23 */
< /* 0x22e04283f2828287 */
< /*0088*/ BFE R8, R8, 0x1000; /* 0x7000c04000821c23 */
< /*0090*/ ISET.EQ.AND R7, R8, R7, PT; /* 0x110e00001c81dc23 */
< /*0098*/ LOP32I.AND R7, R7, 0x1; /* 0x380000000471dc02 */
< /*00a0*/ LOP32I.XOR R8, R7, 0x1; /* 0x3800000004721c82 */
< /*00a8*/ IADD R8, R6, R8; /* 0x4800000020621c03 */
< /*00b0*/ IADD R7, R0, R7; /* 0x480000001c01dc03 */
< /*00b8*/ ST.E [R4], R8; /* 0x9400000000421c85 */
< /* 0x200000000002f047 */
< /*00c8*/ ST.E [R2], R7; /* 0x940000000021dc85 */
< /*00d0*/ EXIT; /* 0x8000000000001de7 */
< /*00d8*/ BRA 0xd8; /* 0x4003ffffe0001de7 */
< /*00e0*/ NOP; /* 0x4000000000001de4 */
< /*00e8*/ NOP; /* 0x4000000000001de4 */
< /*00f0*/ NOP; /* 0x4000000000001de4 */
< /*00f8*/ NOP; /* 0x4000000000001de4 */
---
> /* 0x2270420042304307 */
> /*0008*/ MOV R1, c[0x0][0x44]; /* 0x2800400110005de4 */
> /*0010*/ MOV R10, c[0x0][0x148]; /* 0x2800400520029de4 */
> /*0018*/ IADD32I R1, R1, -0x8; /* 0x0bffffffe0105c02 */
> /*0020*/ MOV R11, c[0x0][0x14c]; /* 0x280040053002dde4 */
> /*0028*/ LDC.U8 R0, c[0x0][0x141]; /* 0x1400000507f01c06 */
> /*0030*/ MOV R8, c[0x0][0x150]; /* 0x2800400540021de4 */
> /*0038*/ MOV R9, c[0x0][0x154]; /* 0x2800400550025de4 */
> /* 0x2232423240423047 */
> /*0048*/ LD.E R4, [R10]; /* 0x8400000000a11c85 */
> /*0050*/ I2I.S16.S8 R0, R0; /* 0x1c00000000101e84 */
> /*0058*/ LD.E R5, [R8]; /* 0x8400000000815c85 */
> /*0060*/ LDC.U8 R2, c[0x0][0x140]; /* 0x1400000503f09c06 */
> /*0068*/ LOP32I.AND R0, R0, 0xff; /* 0x38000003fc001c02 */
> /*0070*/ I2I.S16.S8 R2, R2; /* 0x1c00000008109e84 */
> /*0078*/ BFE R0, R0, 0x1000; /* 0x7000c04000001c23 */
> /* 0x2283f282b2028287 */
> /*0088*/ LOP32I.AND R2, R2, 0xff; /* 0x38000003fc209c02 */
> /*0090*/ BFE R3, R2, 0x1000; /* 0x7000c0400020dc23 */
> /*0098*/ ISETP.NE.AND P0, PT, R3, R0, PT; /* 0x1a8e00000031dc23 */
> /*00a0*/ LOP.OR R3, R1, c[0x0][0x24]; /* 0x680040009010dc43 */
> /*00a8*/ @P0 IADD32I R3, R3, 0x4; /* 0x080000001030c002 */
> /*00b0*/ LOP32I.AND R3, R3, 0xffffff; /* 0x3803fffffc30dc02 */
> /*00b8*/ SEL R0, R4, R5, !P0; /* 0x2010000014401c04 */
> /* 0x22f042e3f2e28047 */
> /*00c8*/ STL.64 [R1], R4; /* 0xc800000000111ca5 */
> /*00d0*/ IADD32I R0, R0, 0x1; /* 0x0800000004001c02 */
> /*00d8*/ STL [R3], R0; /* 0xc800000000301c85 */
> /*00e0*/ LDL.64 R6, [R1]; /* 0xc000000000119ca5 */
> /*00e8*/ ST.E [R8], R7; /* 0x940000000081dc85 */
> /*00f0*/ ST.E [R10], R6; /* 0x9400000000a19c85 */
> /*00f8*/ EXIT; /* 0x8000000000001de7 */
> /*0100*/ BRA 0x100; /* 0x4003ffffe0001de7 */
> /*0108*/ NOP; /* 0x4000000000001de4 */
> /*0110*/ NOP; /* 0x4000000000001de4 */
> /*0118*/ NOP; /* 0x4000000000001de4 */
> /*0120*/ NOP; /* 0x4000000000001de4 */
> /*0128*/ NOP; /* 0x4000000000001de4 */
> /*0130*/ NOP; /* 0x4000000000001de4 */
> /*0138*/ NOP; /* 0x4000000000001de4 */
$
Robert concluded that the compiler chose to use predication in this case.
The disassembly made no sense to me until I realised that Robert had inserted my one-liner differently than I expected. Trying to stay close to the (most likely correctly) presumed intentions of the questioner, he dereferenced the pointers into automatic variables, then applied my one-liner (which makes little sense in that case, because taking the address of automatic variables forces them into local memory), and wrote the contents of the automatic variables back to global memory.
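The structure described in the previous paragraph can be sketched roughly as follows. This is a host-side approximation only: in the actual test case it would be a `__global__` kernel, and the signature and the names `a`, `b`, `dx`, `dy` are my guesses from the disassembly, not Robert's exact source.

```cpp
// Sketch of the "via automatic variables" variant (assumed signature).
void increment_via_locals(char a, char b, int *dx, int *dy) {
    int x = *dx;                 // copy the global values into automatic variables
    int y = *dy;
    ++*((a == b) ? &x : &y);     // &x and &y force x and y to be addressable
    *dx = x;                     // write both values back, changed or not
    *dy = y;
}
```

Because the address of `x` or `y` is only selected at run time, the GPU compiler has to keep both variables in addressable storage. That is the local memory traffic visible as the `STL` and `LDL` instructions in the second half of the diff.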
My thought however was to just replace the entire body of the test case with my
++*( (a==b) ? dx : dy);
one-liner, which would have led to better looking assembly:
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x2800400110005de4 */
/*0010*/ LDC.U8 R0, c[0x0][0x141]; /* 0x1400000507f01c06 */
/*0018*/ LDC.U8 R2, c[0x0][0x140]; /* 0x1400000503f09c06 */
/*0020*/ I2I.S16.S8 R0, R0; /* 0x1c00000000101e84 */
/*0028*/ I2I.S16.S8 R2, R2; /* 0x1c00000008109e84 */
/*0030*/ LOP32I.AND R0, R0, 0xff; /* 0x38000003fc001c02 */
/*0038*/ LOP32I.AND R2, R2, 0xff; /* 0x38000003fc209c02 */
/* 0x228202c042804237 */
/*0048*/ BFE R0, R0, 0x1000; /* 0x7000c04000001c23 */
/*0050*/ BFE R3, R2, 0x1000; /* 0x7000c0400020dc23 */
/*0058*/ MOV R2, c[0x0][0x148]; /* 0x2800400520009de4 */
/*0060*/ ISETP.NE.AND P0, PT, R3, R0, PT; /* 0x1a8e00000031dc23 */
/*0068*/ MOV R0, c[0x0][0x14c]; /* 0x2800400530001de4 */
/*0070*/ SEL R2, R2, c[0x0][0x150], !P0; /* 0x2010400540209c04 */
/*0078*/ SEL R3, R0, c[0x0][0x154], !P0; /* 0x201040055000dc04 */
/* 0x20000002f04283f7 */
/*0088*/ LD.E R0, [R2]; /* 0x8400000000201c85 */
/*0090*/ IADD32I R4, R0, 0x1; /* 0x0800000004011c02 */
/*0098*/ ST.E [R2], R4; /* 0x9400000000211c85 */
/*00a0*/ EXIT; /* 0x8000000000001de7 */
/*00a8*/ BRA 0xa8; /* 0x4003ffffe0001de7 */
/*00b0*/ NOP; /* 0x4000000000001de4 */
/*00b8*/ NOP; /* 0x4000000000001de4 */
Taken by itself, this code looks better to me than that of Robert's test case. However, it is probably of no use to vallismortis, because in his case the variables will not be in addressable memory (values held in registers cannot be pointed to).
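For comparison, the whole-body replacement I had in mind amounts to something like the sketch below. It is again host-side C++ with an assumed signature. In the test case it would be the entire body of a `__global__` kernel.

```cpp
// Sketch of the "select the pointer directly" variant (assumed signature).
void increment_direct(char a, char b, int *dx, int *dy) {
    // dx and dy already point to addressable (global) memory, so no
    // temporaries are needed: select one pointer, then load, add, store.
    ++*((a == b) ? dx : dy);
}
```

This corresponds to the shorter listing above: two `SEL` instructions pick the two halves of the 64-bit pointer, followed by a single `LD.E`, `IADD32I` and `ST.E`. It only works because the operands are reached through pointers to begin with.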
Of course, Robert's other comment about premature optimisation also applies here, even if this should actually result in faster code.