GCC inline assembly for SPARC: How to handle integer doubleword pairs?

Question

From what I understand, in SPARC, 32-bit integer quantities are stored in single registers and 64-bit integer quantities are stored in adjacent register pairs, with the even register containing the high 32 bits and the odd register containing the low 32 bits.

I need to write a few specialized SPARC inline assembly macros (inline assembly functions would be fine too) that deal with 64-bit integer doubleword pairs, and I can't figure out how to refer generically (using GCC extended inline assembly) to the two halves of the pair in my inline assembly. Though my assembly macros will be a little more complex than the MULTIPLY() macro shown below, the multiplication example, if it worked, would demonstrate how to deal with the two halves of a 64-bit doubleword pair. Can anyone tell me how to fix my MULTIPLY() macro?

In case it matters, I'm on a...

bash-2.03$ uname -a
SunOS [...] 5.8 Generic_117350-39 sun4u sparc SUNW,Ultra-80

Here is my trivial example program (in C):

#include <stdio.h>
//#include <stdint.h>
#define uint32 unsigned long int
#define uint64 unsigned long long int


#define MULTIPLY(r, a, b)  /* (r = a * b) */   \
   asm("umul %1, %2, %0;"  /* unsigned mul */  \
       : /* regs out */  "=h"(r)               \
       : /* regs in  */  "r"(a),   "r"(b));
#if 0
       : /* clobbers */  "%y" );
#endif


int main(int argc, char** argv)
{
   uint64 r;
   uint32 a=0xdeadbeef, b=0xc0deba5e;

   // loses the top 32 bits of the multiplication because the result is
   // truncated at 32 bits which then gets assigned to the 64-bit 'r'...
   r = a * b;
   printf("u64=u32*u32  ---->  r=a*b           "
          "---->  0x%016llx = 0x%x * 0x%x\n",
          r, a, b);

   // force promotion of 'a' to uint64 to get 64-bit multiplication
   // (could cast either a or b as uint64, which one doesn't matter,
   // as one explicit cast causes the other to be promoted as well)...
   r = ((uint64)a) * b;
   printf("u64=u64*u32  ---->  r=((u64)a)*b    "
          "---->  0x%016llx = 0x%x * 0x%x\n",
          r, a, b);

   MULTIPLY(r, a, b);
   printf("u64=u64*u32  ---->  MULTIPLY(r,a,b) "
          "---->  0x%016llx = 0x%x * 0x%x\n",
          r, a, b);

   return 0;
}

Which, when compiled with gcc-3.2-sun4u/bin/gcc -o mult -mcpu=ultrasparc mult.c, produces this output:

u64=u32*u32  ---->  r=a*b           ---->  0x00000000d3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  r=((u64)a)*b    ---->  0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  MULTIPLY(r,a,b) ---->  0xd3c7c1c2deadbeef = 0xdeadbeef * 0xc0deba5e

I looked at the -S -fverbose-asm output of gcc, and it's doing some strange shifting of the result register (which is even) & writing into the adjacent odd register. My problem is that I don't know how to generically refer to the adjacent odd register in the extended asm syntax. I thought perhaps the 'h' asm constraint in "=h"(r) might have something to do with it, but I can't find any examples of how to use it.

sun4u is Sparc v9, so has 64-bit registers unless you're running in 32-bit compatibility mode... — Chris Dodd, Mar 26 '12 at 20:45
@ChrisDodd - If that's true (and I believe you are correct), do you have any idea why my asm macro wouldn't work as-is? — phonetagger, Mar 26 '12 at 20:50
might it be as dumb as needing to use a different assembler instruction to get at 64bit multiply? I googled and found http://docs.oracle.com/cd/E19455-01/806-3774/6jctamgv2/index.html which says "MULX" is "Generic 64-bit multiply" - just a small thought — gbulmer, Mar 26 '12 at 21:16
@gbulmer - Hmmm. I hadn't seen that instruction before. Thanks. I tried it but unfortunately got the same results. Strange! And yes I ensured I deleted the old build products & did a clean build. But it's interesting to note that even though the UMUL instruction is deprecated in SPARC V9, that's what GCC is still using. — phonetagger, Mar 26 '12 at 21:54
@phonetagger - well you have me intrigued. My only other thoughts are if there is supposed to be a data-size indication, like 'umuld', this suggests that it might be http://docs.oracle.com/cd/E19455-01/806-3774/6jctamgtr/index.html but that documentation is not clear to me. — gbulmer, Mar 26 '12 at 22:16
@gbulmer - Thanks for looking into this. The problem is really related to GCC extended asm syntax, as it applies specifically to the sparc processor. Hopefully someone with sparc extended asm experience will read this question before it drifts off into the abyss of unanswered questions. — phonetagger, Mar 27 '12 at 00:32
If you do `unsigned char a = 13; unsigned char b = 11; ... unsigned char c = a * b;` and look at the assembler, or look at an objdump (is there one for SPARC?), does the assembler have an operand size 'extension', i.e. is the umul a umulb? — gbulmer, Mar 27 '12 at 00:42
There is no `umulb`. There are only 32 and 64 bit multiply, with the 64-bit one being V9- (well, V8+)-only. I've never used the `h` constraint, and you shouldn't here unless you're building specifically for V8+ (where you have 64 bit registers but can only use 32 bits in the `%i` and `%l` registers for window overflow reasons), but apparently it's putting the 32 bit result in the high half of the 64-bit `%o` or `%g` register. — torek, Mar 27 '12 at 07:36
Just because you have a 64-bit sparc doesn't mean you're running in 64-bit mode. If you have a 32-bit OS then you can only run in 32-bit mode. A 64-bit OS can run processes in either mode. — Chris Dodd, Mar 27 '12 at 20:54
@phonetagger: indeed, it's a longstanding, and seemingly still unfixed one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43350 — jørgensen, Mar 29 '12 at 15:40

score 0 · Answer 1 · answered Mar 27 '12 at 05:15

The umul instruction multiplies two 32-bit (unsigned int) values in the lower halves of two registers, and puts the lower half of the 64-bit result in the destination register. The upper half of the result is written to the Y register. The upper half of the destination register is cleared. So what you probably want in order to use it is something like:

#define MULTIPLY(u, r, a, b) /* (u,r = a * b) */     \
asm("umul %2, %3, %0;"   /* unsigned mul */          \
    "rd %%y, %1;"        /* get hi word of result */ \
    : /* regs out */  "=r"(r), "=r"(u)               \
    : /* regs in  */  "r" (a), "r" (b)               \
    : /* clobbers */  "%y" );

Note, however, that you're almost certainly better off just writing the multiply in C, using uint64_t or unsigned long long operands.
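
As a minimal sketch of that plain-C route (the function name mul32x32 is made up for illustration, and <stdint.h> is assumed to be available; on an older toolchain the question's own uint32/uint64 macros would serve the same purpose):

```c
#include <stdint.h>

/* Widening multiply: 32 x 32 -> 64 bits, no inline assembly.
 * Casting one operand to 64 bits before the multiply makes the
 * whole product 64 bits wide; the compiler chooses the instructions. */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Called with the values from the question, mul32x32(0xdeadbeef, 0xc0deba5e) should give 0xa7c40bfad3c7c1c2, the same value the question's r = ((uint64)a) * b line prints.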

answered Mar 27 '12 at 05:15 by Chris Dodd

That doesn't quite work for me because I want the result to magically show up in my uint64 variable 'r' without me having to combine 'u,r'. You mentioned above about v9 being 64-bit unless running in 32-bit compatibility mode. How do I tell if my system is running in 32-bit compatibility mode? (Pls see also my comments below torek's post.) – phonetagger Mar 27 '12 at 12:48
@phonetagger: I don't know about Solaris, but with BSD, `uname -m` will give you `sparc` for 32 bit mode and `sparc64` for 64 bit mode. – Chris Dodd Mar 27 '12 at 20:59
I'm beginning to not like Solaris. `uname -m` gives me `sun4u` (oh, a Sun for ME? uh, thanks...) and `uname -X` tells me `BusType = `. See also my recent comment under torek's post. Also BTW, the multiply is just a boiled-down version of what I hope to ultimately implement in the asm macro. If I can do the multiply (and transfer the full 64-bit result back to a uint64 for use in C), then I have the foundation for what I actually want to accomplish. Ultimately this is for the core of an enc alg which absolutely must be as fast as possible; right now it's so slow it's painful. – phonetagger Mar 28 '12 at 14:14
...and I was really hoping I wouldn't have to implement the entire thing in asm, just the guts of the computation right in the middle of the loop bodies. – phonetagger Mar 28 '12 at 14:18

torek · Answer 2 · 2012-03-29T20:41:02.450

I think you're getting the old umul instruction because you're using -mcpu= instead of -march=. Per the documentation, the former has been changed to be synonymous with -mtune=: generate instructions for the "most generic architecture" but optimize them for use on the given architecture. So -mcpu=ultrasparc means "generate for V8 SPARC, but optimize for UltraSPARC". Using -march=ultrasparc should get you a raw 64-bit multiply.

Edit: based on all the discussion and other answers, it appears that gcc 3.2 as configured does not work with -m64, which forces one to run in "v8plus" mode on Solaris 2 (32-bit address space and, for the most part, 32-bit registers, except for values stored in the %g and %o registers). A sufficiently newer gcc should allow compiling with -m64, which will make the entire situation more or less moot. (And you can then add -march=niagara2 or whatever, as appropriate for your particular target hardware.) You may need to install a full set of binutils as well, per the following from the gcc 4.7.0 config/sparc/sparc.h:

#if TARGET_CPU_DEFAULT == TARGET_CPU_v9
/* ??? What does Sun's CC pass?  */
#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
/* ??? It's not clear how other assemblers will handle this, so by default
   use GAS.  Sun's Solaris assembler recognizes -xarch=v8plus, but this case
   is handled in sol2.h.  */
#define ASM_CPU64_DEFAULT_SPEC "-Av9"
#endif
#if TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc
#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
#define ASM_CPU64_DEFAULT_SPEC "-Av9a"
#endif
...

With all that in place you should just be able to multiply two 64-bit values to get a 64-bit result, in ordinary C code, without resorting to inline assembly.

(Otherwise you'll need something like the code you eventually came up with for gcc 3.2.)
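
Since the question of "which mode am I actually building for?" keeps coming up, here is a rough sketch of one way to check from C. The helper names pointer_bits and long_bits are made up for illustration. Note that this only reports the data model the compiler targeted (ILP32 versus LP64); it cannot tell a plain V8 build from a v8plus one, since both use 32-bit pointers.

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of a pointer for the mode this file was compiled in:
 * 32 for an ILP32 build (e.g. -m32), 64 for an LP64 build (e.g. -m64). */
static unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of 'long': 32 under ILP32, 64 under LP64. */
static unsigned long_bits(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

Printing both values from main() in a build with the default flags shows what the compiler picks when neither -m32 nor -m64 is given.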

answered Mar 27 '12 at 07:41 by torek; edited Mar 29 '12 at 20:41

I tried `gcc-3.2-sun4u/bin/gcc -S -fverbose-asm -march=ultrasparc mult.c` and it complains with `cc1: invalid option 'arch=ultrasparc'`. But that did give me the idea to try other target options, which I've tried in various combinations and based on some of the errors I'm getting I'm wondering if my system is configured to run in 32-bit mode even though it's 64-bit hardware. Any idea how to tell if that's the case? – phonetagger Mar 27 '12 at 12:53
(Hmm. Might be `-march=v9`. Been a long time since I used these.) On Solaris, there isn't exactly a "32 bit mode" per se. Instead, whether registers save 64 bits is determined by the value in the `%sp` register. If it's even (and then must be congruent to 0 mod 8), only the lower 32 bits of the `%i` and `%l` registers are saved in the "register window save area" at `[%sp+0]` through `[%sp+63]`. If it's odd, there's a "bias" and all 64 bits are saved. See http://docs.oracle.com/cd/E19253-01/816-5138/advanced-2/index.html for details. – torek Mar 27 '12 at 17:05
Normally to specify 32-bit or 64-bit mode, you use `-m32` or `-m64`. Depending on how you have gcc configured it might only support one or the other. – Chris Dodd Mar 27 '12 at 20:51
With `-m32` my trivial compilation completes successfully, but with `-m64` it dies with an internal compiler error `Internal compiler error in instantiate_virtual_regs_1, at function.c:3972 Please submit a full bug report, with....` So I'm guessing -m32 must be the default on my system! BTW, torek, thanks for the v9 ABI link. – phonetagger Mar 28 '12 at 14:15
@ChrisDodd - (and @torek) - I found some discussion about hi & lo reg pairs [here](https://forums.oracle.com/forums/thread.jspa?threadID=1997296), and managed to get my macro working perfectly. Thank you guys so much for your help! I'm new to SO... Both of you have been helpful but neither of your answers solved my problem... what should I do? Should I edit my original question post to include the solution? – phonetagger Mar 28 '12 at 15:53
Internal compiler errors are such fun. :-) As far as I know (which is not all that far, I try to avoid using Solaris :-) ) the OS always supports 64-bit stuff. Your best bet at this point might be to move to gcc 4.x. As for stackoverflow itself, I'm also pretty new (about a week), but I'd suggest you add your own final solution to your question, and not check-off either answer.... – torek Mar 28 '12 at 19:27

score 0 · Accepted Answer · answered Mar 29 '12 at 14:52

First of all, thanks very much to Chris Dodd, torek, and gbulmer for your efforts & help. I managed to figure out how to do this with some comments I found here, reproduced in part (and slightly edited for form but not content) below:

Thread: RFE: "h" and "U" asm constraints and "H" and "L" modifiers.
[...]the following two constraints (quoted from gcc.info) for some v8+ ABI inline asm:
'h' 64-bit global or out register for the SPARC-V8+ architecture.
'U' Even register
The "U" is needed to allocate register(s) for ldd/std (it allocates an even+odd pair for a uint64_t). For instance:
    void atomic64_set(volatile uint64_t *p, uint64_t v) {
        asm volatile ( "std %1, %0" : "=m"(*p) : "U"(v) );
    }
With or without "U" as a constraint, one can use "H" and "L" as modifiers in the template to get the High and Low registers of the pair used for a 64-bit value. The "h" constraint allocates a register of which, according to the v8+ ABI, one may safely use all 64bits (Global or Output regs only). The following (artificial) example demonstrates the "h" constraint and the "H" and "L" modifiers:
    void ex_store64(uint64_t *p, uint64_t v) {  
       register int tmp; // Don't say uint64_t or GCC thinks we want 2 regs  
       asm volatile (  
          "sllx %H2,32,%1 \n\t" // tmp = HI32(v) << 32  
          "or %1,%L2,%1 \n\t" // tmp |= LO32(v)  
          "stx %1, %0" // store 64-bit tmp to *p  
          :  "=m"(*p),  "=&h"(tmp)  :  "r"(v));  
      }
Disclaimer: these examples were written on the spot and may not be correct with respect to early-clobber and similar issues.
-Paul

Based on that, I was able to figure out how to rewrite my own 'MULTIPLY' macro from my problem statement:

#define MULTIPLY(r, a, b)     /* r = a * b          */\
   asm("umul %1, %2, %L0;"    /* umul a,b,r         */\
       "srlx %L0, 32, %H0;"                           \
       : /* regs out */   "=r"(r)                     \
       : /* regs in  */   "r"(a),   "r"(b));
       /* re: clobbers "none": I tried specifying :"%y"
        *     in various ways but GCC kept telling me
        *     there was no y, %y, or %%y register. */

My results are now:

u64=u32*u32  ---->  r=a*b           ---->  0x00000000d3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  r=((u64)a)*b    ---->  0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e  
u64=u64*u32  ---->  MULTIPLY(r,a,b) ---->  0xa7c40bfad3c7c1c2 = 0xdeadbeef * 0xc0deba5e
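
For anyone trying to picture what the two asm lines do to the register pair, here is a rough C model. It is only a sketch under assumptions: the struct and function names are made up, and it assumes the SPARC V9 behavior in which umul writes the full 64-bit product into the destination register (which is then the odd/low register of the pair, %L0).

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit-mode register pair holding a uint64:
 * 'hi' stands for the even register (%H0), 'lo' for the odd one (%L0).
 * Each is a 64-bit hardware register of which the compiler only
 * looks at the low 32 bits. */
struct reg_pair {
    uint64_t hi;
    uint64_t lo;
};

static struct reg_pair model_multiply(uint32_t a, uint32_t b)
{
    struct reg_pair p;
    p.lo = (uint64_t)a * b;  /* umul %1, %2, %L0 (V9 behavior assumed) */
    p.hi = p.lo >> 32;       /* srlx %L0, 32, %H0                      */
    return p;
}

/* How the compiler reads the pair back as one 64-bit value. */
static uint64_t pair_value(struct reg_pair p)
{
    return ((p.hi & 0xffffffffu) << 32) | (p.lo & 0xffffffffu);
}
```

With a=0xdeadbeef and b=0xc0deba5e, pair_value(model_multiply(a, b)) comes out as 0xa7c40bfad3c7c1c2, matching the last line of the output above. The upper halves of the two registers are simply ignored when the pair is read back.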

Interesting ... that it works at all implies that the `umul` assembly opcode is getting you the `umulx` instruction. I grabbed a copy of gcc-3.2 and looked in `gcc/config/sparc`; there's a sol2.h that passes -xarch= to the assembler, which implies (assuming you're on solaris2) that gcc is using Sun's assembler. Poking around in `config/sparc/sparc.h` implies that this makes `umul` actually invoke `umulx`. I believe you will need to add the `h` constraint after all, and also that gcc 3.2 is full of bugs. :-) — torek, Mar 29 '12 at 20:08
Also: the reason you can't put `%y` in the clobber section is that gcc mostly ignores `%y`, except for immediately reading values out of it after emitting 32-bit versions of `umul` and `smul`. This appears to be true all the way from gcc 3.2 to 4.7.0. — torek, Mar 29 '12 at 20:44
@phonetagger - I'm pleased you got what you needed, but even more pleased you shared!-) Thank you. — gbulmer, Mar 29 '12 at 20:48
