-1

I am trying to write inline assembly with my C code to perform compare and swap operation. My code is:

typedef struct node {
    int data;
    struct node * next;
    struct node * backlink;
    int flag;
    int mark;
} node_lf;

typedef struct searchfrom {
    node_lf * current;
    node_lf * next;
} return_sf;

typedef struct csArg {
    node_lf * node;
    int mark;
    int flag;
} cs_arg;

typedef struct return_tryFlag {
    node_lf * node;
    int result;
} return_tf;

static inline node_lf cs(node_lf * address, cs_arg *old_val, cs_arg *new_val)
{
    node_lf value = *address;
    __asm__ __volatile__("lock; cmpxchg16b %0; setz %1;"
                         :"=m"(*(volatile node_lf *)address),
                          "=q"(value)
                         :"m"(*(volatile node_lf *)address),
                          "a"(old_val->mark), "d"(old_val->flag),
                          "b"(new_val->mark), "c"(new_val->flag)
                         :"memory");
    return value;
}

GCC gives this error when the code is compiled:

linkedlist.c: In function 'cs': linkedlist.c:45:3: error: impossible constraint in 'asm' __asm__ __volatile__("lock; cmpxchg16b %0; setz %1;":"=m"(*(volatile node_lf

What is wrong in my code? And how can I fix this?

I am trying to implement the equivalent of this code:

node_lf cs (node_lf * address, cs_arg *old_val, cs_arg *new_val ) { 
    node_lf value = *address; 
    if (value.next == old_val->node && value.mark == old_val->mark && 
        value.flag == old_val->flag) { 
        address->next = new_val->node; 
        address->mark = new_val->mark; 
        address->flag = new_val->flag; 
    } 
    return value; 
}
Michael Petch
  • 46,082
  • 8
  • 107
  • 198
Ritesh
  • 67
  • 8
  • 2
    The code and the error message should be in the question itself. – user3386109 Jun 10 '16 at 02:56
  • You should paste the code in here instead of posting screenshots. – Forseth11 Jun 10 '16 at 02:57
  • 1
    The specific problem is that the 'q' constraint on x86 must be a/b/c/dx, but you are already using all of those for other parameters. That said, I'd like to make a push for NOT using inline asm. How about using __sync_bool_compare_and_swap? Or maybe even std::atomic? – David Wohlferd Jun 10 '16 at 03:01
  • 1
    I think what might really be going on is that `node_lf value` defines value as a data structure (not a pointer to one) and can't fit in a 64-bit register. Did you really mean to pass `value` in that way? – Michael Petch Jun 10 '16 at 18:34
  • @DavidWohlferd , I'm not sure that running out of registers would be the problem. Sure a/b/c/d were used as input operands, the template itself doesn't modify those registers, but GCC should be able to potentially reuse registers that may have been used as an input operand for the output operand. It would be up to the compiler to make the choice and keep them straight. That parameter isn't an early clobber. As well, since `cmpxchg16b` is available in 64-bit mode `=q` would include all integer registers (not just the legacy ones) – Michael Petch Jun 10 '16 at 18:44
  • @MichaelPetch: Thanks for OCRing the OP's images of text. Voted to reopen since it's a proper question now. Unfortunately there's at least one `1` vs. `l` error, in the error message: `cmpxchgl6b`. I think you're right that `=q` should be able to match any integer reg, since this code can only run in 64bit mode. I [tried it on godbolt](https://godbolt.org/g/hfOiKt), where `"=r"` doesn't help. **The problem is that `value` is a struct type that's too large to fit in a single register!** – Peter Cordes Jun 10 '16 at 20:05
  • 1
    @PeterCordes : Yes, it was OCR'ed, had better things to do than doing it by hand ;-). I had to fix numerous things and I missed one of the lowercase `L` in the error message. However I believe the code itself was reasonably accurate. I did point out the issue with `value` in one of my comments, asking if the OP really intended to do what they were doing (clearly they shouldn't be). – Michael Petch Jun 10 '16 at 20:08
  • The only thing I didn't do to the question was remove the `static` from the inline function `cs` to make the error more observable. I really considered it, and was torn as to whether someone might think it takes away from authors intent (I don't think so). I assume this code was in a header file and the error occurs when they attempt to use the function. – Michael Petch Jun 10 '16 at 20:12
  • 1
    @MichaelPetch - Yes, you're right: the problem is the fact that he is returning the struct by value. Well, that combined with the fact that he is trying to set the value of the struct with `setz`? `value` will contain a bool indicating whether the cmpxchg16b succeeded. – David Wohlferd Jun 10 '16 at 20:15
  • @DavidWohlferd : I concur. I just wasn't sure what his intent was. Almost wondered if he was trying to set the `int data` member of `value` to the `bool` value returned from `cmpxchg16b`? – Michael Petch Jun 10 '16 at 20:20
  • @MichaelPetch: I've been staring at the code trying to imagine the original intent here (since OP has apparently disappeared). The more I read it, the weirder it gets. cmpxchg16b works with RDX:RAX. But the values that are being used for rdx and rax are 'mark' and 'flags' which are both (4 byte) ints. If we were trying to swap out an entire csArg (a plausible thing to do), that would be 16 bytes, but the "m" we are working with is a node_lf, not a csArg. So we are overwriting the 'mark' and 'flag' values in a struct that doesn't have them? Without knowing OP intent, this cannot be fixed. – David Wohlferd Jun 10 '16 at 20:49
  • @DavidWohlferd I agree. It is easy to answer the obvious question as to why there is a constraint problem. The followup question that is bound to come will be why the code doesn't do what they probably wanted. And to answer that would probably require understanding what they were attempting to do which isn't entirely clear. – Michael Petch Jun 10 '16 at 20:53
  • Thank you for all of the comments. They are really helpful. My intent here is to check for next pointer, flag and mark with csArg node, mark and flag ints. The equivalent code for what I am trying to do is `node_lf cs (node_lf * address, cs_arg *old_val, cs_arg *new_val ) { node_lf value = *address; if (value.next == old_val->node && value.mark == old_val->mark && value.flag == old_val->flag) { address->next = new_val->node; address->mark = new_val->mark; address->flag = new_val->flag; } return value; }` – Ritesh Jun 10 '16 at 21:15
  • 1
    Then why not just use that c code? Is there some multi-threading issue here (as implied by `lock cmpxchg16b`)? What do you expect to have happen if there is conflict? There doesn't seem to be any provision for reporting the problem or retrying. Also, this code makes a copy of the node at `address`, (potentially) modifies the copy and then returns the copy. `address` is not modified. Is that the intent? – David Wohlferd Jun 10 '16 at 22:34
  • @David Wohlferd I am implementing a lock free linked list. So I need the compare and swap to happen atomically. Hence I cannot use the C code directly. What I want to do is return the value pointed by `address` from the compare and swap function and if the `old_val->node` is same as `value.next`, then `address->next` should be `new_val->node`. I hope this makes things a bit clear. – Ritesh Jun 11 '16 at 01:34
  • This feels like a horrible idea (or perhaps a homework assignment). That said... Let me try rephrasing what I think you said: You want to compare the 16 bytes pointed to by `old_val` with 16 of the bytes pointed to by `address`, and if they are the same, you want to copy the 16 bytes in `new_val` into `address`. Or at least that would be true if the order of fields in node_lf were changed to match the order in csArg. Can you picture how this would work if you were using memcmp and memcpy instead of comparing/copying individual fields? – David Wohlferd Jun 11 '16 at 03:44
  • Just to finish this off: Once you have the fields properly ordered, you want your memory parameter to be a pointer to the part of `address` to be checked/updated. Then you want the lower 8 bytes of the old values (the ones to compare) in rax ("a") and the upper 8 bytes in rdx. And the lower 8 bytes of the new values in rbx and the upper 8 bytes in rcx. Note these are not pointers to the data, but the actual data. Lastly, change `value` to be an `unsigned char`. After the asm, this will be a boolean value indicating whether the update was performed, or false if the 'old' values didn't match. – David Wohlferd Jun 11 '16 at 21:11

1 Answers1

1

So, let's take a crack at this.

A few points before we get started:

  1. Using inline asm is a bad idea. It is hard to write, it is hard to write correctly, it is hard to maintain, it isn't portable to other compilers or platforms, etc. Unless this is an assignment requirement, don't do it.
  2. When performing cmpxchg operations, the fields to be compared/exchanged must be contiguous. So if you want to operate on next, flag and mark in a single operation, they must be next to each other in the structure.
  3. When performing cmpxchg operations, the fields must be aligned on an appropriately sized boundary. For example if you are planning to operate on 16bytes, the data must be aligned on a 16byte boundary. gcc provides a variety of ways to do this from the aligned attribute, to _mm_malloc.
  4. When using __sync_bool_compare_and_swap (a better choice than inline asm), you must cast the data types to an appropriately-sized integer.
  5. I'm assuming your platform is x64.

2 & 3 required some changes to the field order of your structures. Note that I did not try to change searchfrom or return_tryFlag, since I'm not sure what they are used for.

So, with those things in mind, here's what I came up with:

#include <stdio.h>
#include <memory.h>

typedef struct node {
    struct node * next;
    int mark;
    int flag;

    struct node * backlink;
    int data;
} node_lf;

typedef struct csArg {
    node_lf * node;
    int mark;
    int flag;
} cs_arg;

bool cs3(node_lf * address, cs_arg *old_val, cs_arg *new_val) { 

    return __sync_bool_compare_and_swap((unsigned __int128 *)address,
                                        *(unsigned __int128 *)old_val,
                                        *(unsigned __int128 *)new_val);
}

void ShowIt(void *v)
{
   unsigned long long *ull = (unsigned long long *)v;
   printf("%p:%p", *ull, *(ull + 1));
}

int main()
{
   cs_arg oldval, newval;
   node n;

   memset(&oldval, 0, sizeof(oldval));
   memset(&newval, 0, sizeof(newval));
   memset(&n, 0, sizeof(node));

   n.mark = 3;
   newval.mark = 4;

   bool b;

   do {
      printf("If "); ShowIt(&n); printf(" is "); ShowIt(&oldval); printf(" change to "); ShowIt(&newval);
      b = cs3(&n, &oldval, &newval);
      printf(". Result %d\n", b);

      if (b)
         break;
      memcpy(&oldval, &n, sizeof(cs_arg));
   } while (1);  
}

When you exit the loop, oldval will be what was there before (has to be or the cas would have failed and we would have looped again) and newval will be what actually got written. Note that if this truly were multi-threaded, there is no guarantee that newval would be the same as the current contents of n, since another thread could already have come along and changed it again.

For output we get:

If 0000000000000000:0000000000000003 is 0000000000000000:0000000000000000 change to 0000000000000000:0000000000000000. Result 0
If 0000000000000000:0000000000000003 is 0000000000000000:0000000000000003 change to 0000000000000000:0000000000000000. Result 1

Note that the cas (correctly!) fails on the first attempt, since the 'old' value doesn't match the 'current' value.

While using assembler may be able to save you an instruction or two, the win in terms of readability, maintainability, portability, etc is almost certainly worth the cost.

If for some reason you must use inline asm, you will still need to re-order your structs, and the point about alignment still stands. You can also look at https://stackoverflow.com/a/37825052/2189500. It only uses 8 bytes, but the concepts are the same.

Community
  • 1
  • 1
David Wohlferd
  • 7,110
  • 2
  • 29
  • 56