Breaking strict aliasing and living to tell about it?

Question

I am trying to use two libraries, LIBSVM and LIBLINEAR, in the same application, which I am writing in C++11. Both LIBSVM and LIBLINEAR take their input in what is essentially a row-based sparse matrix representation: there is a node structure

struct svm_node
{
    int index;
    double value;
};

and the sparse matrix itself is just struct svm_node **, where every row is a struct svm_node * and rows are terminated by a node with index = -1. The LIBLINEAR version of this struct is called feature_node and has an identical definition. Although LIBSVM and LIBLINEAR are written by the same authors, svm.h and linear.h, and consequently struct svm_node and struct feature_node, are in no way related.
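A minimal sketch of the row-based layout described above, assuming the node definition quoted from svm.h (it is redeclared locally so the snippet compiles without LIBSVM installed). `row_length` and `total_entries` are hypothetical helpers, not part of either library:

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the node layout quoted above (assumption: it matches
// the svm_node declared in the installed svm.h).
struct svm_node
{
    int index;
    double value;
};

// Hypothetical helper, not part of LIBSVM: counts the entries of one
// sparse row, stopping at the terminating node whose index is -1.
std::size_t row_length(const svm_node *row)
{
    assert(row != nullptr);
    std::size_t n = 0;
    while (row[n].index != -1)
        ++n;
    return n;
}

// Hypothetical helper: total number of stored (non-terminator) entries
// in a matrix given as an array of row pointers, i.e. an svm_node **.
std::size_t total_entries(svm_node *const *matrix, std::size_t rows)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < rows; ++i)
        total += row_length(matrix[i]);
    return total;
}
```

For example, a two-row matrix whose rows are `{{1, 0.5}, {4, 2.0}, {-1, 0.0}}` and `{{-1, 0.0}}` has two stored entries in total; the terminators are not counted.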

There are some cases where I would like to create both a kernel SVM model (implemented only by LIBSVM) and a logistic regression model (implemented only by LIBLINEAR) of my data. The data set, which is passed to each library in its own sparse matrix representation (the two are identical at the binary level), may be quite large, and I would prefer to avoid memcpy()ing all of it. A simple reinterpret_cast<feature_node **>(svm_node_ptr_ptr_variable) seems to do the job just fine.
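For comparison, the fully well-defined alternative that the question is trying to avoid is to copy the data into LIBLINEAR's node type. This is a minimal sketch, assuming local copies of both struct definitions; `FeatureMatrix` and `to_feature_matrix` are hypothetical names, and the copy is done node by node rather than with memcpy:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local copies of the two layouts (assumption: identical to svm.h / linear.h).
struct svm_node { int index; double value; };
struct feature_node { int index; double value; };

// Hypothetical owner of a converted matrix: `storage` holds every node
// (terminators included) and `rows` points into it, so rows.data() could be
// handed to LIBLINEAR wherever it expects a feature_node **.
struct FeatureMatrix
{
    std::vector<feature_node> storage;
    std::vector<feature_node *> rows;

    FeatureMatrix() = default;
    FeatureMatrix(FeatureMatrix &&) = default;
    // A copy would leave `rows` pointing into the original's storage.
    FeatureMatrix(const FeatureMatrix &) = delete;
};

// Copies an svm_node ** matrix with n_rows rows into LIBLINEAR's node type.
// Always well-defined, but costs O(total nodes) extra time and memory.
FeatureMatrix to_feature_matrix(svm_node *const *x, std::size_t n_rows)
{
    FeatureMatrix m;
    std::vector<std::size_t> offsets;
    offsets.reserve(n_rows);
    for (std::size_t i = 0; i < n_rows; ++i)
    {
        assert(x[i] != nullptr);
        offsets.push_back(m.storage.size());
        for (const svm_node *p = x[i];; ++p)
        {
            m.storage.push_back(feature_node{p->index, p->value});
            if (p->index == -1)
                break; // terminator copied, row finished
        }
    }
    // Take row pointers only once storage has stopped growing, because
    // push_back may reallocate and invalidate earlier pointers.
    m.rows.reserve(n_rows);
    for (std::size_t off : offsets)
        m.rows.push_back(m.storage.data() + off);
    return m;
}
```

The cost is one extra node per stored entry plus one per row terminator, which is exactly the overhead the question wants to avoid for large data sets.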

I am also using LLVM's link-time (whole-program) optimization (-flto) in release builds, so I would like to ensure that no optimization breaks my code in an unpredictable manner.

Is there any way to type-pun svm_node ** into feature_node ** that avoids any breakage which may be caused by (current or future) compiler optimizations? Does __attribute__((__may_alias__)) help here, and if it does, how should I use it?

If __attribute__((__may_alias__)) is only meaningful on types, would it work if I created my own struct and pointer-to-struct type

struct __attribute__((__may_alias__)) SparseElement {
    int index;
    double value;
};
typedef SparseElement * __attribute__((__may_alias__)) SparseRow;

and then passed a SparseRow * obtained with reinterpret_cast to LIBSVM and LIBLINEAR?

Since the definitions are identical, why don't you erase one definition and then `typedef` it to the other? — 101010, Aug 05 '14 at 10:31
Is it the pointer, or the structure itself, that you want to type-pun? `svm_node **` and `feature_node **` are incompatible and should not be aliased. `__attribute__((__may_alias__))` can't help because it applies to type definitions. However, if it really is the pointers that you want to alias, just use `memcpy` - copying a single pointer is not expensive and the compiler will probably optimise it out anyway. — davmac, Aug 05 '14 at 10:31
@ikh Yes, but it disables many optimizations altogether that I would rather keep for the rest of the code... — Kristóf Marussy, Aug 05 '14 at 10:31
@40two The definitions are in library header files. As a last resort, I could maybe edit the headers, but that's ugly (although maybe not as ugly as some compiler-magic and type punning). — Kristóf Marussy, Aug 05 '14 at 10:33
@ikh Well it's C nomenclature, perhaps not C++, but the rules are as far as I understand it the same. See C99 6.5p7. Newer C standards have the same clause. — davmac, Aug 05 '14 at 10:33
@davmac Both libraries access the data pointed to by the pointers, so I guess I want to alias both the `struct svm_node *` pointers pointed to by the `struct svm_node **` and the `struct svm_node` objects pointed to by them. — Kristóf Marussy, Aug 05 '14 at 10:35
@KristófMarussy in that case I'm afraid you're short of luck. If they were used in separate modules I'd say "go ahead and it will just work(TM)" but using -flto potentially exposes aliasing violations across module boundaries. — davmac, Aug 05 '14 at 10:37
I am planning to add some `static_assert`s to ensure that the build breaks on a machine where the installed LIBSVM and LIBLINEAR have different sparse matrix representations (which is extremely unlikely), but of course that does not help when a compiler optimization breaks the code because of aliasing. — Kristóf Marussy, Aug 05 '14 at 10:37
At least according to http://stackoverflow.com/a/2981867/560450 , there is no guarantee of that working either. And I think LIBSVM's headers should have something like `volatile svm_node * volatile *`, which they don't. — Kristóf Marussy, Aug 05 '14 at 15:54
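The `static_assert` guard mentioned in the comments could look roughly like the sketch below. It assumes local stand-ins for both structs; in a real build they would come from svm.h and linear.h instead. As the comments point out, these checks only catch layout divergence at compile time and do nothing to make the aliasing itself well-defined:

```cpp
#include <cstddef>
#include <type_traits>

// Stand-ins for the library declarations (assumption: in real code,
// include svm.h and linear.h and drop these two definitions).
struct svm_node { int index; double value; };
struct feature_node { int index; double value; };

// Fail the build if the two node types ever stop having the same layout.
static_assert(std::is_standard_layout<svm_node>::value &&
                  std::is_standard_layout<feature_node>::value,
              "node types must be standard-layout");
static_assert(sizeof(svm_node) == sizeof(feature_node),
              "node sizes differ");
static_assert(offsetof(svm_node, index) == offsetof(feature_node, index),
              "index offsets differ");
static_assert(offsetof(svm_node, value) == offsetof(feature_node, value),
              "value offsets differ");
static_assert(std::is_same<decltype(svm_node::index), decltype(feature_node::index)>::value &&
                  std::is_same<decltype(svm_node::value), decltype(feature_node::value)>::value,
              "member types differ");
```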

score 3 · Accepted Answer · answered Aug 05 '14 at 16:25

The LIBLINEAR version of this struct is called feature_node and has identical definition.

You're golden if you use a union. C++ specifically allows (section 9.2) accessing "a common initial sequence".

If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
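A minimal illustration of the quoted rule, assuming local copies of both node definitions; `node_union`, `set_svm` and `feature_value` are hypothetical names. Because `int index; double value;` matches member for member, the common initial sequence is the whole struct. The sketch only shows one node accessed through both members; the two-level `**` case is the subject of the follow-up comments:

```cpp
#include <cassert>
#include <type_traits>

// Local copies of the two layouts (assumption: identical to svm.h / linear.h).
struct svm_node { int index; double value; };
struct feature_node { int index; double value; };

// Standard-layout union of the two layout-compatible node types.
union node_union
{
    svm_node s;
    feature_node f;
};

// The rule quoted above only applies to standard-layout unions.
static_assert(std::is_standard_layout<node_union>::value,
              "node_union must be standard-layout");

// Hypothetical helper: stores an entry through the LIBSVM view,
// which makes `s` the active member of the union.
void set_svm(node_union &n, int index, double value)
{
    n.s = svm_node{index, value};
}

// Hypothetical helper: inspects the same entry through the LIBLINEAR view,
// which is permitted because `value` is in the common initial sequence.
double feature_value(const node_union &n)
{
    return n.f.value;
}
```

For example, after `set_svm(n, 7, 3.25)`, reading `n.f.index` yields 7 and `feature_value(n)` yields 3.25.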

Even a reinterpret_cast on the pointer should work fine, since the type that undergoes lvalue-to-rvalue conversion is the exact type of the object that exists in memory there.

My problem is that the aliasing I want has two levels of indirection, i.e. if I have a `union my_node { svm_node s; feature_node f; };`, I will have to pass `my_node **` as both `svm_node **` and `feature_node **`. Does this passage still apply? — Kristóf Marussy, Aug 05 '14 at 16:32
@KristófMarussy: You're not breaking strict aliasing here. See [the actual rule](http://stackoverflow.com/a/7005988/103167). Your type IS similar to (identical to, actually) the dynamic type of the object, and you perform the access via an aggregate type that includes the dynamic type of the object among its non-static data members. So all is ok. — Ben Voigt, Aug 05 '14 at 16:36
