Call libmvec functions manually on __m128 vectors?

Question

According to this page https://sourceware.org/glibc/wiki/libmvec, I should be able to manually vectorize a few complicated instructions like cosine by using the libmvec functions. However, I don't know how to get gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 to recognize them. Am I missing some compiler flags or something? Any help or suggestions are appreciated.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <math.h>
#include <immintrin.h>

// gcc libmvectest.c -o libmvectest.bin -lm -O3 -Wall -ffast-math -msse4


int main(int argc, char **argv)
{
  float input = 0.1;
  float res[4];
  __m128  m128 = _mm_set1_ps(input);
  __m128  cosm128 = _ZGVbN4ua16vl_cosf(m128);
  _mm_storeu_ps(res, cosm128);
  printf("%.8f %.8f\n", cosf(input), res[0]);
}

I've googled the 'implicit declaration...' error for the prefixed functions but failed to find an answer that worked for me. I tried _ZGVbN4ua16vl_cos, _ZGVbN4ua16vl_cosf and other attempts. Does anyone know where the actual function names are listed?

`_ZGVbN4ua16vl_cosf` looks like a mangled C++ function name. Are you sure this is the correct function to call? — pmacfarlane, May 25 '23 at 22:46
No, I'm not sure. I started with _ZGVbN4v_cosf() as I found that on this post https://stackoverflow.com/questions/40475140/mathematical-functions-for-simd-registers but that isn't recognized either. — Simon Goater, May 25 '23 at 22:48
That also looks like a mangled C++ function name. Does it work if you just use `cosf()`? — pmacfarlane, May 25 '23 at 22:49
Where are you getting these weird function names from anyway? I don't see that on the website you linked. No, you should not need to compile as C++. — pmacfarlane, May 25 '23 at 22:51
If you follow the VectorABI.txt link, it describes the function names. — Simon Goater, May 25 '23 at 22:53
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt — 0___________, May 25 '23 at 22:54
Have you tried `extern __m128 _ZGVbN4v_cosf(__m128);`? That probably works, but hopefully there's a better way. I'm not sure if GCC / glibc provides an intended way to call them manually. It will use them automatically when auto-vectorizing with `-ffast-math`, though: https://godbolt.org/z/dhP45cahz . The doc you're reading is about the ABI, the application *binary* interface; that's why it's describing asm symbol names, not C functions you're intended to use in your source. — Peter Cordes, May 26 '23 at 05:02
Remember, post answers as answers, separate from the question. — Peter Cordes, May 26 '23 at 08:59

score 1 · Answer 1 · answered May 26 '23 at 08:24

Those OpenMP-SIMD clones are not intended to be invoked by hand from portable C. They are not declared as separate functions; instead, they are introduced as multiple variants of a standard function via #pragma omp declare simd or __attribute__((simd)) for use by automatic vectorization. On a Glibc-based system you can inspect the declarations in /usr/include/bits/math-vector.h.

For use in explicitly vectorized code you can write simple wrappers with a trivially vectorizable loop like this:

#include <immintrin.h>

__attribute__((simd("notinbranch")))
float cosf(float);

__m128 t(__m128 x)
{
    for (int i = 0; i < sizeof x / sizeof x[0]; i++)
        x[i] = cosf(x[i]);
    return x;
}

which gcc -O2 -ftree-vectorize is able to optimize to

t:
        jmp     _ZGVbN4v_cosf

https://godbolt.org/z/jdfeoT7K5

Note that `x[i]` for `__m128` is a GNU C extension, but all the mainstream compilers that support GNU extensions define `__m128` as a vector of `float` elements so it does work. So if you're making a portable fallback version of this, it's not just the `__attribute__` part that needs `#ifdef __GNUC__`, it's also the loop in `t()`. — Peter Cordes, May 26 '23 at 08:29

score 1 · Answer 2 · answered May 26 '23 at 09:02

Thanks to all who looked into this. The main point of the question was to vectorize manually using those built-in functions, so as not to rely on the optimiser, which can easily get confused if you're doing more than a simple one-function loop. The suggestion by Peter Cordes worked on my system, so this code now compiles and runs successfully.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <math.h>
#include <immintrin.h>

// gcc libmvectest.c -o libmvectest.bin -lm -O3 -Wall -ffast-math -msse4

extern __m128  _ZGVbN4v_cosf(__m128);

int main(int argc, char **argv)
{
  float input = 0.1;
  float res[4];
  __m128  m128 = _mm_set1_ps(input);
  __m128  cosm128 = _ZGVbN4v_cosf(m128);
  _mm_storeu_ps(res, cosm128);
  printf("%.8f %.8f\n", cosf(input), res[0]);
}

One way to make this less ugly and hacky: `extern __m128 vec128_cosf(__m128) asm("_ZGVbN4v_cosf");` directly tells the compiler the ABI symbol name, rather than relying on C names being used unchanged as symbol names (as on Linux ELF). https://godbolt.org/z/PhPz8qv14 / https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html . Or pick whatever C name you want for the function, like `mvec_cosf128`. Libraries other than libmvec can also provide vectorized versions of library functions; that's why the ABI is documented. — Peter Cordes, May 26 '23 at 09:04

Gal Weiss · Answer 3 · answered May 25 '23 at 23:55

The implicit declaration error is what you get when you call a function that isn't declared in any of your included headers (or anything included by them).

As far as I understand from the article, libmvec is an alternative implementation, but the functions themselves come from math.h, so you should keep calling the existing cos() and sin() functions. The name you specified (_ZGVbN4ua16vl_cosf) is not something you should call directly; it is an internal vectorized variant that the compiler substitutes when it vectorizes a loop in your code (I must admit I don't know exactly how that works, but it is the optimizer's job).

If you do that, the compiler/optimizer will automatically bring in the optimized (vectorized) version of the math function.

Here is a way to see that. Take the following code from the libmvec article:

#include <math.h>

int N = 3200;
double b[3200];
double a[3200];

int main (void)
{
  int i;

  for (i = 0; i < N; i += 1)
  {
    b[i] = sin (a[i]);
  }

  return (0);
}

Name this file test_math.c. Now compile it normally:

gcc ./test_math.c -lm

You can now use the nm command (which lists all symbols in a binary file) to check which sin symbols the binary references:

#nm -a a.out  | grep sin

  U sin@@GLIBC_2.2.5

The "U" means the symbol is referenced but not defined in this binary. That is because the binary is dynamically linked against the math library, and the symbol is only resolved when the program runs.

Now compile the same file again, this time with optimization, loop vectorization and -ffast-math enabled (which lets GCC use the libmvec variants), and check again:

#gcc ./test_math.c -O1 -ftree-loop-vectorize -ffast-math -lm -mavx
#nm -a a.out  | grep sin
                 U _ZGVcN4v_sin@@GLIBC_2.22
                 U sin@@GLIBC_2.2.5

Now a.out also references _ZGVcN4v_sin, the optimized (vectorized) variant of sin, even though your code never mentions it.

Another tip: if you are using Linux and just want to know how to use a math function, you can run the following command (for example):

#man sin

SIN(3)                                        Linux Programmer's Manual                                        

NAME
       sin, sinf, sinl - sine function

SYNOPSIS
       #include <math.h>

       double sin(double x);
       float sinf(float x);
       long double sinl(long double x);

       Link with -lm.

This is just a snippet of the output, but you can see that the manual page is marked SIN(3). Section 3 of the manual documents library functions, such as those provided by your C library. The page also tells you which header to include and how to link the library (add -lm to your compilation command).

The point of the question is really how to call the libmvec function for a vector of 4 floats. We know these functions exist, and GCC can use them when *auto*-vectorizing a loop that does `a[i] = cosf(b[i])` (at least with `-ffast-math`: https://godbolt.org/z/jvcon843j), to get code that's maybe 4x faster than calling scalar `cosf` for each float separately. The OP probably knows that they could do `foo = cosf(bar)` in standard portable C, but they're trying to vectorize with SSE intrinsics and types like `__m128` (a vector of 4 floats). — Peter Cordes, May 26 '23 at 04:56
