6

I am using CPU dispatch based on processor features to switch between implementations of a complicated numerical algorithm. I want to include both versions (an sse2 and an sse3 version, for argument's sake) in the same dynamic library.

The approach taken so far is to wrap all architecture-specific code in a namespace, e.g. namespace sse2 and namespace sse3, which avoids duplicate symbol names when linking into the final dynamic library.
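As a minimal sketch of the namespace approach described above: the function name `sum`, the alias `sum_fn`, and the selector `select_sum` are hypothetical, and the `has_sse3` flag is assumed to come from a CPUID query made elsewhere. In a real build each namespace would sit in its own translation unit compiled with its own instruction-set flags; both are shown in one file only to keep the example self-contained.

```cpp
#include <cstddef>
#include <vector>

// Would be compiled with sse2 flags in its own translation unit.
namespace sse2 {
double sum(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}
}  // namespace sse2

// Would be compiled with sse3 flags in its own translation unit.
namespace sse3 {
double sum(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) total += x;
    return total;
}
}  // namespace sse3

using sum_fn = double (*)(const std::vector<double>&);

// Hypothetical selector: has_sse3 is assumed to be detected at startup.
sum_fn select_sum(bool has_sse3) {
    return has_sse3 ? &sse3::sum : &sse2::sum;
}
```

Because `sse2::sum` and `sse3::sum` are distinct symbols, the linker keeps both, and the caller picks one through the function pointer.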

However, what happens if I use some code outside my control (e.g. a std::vector<int>) in both the sse2 and sse3 versions? As far as I can see, the std::vector implementation will be present in both the sse2 and sse3 object files, but could in theory contain different instructions depending on the optimizations performed by the compiler. When I link these object files into the dynamic library, only one of the copies will be used, and I risk trying to run an sse3 instruction on a CPU that only supports sse2.
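The concern can be illustrated with a small sketch. The template `header_only_sum` is a hypothetical stand-in for third-party header code, and `helper_address` exists only to expose which instantiation each namespace ends up calling. Wrapping your own code in namespaces does not duplicate the template: both namespaces refer to the same instantiation, and across translation units the linker is expected to keep a single copy of it.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for third-party, header-only code that cannot be edited.
template <typename T>
T header_only_sum(const std::vector<T>& v) {
    return std::accumulate(v.begin(), v.end(), T{});
}

using helper_fn = int (*)(const std::vector<int>&);

namespace sse2 {
int run(const std::vector<int>& v) { return header_only_sum(v); }
// Exposes the address of the instantiation this namespace uses.
helper_fn helper_address() { return &header_only_sum<int>; }
}  // namespace sse2

namespace sse3 {
int run(const std::vector<int>& v) { return header_only_sum(v); }
helper_fn helper_address() { return &header_only_sum<int>; }
}  // namespace sse3
```

Both `helper_address` functions return the same pointer. If the two namespaces were built with different flags in separate object files, the surviving copy of `header_only_sum<int>` could come from either build.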

Aside from compiling two separate dynamic libraries, what can be done to get around this problem? I need a solution that works with both Visual Studio and clang on Windows, Mac OS X and Linux.

  • `std::vector` will be implemented in either a dynamic library or static library linked to your own object (e.g. in glibc). Your object files should only contain the declaration of `std::vector` not the implementation/definition. What's the problem? – Z boson May 29 '15 at 07:26
  • First of all, std::vector was just an example. I am also talking about other third party libraries which can be header only. Secondly, std::vector is a template so the code will be present in my own object files. – S. Gammelmark May 29 '15 at 08:15
  • [This may interest you](https://stackoverflow.com/questions/30320369/alias-of-a-function-template). If you're using a library from a header file make sure all the functions are static inline. That's what I do. – Z boson May 29 '15 at 08:40
  • That would work, if I was allowed to edit the third party headers. Or I could wrap them in an anonymous namespace etc. However, that is not an option. – S. Gammelmark May 29 '15 at 08:49
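The `static inline` suggestion from the comments can be sketched as follows. The helper `dot` is hypothetical, and the approach assumes the header can be edited, which the asker says is not the case for the third-party code in question. Internal linkage gives every translation unit its own private copy, so an sse2-built copy and an sse3-built copy are never merged by the linker.

```cpp
#include <cstddef>

// Hypothetical header-only helper. `static` gives it internal linkage, so
// each translation unit that includes this header gets a private copy
// compiled with that unit's own instruction-set flags.
template <typename T>
static inline T dot(const T* a, const T* b, std::size_t n) {
    T acc = T{};
    for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
}
```

The trade-off is code size: every translation unit that calls `dot` without inlining it carries its own copy.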

2 Answers

0

One approach would be to dispatch at the shared-library level instead of the object-file level. This would require compiling the entire library multiple times with different instruction set support, then dispatching to the appropriate shared library at runtime based on the CPU capabilities that you detect. I detail an approach that can work for this on OS X and Linux in this previous answer. I have not tried to implement this on Windows (yet), however.
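A rough sketch of the selection step in library-level dispatch, not the method from the linked answer: the struct `CpuFeatures`, the function `pick_library`, and the file names are all hypothetical. The flags are assumed to be filled in from a CPUID query, and the returned name would then be passed to `dlopen` on Linux and OS X or `LoadLibrary` on Windows.

```cpp
#include <string>

// Hypothetical feature flags; a real program would fill these from CPUID.
struct CpuFeatures {
    bool sse2;
    bool sse3;
};

// Picks the most capable build of the library that the CPU can run.
// The file names are placeholders for per-instruction-set builds.
std::string pick_library(const CpuFeatures& cpu) {
    if (cpu.sse3) return "libalgo_sse3.so";
    if (cpu.sse2) return "libalgo_sse2.so";
    return "libalgo_generic.so";
}
```

Since each library is built entirely with one set of flags, any shared template instantiation inside it matches the instruction set of the code around it.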

Jason R
  • It is certainly an interesting solution you posted in the other answer. However, as I stated in the question, I would prefer to do it with only a single dynamic library. I know how to do it with multiple libraries, albeit in a less clever way than you suggest. – S. Gammelmark May 28 '15 at 17:44
-3

This scenario is fully supported by the language and should not require any explicit handling. Your dynamic dispatch scenario doesn't have much to do with it: it is very typical for a project to instantiate std::vector in multiple translation units, and the ODR is still not violated. In general, inline functions, and template instantiations in particular, are just not visible to the linker (i.e. they do not appear as 'External' in the symbol tables of the obj files).

If for some exotic reason you need to control this linkage type explicitly, you'd have to resort to compiler-specific mechanisms. The MSVC mechanism is __declspec(selectany); gcc has some other devices. I don't know about clang, but the point is you'd be hard pressed to come up with a reason to use any of them.

Ofek Shilon
  • The issue that the OP is referring to isn't ODR-related. The problem is that the various instantiations of `std::vector`, for instance, aren't interchangeable in his scenario because they are compiled with differing instruction-set support. There's no guarantee that template instantiations will be inlined, so the linker has to deduplicate the various instantiations of a particular symbol. In the OP's case, there's no way for the linker to know which of the various instantiations would be acceptable for use, hence the problem. – Jason R May 31 '15 at 17:12
  • Exactly, JasonR. It seems that if I link the object files/library built for the smallest instruction set first, those implementations are chosen, which at least removes the problem of trying to execute an unsupported instruction. This removes any performance benefit from that part of the code, however. – S. Gammelmark Jun 01 '15 at 06:59