
I have an application, developed in C++ with Visual Studio on Windows, on a machine with an Intel CPU.

This application is in use on multiple machines, at multiple locations, all with Intel CPUs.

Lately, it was installed on a PC with an AMD CPU.

On the AMD machine, on a certain function, the application freezes and crashes.

The function that crashes uses Boost thread locking, followed by standard OpenCV functionality (specifically the ArUco fiducial marker module), so I assume that the issue is the thread locks.

The relevant code is:

//header

typedef boost::shared_mutex Lock;
typedef boost::unique_lock< Lock > WriteLock;
typedef boost::shared_lock< Lock > ReadLock;
Lock floorLock;

//thread one (Producer)

WriteLock f_lock2(floorLock);
frameFloor = image_ocv.clone(); // a cv::Mat
f_lock2.unlock();

//thread two (consumer)

cv::Mat image_ocv;
ReadLock f_lock2(floorLock);
image_ocv = frameFloor;
f_lock2.unlock();

I have tried swapping these out for thread-safe queues, and the crash persists.
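
For comparison, here is a minimal sketch of the same reader/writer pattern written with only the C++17 standard library, which is one way to take Boost out of the picture. `Frame` and `SharedFrame` are hypothetical names, and `Frame` is a plain byte vector standing in for `cv::Mat` so that the sketch compiles without OpenCV. Note that with a real `cv::Mat`, plain assignment shares the reference-counted pixel buffer, so a private copy on the consumer side would need `clone()`. Whether any of this is related to the crash is not known.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for cv::Mat so the sketch builds without OpenCV.
using Frame = std::vector<std::uint8_t>;

// Holds the most recent frame behind a reader/writer lock.
class SharedFrame {
public:
    // Producer side: exclusive lock while the stored frame is replaced.
    void publish(const Frame& latest) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        frame_ = latest;  // deep copy, analogous to image_ocv.clone()
    }

    // Consumer side: shared lock while a private deep copy is taken.
    Frame snapshot() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return frame_;  // the copy is constructed before the lock is released
    }

private:
    mutable std::shared_mutex mutex_;
    Frame frame_;
};
```

The locks are released by their destructors, so no explicit `unlock()` call is needed, and the consumer never touches memory that the producer may later replace.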

Another third party has now tested this on different machines, and confirms the behaviour. The only difference between machines where this runs fine, and machines where it crashes, is the Intel vs AMD CPU.

Sadly I do not have any AMD machines, so I am having trouble debugging this.

Is there any reason why code compiled on an Intel CPU would crash on an AMD CPU? What can I look at to fix this?

anti
  • Does the opencv functionality use some Intel specific AVX (512) instructions? – Bob__ Oct 30 '20 at 10:02
  • Hi @Bob__, thanks for your reply. Great thought! The code uses opencv::aruco module, and Eigen. I will look into this and see if I can find anything. – anti Oct 30 '20 at 12:14
  • The opencv cmake settings are: `CPU_BASELINE : SSE3, CPU_DISPATCH : SSE4_1;SSE4_2;AVX;FP16;AVX2;AVX512_SKX` Would this possibly cause the issue? – anti Oct 30 '20 at 12:18
  • To my knowledge, Ryzen processors (if those are the "AMD CPU"s you have) support up to AVX2, not AVX512. So yeah, it could be the cause of the issue, but I won't pretend to be an expert on the subject. – Bob__ Oct 30 '20 at 14:00
  • No AMD CPUs support AVX512 at all, but `CPU_DISPATCH` should only use that feature if available. BTW, strange that you wouldn't include FMA or FMA3 in your CPU_DISPATCH, unless OpenCV includes that with AVX2. Is your AMD CPU really ancient, like so old that it doesn't support SSE3? That would be surprising, I think even later K10 / Barcelona CPUs supported SSE3. (SSSE3 (new in Core 2; mostly integer stuff) wasn't available in AMD until Bulldozer, but that's different from SSE3). – Peter Cordes Oct 30 '20 at 23:47
  • Did you maybe compile with `-march=native` separate from what you used with OpenCV? What specific AMD CPU did you have? It's likely not really Intel vs. AMD, and would have crashed on an older Intel without some ISA extensions. Especially if you can confirm that it was an illegal-instruction crash. (`#UD` hardware exception. Linux SIGILL, or whatever Windows does when that happens. Same as running a `ud2` instruction if you want to test that.) – Peter Cordes Oct 31 '20 at 04:53
  • Hi @PeterCordes, thanks for your reply. The AMD CPU that crashes is a Ryzen Threadripper, not an old model. I do not have the AMD machine here, so this is very hard to debug! I have had the customer run some tests, and it does look like the OpenCV functions cause the crash (only on AMD machines). Do you have any thoughts on what else I can try here? – anti Oct 31 '20 at 09:25
  • Try it on an Intel CPU *without* AVX512, like Skylake-client, or Haswell. If it crashes there, then you've included some AVX-512 instructions that run unconditionally, not just on CPUs with AVX-512. If that reproduces the crash, there you go. Or if it's more convenient, you can use SDE on any CPU to emulate a non-AVX-512 CPU. Like the opposite of [How to test AVX-512 instructions w/o supported hardware?](https://stackoverflow.com/a/51805258). Or like [Disabling AVX2 in CPU for testing purposes](https://stackoverflow.com/q/55762372) (but of course only disable AVX512; Zen has AVX2) – Peter Cordes Oct 31 '20 at 09:40
  • Hi @PeterCordes, I have rebuilt OpenCV without AVX512, and I see the same crash. Is there anything else you can think of that may be causing this? Thank you – anti Nov 03 '20 at 15:39
  • Ordinary bugs, maybe? But you're still saying you have a crash you can't repro with Intel CPUs, though? Could one be using GPU accel and the other not? – Peter Cordes Nov 03 '20 at 23:32
  • Exactly. The OpenCV functions run perfectly on Intel CPUs, and crash on an AMD Threadripper machine. No other difference in hardware or software. Could it be OpenMP? I see there are `parallel_for_` loops in the code – anti Nov 04 '20 at 09:22
  • OpenMP usage in more complex threading scenarios can sometimes become quite tricky; I have observed several issues with MSCPPUnit tests here, for instance. From your description, I really recommend first trying to exclude any non-architecture-related issues as far as possible. Did you try replacing the locks with standard ones? Did you try to reproduce the crash in effectively non-blocking scenarios? If in doubt, though, I guess you cannot get around trying to reproduce it on an AMD machine. If OpenMP is involved, the cache behavior surely differs between architectures. – Secundi Dec 03 '20 at 08:36
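
Since the failing machine is remote, one option raised by the ISA-extension discussion above is to have the customer run a small diagnostic that reports which instruction-set extensions the CPU advertises. The following is a minimal sketch using the CPUID instruction; `cpuid`, `CpuFeatures`, `detectCpuFeatures` and `describe` are hypothetical names, not part of OpenCV or Boost. It only reads the CPUID feature bits and does not check whether the operating system has enabled the AVX register state (that would need an additional XGETBV check), so treat the output as a hint rather than proof.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <string>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>
  #define SKETCH_HAVE_CPUID 1
#elif defined(__x86_64__) || defined(__i386__)
  #include <cpuid.h>
  #define SKETCH_HAVE_CPUID 1
#else
  #define SKETCH_HAVE_CPUID 0
#endif

// Registers returned by one CPUID query, in the order EAX, EBX, ECX, EDX.
using CpuidRegs = std::array<std::uint32_t, 4>;

// Runs CPUID; returns all zeros on non-x86 builds or for unsupported leaves.
inline CpuidRegs cpuid(std::uint32_t leaf, std::uint32_t subleaf = 0) {
    CpuidRegs r{{0, 0, 0, 0}};
#if SKETCH_HAVE_CPUID
  #if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);  // EAX = highest supported basic leaf
    if (leaf > static_cast<std::uint32_t>(regs[0])) return r;
    __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
    for (int i = 0; i < 4; ++i) r[i] = static_cast<std::uint32_t>(regs[i]);
  #else
    unsigned a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid_count(leaf, subleaf, &a, &b, &c, &d)) r = {{a, b, c, d}};
  #endif
#endif
    return r;
}

struct CpuFeatures {
    std::string vendor;  // e.g. "GenuineIntel" or "AuthenticAMD"
    bool sse3, ssse3, sse41, sse42, fma, avx, avx2, avx512f;
};

inline CpuFeatures detectCpuFeatures() {
    CpuFeatures f{};
    const CpuidRegs l0 = cpuid(0);
    char v[13] = {};
    std::memcpy(v + 0, &l0[1], 4);  // vendor string is stored in EBX, EDX, ECX
    std::memcpy(v + 4, &l0[3], 4);
    std::memcpy(v + 8, &l0[2], 4);
    f.vendor = v;

    const CpuidRegs l1 = cpuid(1);     // basic feature flags
    const CpuidRegs l7 = cpuid(7, 0);  // extended feature flags
    auto bit = [](std::uint32_t reg, unsigned n) { return ((reg >> n) & 1u) != 0; };
    f.sse3    = bit(l1[2], 0);
    f.ssse3   = bit(l1[2], 9);
    f.fma     = bit(l1[2], 12);
    f.sse41   = bit(l1[2], 19);
    f.sse42   = bit(l1[2], 20);
    f.avx     = bit(l1[2], 28);
    f.avx2    = bit(l7[1], 5);
    f.avx512f = bit(l7[1], 16);
    return f;
}

// Formats the result as a single line suitable for a log file.
inline std::string describe(const CpuFeatures& f) {
    auto yn = [](bool b) { return b ? "1" : "0"; };
    return "vendor=" + f.vendor + " sse3=" + yn(f.sse3) + " ssse3=" + yn(f.ssse3) +
           " sse4.1=" + yn(f.sse41) + " sse4.2=" + yn(f.sse42) + " fma=" + yn(f.fma) +
           " avx=" + yn(f.avx) + " avx2=" + yn(f.avx2) + " avx512f=" + yn(f.avx512f);
}
```

Writing `describe(detectCpuFeatures())` to the application log at startup, next to the output of OpenCV's `cv::getBuildInformation()`, would let the flags reported on the crashing machine be compared against the `CPU_BASELINE` and `CPU_DISPATCH` settings quoted earlier.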

0 Answers