
I have parallelized an existing computer vision code base using OpenMP. I think the parallelization is well designed because:

  • The workload is well-balanced
  • There is no synchronization/locking mechanism
  • I parallelized the outermost loops
  • All the cores are used for most of the time (there are no idle cores)
  • There is enough work for each thread

However, the application stops scaling well at higher core counts, e.g. beyond about 15 cores.

The code uses external libraries (OpenCV and IPP) whose code is already optimized and vectorized, and I manually vectorized some portions of my own code as best I could. According to Intel Advisor, the code still isn't well vectorized, but there is not much left to do: I have already vectorized what I could, and I can't improve the external libraries.

So my question is: is it possible that vectorization is the reason why the code doesn't scale well at some point? If so, why?

justHelloWorld
  • Have you calculated how much memory bandwidth is being used? If you are saturating the memory bus, more cores won't help. Refactoring could; if your data is going back and forth to memory in multiple passes, doing *more per pass* and thus *fewer passes* could get rid of the bottleneck. – Yakk - Adam Nevraumont May 26 '17 at 13:48
  • @Yakk Thanks for your answer, I hadn't considered that. Do you know how I can do such an analysis with an Intel tool (Intel Advisor, VTune Amplifier, etc.)? – justHelloWorld May 26 '17 at 13:56
  • Nope, I'm not aware of an easy tool for detecting memory bandwidth use. I usually just do a back-of-envelope calculation when I notice symptoms. – Yakk - Adam Nevraumont May 26 '17 at 14:01

1 Answer


In line with comments from Adam Nevraumont, VTune Amplifier can do a lot to pinpoint memory bandwidth issues: https://software.intel.com/en-us/vtune-amplifier-help-memory-access-analysis.

It may be useful to start at a higher level of analysis first, though, such as looking at hot spots. If it turns out that most of your time is spent in OpenCV or similar libraries, as you suspect, finding that out early could save time compared with digging into memory bottlenecks directly.

Aaron Altman