Questions tagged [intel-vtune]

Use this tag for questions about Intel® VTune™ Profiler, a performance profiler for finding and optimizing bottlenecks across CPU, GPU, and FPGA systems.

182 questions
65
votes
2 answers

Performance difference between Windows and Linux using Intel compiler: looking at the assembly

I am running a program on both Windows and Linux (x86-64). It has been compiled with the same compiler (Intel Parallel Studio XE 2017) with the same options, and the Windows version is 3 times faster than the Linux one. The culprit is a call to…
InsideLoop
  • 6,063
  • 2
  • 28
  • 55
18
votes
1 answer

Pthread Mutex: pthread_mutex_unlock() consumes lots of time

I wrote a multi-threaded program with pthreads, using the producer-consumer model. When I profiled it with Intel VTune Profiler, I found that the producer and consumer spend a lot of time in pthread_mutex_unlock. I don't understand why this…
lei_z
  • 1,049
  • 2
  • 13
  • 27
15
votes
6 answers

Using C/Intel assembly, what is the fastest way to test if a 128-byte memory block contains all zeros?

Continuing on from my first question, I am trying to optimize a memory hotspot found via VTune profiling a 64-bit C program. In particular, I'd like to find the fastest way to test if a 128-byte block of memory contains all zeros. You may assume any…
14
votes
4 answers

How to profile time spent in memory access in C/C++ applications?

The total time spent by a function in an application can be broadly divided into two components: time spent on actual computation (Tcomp) and time spent on memory accesses (Tmem). Typically profilers provide an estimate of the total time spent by a…
Imran
  • 642
  • 6
  • 25
10
votes
2 answers

What might cause the same SSE code to run a few times slower in the same function?

Edit 3: The images are links to the full-size versions. Sorry for the pictures-of-text, but the graphs would be hard to copy/paste into a text table. I have the following VTune profile for a program compiled with icc --std=c++14 -qopenmp -axS -O3…
iksemyonov
  • 4,106
  • 1
  • 22
  • 42
8
votes
4 answers

Profiling help required

I have a profiling issue - imagine I have the following code... void main() { well_written_function(); badly_written_function(); } void well_written_function() { for (a small number) { highly_optimised_subroutine(); …
Mick
  • 8,284
  • 22
  • 81
  • 173
8
votes
3 answers

Is VTune Worth Considering for Delphi?

Running through all the questions on profiling tools, I was surprised to discover VTune by Intel that I hadn't heard of before. At $700, it is even more expensive than AQTime. But before I make the decision to put down the big bucks for AQTime, has…
lkessler
  • 19,819
  • 36
  • 132
  • 203
7
votes
2 answers

Optimizing SSE-code

I'm currently developing a C-module for a Java-application that needs some performance improvements (see Improving performance of network coding-encoding for a background). I've tried to optimize the code using SSE-intrinsics and it executes…
Yrlec
  • 3,401
  • 6
  • 39
  • 75
7
votes
1 answer

VTune Profiler giving Error: "The Data Cannot be displayed, there is no viewpoint available for data"

I want to optimize my code, which is written in C++ on the Linux platform. For that I am using Intel VTune Performance Analyzer Profiler. When I am identifying Hotspots, it successfully runs the binary executable whose path I have specified and then it…
Jasdeep Singh Arora
  • 543
  • 2
  • 11
  • 31
6
votes
2 answers

MKL Performance on Intel Phi

I have a routine that performs a few MKL calls on small matrices (50-100 x 1000 elements) to fit a model, which I then call for different models. In pseudo-code: double doModelFit(int model, ...) { ... while( !done ) { cblas_dgemm(...); …
Andrew
  • 867
  • 7
  • 20
6
votes
1 answer

Hotspot in a for loop

I am trying to optimize this code. static lvh_distance levenshtein_distance( const std::string & s1, const std::string & s2 ) { const size_t len1 = s1.size(), len2 = s2.size(); std::vector col( len2+1 ), prevCol( len2+1 ); …
qdii
  • 12,505
  • 10
  • 59
  • 116
5
votes
3 answers

Vtune report Outside any known module

I am using Intel(R) VTune(TM) Amplifier XE 2013 Update 5 (build 274450) for hotspot collection on my Linux application, but the report says that "[Outside any known module]" consumes most of the time, so I want to get more info about the unknown module. When…
4
votes
1 answer

When profiling, most of the time is spent in nvoglv64.dll. What should I deduce?

I am profiling a C++ application with Intel VTune Amplifier. Most of the time seems to be spent in nvoglv64.dll, more precisely in DrvPresentBuffers and/or KeSynchronizeExecution. Note that I have an NVIDIA GeForce graphics card. I am new to the…
Palmira
  • 121
  • 3
  • 11
4
votes
1 answer

system_call_after_swapgs, where is my code spending most of the time?

I am trying to profile my code with Intel VTune. When looking at the function call stack, it looks like most of the time is spent in a function called system_call_after_swapgs. However, there is no stack information. My question is: what is…
Manfredo
  • 1,760
  • 4
  • 25
  • 53
4
votes
1 answer

What is _kmp_fork_barrier and how to see if there is load imbalance?

I'm using Intel VTune Amplifier to see how my parallel application scales. Note that I don't use any explicit locking mechanism. It scales pretty well on my 4-core laptop (considering that there are portions of the algorithm that can't be…