I am trying to investigate the performance of my program, where cache misses are a huge bottleneck. Before integrating PAPI into the target application, I wanted to verify how it works, which is why I am posting a sample program.
My intention is to use PAPI to monitor the cache misses of a separate thread. I am trying to use PAPI_attach to apply my event set to a specific thread ID; however, the cache misses I measure are the same (or at least VERY similar) as when I do not use PAPI_attach.
Another experiment I did to verify my concerns was to start the Firefox browser during a run of this very simple program. This led to an increase in the measured cache misses, so something is clearly wrong with PAPI_attach or with how I am using it.
I am using the code below for my thread worker:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <papi.h>

/* NUM_EVENTS and ERROR_RETURN are defined elsewhere in the program (not shown) */

void *Slave(void *args)
{
    /* Must be initialized to PAPI_NULL before calling PAPI_create_eventset */
    int EventSet = PAPI_NULL;
    /* This is where we store the values we read from the event set */
    long long values[NUM_EVENTS];
    /* We use number to keep track of the number of events in the EventSet */
    int retval, number;
    int i;
    long long tmp;   /* long long: the running sum exceeds the range of int */
    pid_t tid;

    /* Kernel thread ID of this thread (Linux-specific) */
    tid = syscall(SYS_gettid);
    printf("My tid is: %d\n", tid);

    if ((retval = PAPI_register_thread()) != PAPI_OK)
        ERROR_RETURN(retval);

    if ((retval = PAPI_create_eventset(&EventSet)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Add L1 total cache misses to the EventSet */
    if ((retval = PAPI_add_event(EventSet, PAPI_L1_TCM)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Add L2 total cache misses to the EventSet */
    if ((retval = PAPI_add_event(EventSet, PAPI_L2_TCM)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Add L3 total cache misses to the EventSet */
    if ((retval = PAPI_add_event(EventSet, PAPI_L3_TCM)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Get the number of events in the event set */
    number = 0;
    if ((retval = PAPI_list_events(EventSet, NULL, &number)) != PAPI_OK)
        ERROR_RETURN(retval);
    printf("There are %d events in the event set\n", number);

    /* Attach the event set to this thread's ID */
    if ((retval = PAPI_attach(EventSet, tid)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Start counting */
    if ((retval = PAPI_start(EventSet)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Placeholder workload; you can replace it with your own code */
    tmp = 0;
    for (i = 0; i < 200000000; i++)
    {
        tmp = i + tmp;
    }

    /* Read the counters while they are still running */
    if ((retval = PAPI_read(EventSet, values)) != PAPI_OK)
        ERROR_RETURN(retval);
    printf("L1 misses %lld \n", values[0]);
    printf("L2 misses %lld \n", values[1]);
    printf("L3 misses %lld \n", values[2]);

    /* Stop counting */
    if ((retval = PAPI_stop(EventSet, values)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* Free the resources used by PAPI */
    PAPI_shutdown();
    return NULL;
}
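The loop in the worker is only a placeholder: it updates local variables and touches almost no memory, so it gives the cache counters little to measure. A minimal sketch of a workload that does touch memory could look like the following. The name stride_touch, the 64-byte cache line, and the buffer size mentioned afterwards are assumptions for illustration and are not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical workload: write one byte every `stride` bytes across a
 * zeroed buffer of `bytes` bytes, then read the same locations back.
 * Returns the number of locations touched, or 0 if bytes is 0 or the
 * allocation fails. The pointer is volatile so the compiler cannot
 * drop the memory accesses.
 */
static size_t stride_touch(size_t bytes, size_t stride)
{
    volatile unsigned char *buf;
    size_t i, sum = 0;

    assert(stride > 0);   /* precondition: a zero stride would never advance */
    if (bytes == 0)
        return 0;

    buf = calloc(bytes, 1);
    if (buf == NULL)
        return 0;

    /* First pass: store to one byte per stride */
    for (i = 0; i < bytes; i += stride)
        buf[i] = 1;

    /* Second pass: load the same bytes back and count them */
    for (i = 0; i < bytes; i += stride)
        sum += buf[i];

    free((void *)buf);
    return sum;
}
```

Calling stride_touch(64u << 20, 64) between PAPI_start and PAPI_read would walk a 64 MiB buffer one assumed cache line at a time. Whether that exceeds the last-level cache depends on the machine.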
And the following code for spawning the thread:
int main(void)
{
    pthread_t slave1;
    pthread_attr_t attr;
    int rc = 0;
    int retval;

    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    /* Initialize the PAPI library before any other PAPI call */
    if ((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT)
        ERROR_RETURN(retval);

    /* Enable thread support; PAPI expects a function returning unsigned long */
    if ((retval = PAPI_thread_init((unsigned long (*)(void))pthread_self)) != PAPI_OK)
        ERROR_RETURN(retval);

    rc = pthread_create(&slave1, &attr, Slave, NULL);
    if (rc != 0)
    {
        fprintf(stderr, "pthread_create failed: %d\n", rc);
        exit(1);
    }
    pthread_join(slave1, NULL);
    pthread_attr_destroy(&attr);
    exit(0);
}
The bad thing is that I get no errors, which suggests that everything is working.
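A note on the thread ID involved: the value passed to PAPI_attach in the worker is the Linux kernel thread ID from SYS_gettid, which differs from the process ID for any thread other than the main one. To monitor a thread from a separate thread, that ID has to be handed over somehow. Below is a minimal, Linux-only sketch of one way to do this. The tid_slot struct and the spawn_and_get_tid helper are hypothetical names, and the PAPI call appears only as a comment so that the sketch compiles without the library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: slot the worker uses to publish its kernel TID. */
struct tid_slot {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    pid_t           tid;    /* 0 until the worker has stored its TID */
};

/* Worker: publish the kernel thread ID, then run the code to be measured. */
static void *worker(void *arg)
{
    struct tid_slot *slot = arg;
    assert(slot != NULL);

    pthread_mutex_lock(&slot->lock);
    slot->tid = (pid_t)syscall(SYS_gettid);
    pthread_cond_signal(&slot->ready);
    pthread_mutex_unlock(&slot->lock);

    /* ... the workload to be measured would go here ... */
    return NULL;
}

/* Hypothetical helper: start a worker and return its kernel TID, or -1 on failure. */
static pid_t spawn_and_get_tid(void)
{
    struct tid_slot slot = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
    };
    pthread_t th;
    pid_t tid;

    if (pthread_create(&th, NULL, worker, &slot) != 0)
        return -1;

    /* Wait until the worker has published its TID */
    pthread_mutex_lock(&slot.lock);
    while (slot.tid == 0)
        pthread_cond_wait(&slot.ready, &slot.lock);
    tid = slot.tid;
    pthread_mutex_unlock(&slot.lock);

    /* A monitoring thread could call PAPI_attach(EventSet, tid) here,
       before the worker is joined. */
    pthread_join(th, NULL);
    return tid;
}
```

Printing the result of spawn_and_get_tid() next to getpid() from main shows two different numbers, since only the main thread's kernel thread ID equals the process ID.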