Firstly, see this answer for some general information about the difficulties of benchmarking in Prolog, or any programming language for that matter. The answer concerns the ECLiPSe language which uses Prolog internally so you'll be familiar with the syntax.
Now, let's look at a simple example:
equal_to_one(X) :- X =:= 1.
If we trace the execution (which by the way is a great way to better understand how Prolog works), we get:
?- trace, foo(1).
Call: (7) foo(1) ? creep
Call: (8) 1=:=1 ? creep
Exit: (8) 1=:=1 ? creep
Exit: (7) foo(1) ? creep
Notice the two calls and two exits occurring in the trace. In the first call, foo(1) is matched with defined facts/rules in the Prolog file and successfully finds foo/1, whereafter in the second call, the body is (successfully) executed. Subsequently the two exits simply represent exiting out of the statements that were true (both calls).
When we run our program with time/1, we see:
?- time(foo(1)).
% 2 inferences, 0.000 CPU in 0.000 seconds (86% CPU, 69691 Lips)
true.
?- time(foo(2)).
% 2 inferences, 0.000 CPU in 0.000 seconds (82% CPU, 77247 Lips)
false.
Both queries need 2 (logical) inferences to complete. These inferences represent the calls described above (i.e. the program 'tries to match' something twice, it doesn't matter whether the number is equal to one or not). It is because of this that inferences are a good indication of the performance of your program, being not based on any hardware-specific properties, but rather on the complexity of your algorithm(s).
Furthermore we see CPU and seconds, which respectively represent cpu-time and overall clock-time spent while executing the program (see referred SO answer for more information).
Finally, we see a different % CPU and LIPS for each execution. You shouldn't worry too much about these numbers as they represent the percentage CPU used and average Logical Inferences Per Second made and for obvious reasons these will always differ for each execution.
PS : a similar SO question can be found here