I would like to use R to plot performance evaluation results for several DB systems. For each system, I loaded the same data and executed the same queries over several iterations.
The data for a single system looks like this:
"iteration", "lines", "loadTime", "query1", "query2", "query3"
1, 100000, 120.4, 0.5, 6.4, 1.2
1, 100000, 110.1, 0.1, 5.2, 2.1
1, 50000, 130.3, 0.2, 4.3, 2.2
2, 100000, 120.4, 0.1, 2.4, 1.2
2, 100000, 300.2, 0.2, 4.5, 1.4
2, 50000, 235.3, 0.4, 4.2, 0.5
3, 100000, 233.5, 0.7, 8.3, 6.7
3, 100000, 300.1, 0.9, 0.5, 4.4
3, 50000, 100.2, 0.4, 9.2, 1.2
What I need now (for plotting) is a matrix or data frame containing the average of these measurements across the iterations, row by row.
At the moment I am doing this:
# read the file
all_results <- read.csv(file = "file.csv", header = TRUE, sep = ",")
# split the results by iteration
results <- split(all_results, all_results$iteration)
# extract each iteration's data frame from the list
r1 <- results[[1]]
r2 <- results[[2]]
r3 <- results[[3]]
# calculate the element-wise average (requires equally sized data frames)
(r1 + r2 + r3) / 3
I could put all this into a function and calculate the average matrix in a for loop, but I have the vague feeling that there must be a more elegant solution. Any ideas?
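As a rough sketch of how the three-way sum could be generalized without a hand-written loop, base R's Reduce can fold `+` over the list returned by split. The data frame below re-creates the sample data inline (an assumption, so the snippet runs without file.csv), and the approach only works if every iteration has the same number of rows in the same order.

```r
# Sample data from above, built inline so the sketch is self-contained
all_results <- data.frame(
  iteration = rep(1:3, each = 3),
  lines     = rep(c(100000, 100000, 50000), times = 3),
  loadTime  = c(120.4, 110.1, 130.3, 120.4, 300.2, 235.3, 233.5, 300.1, 100.2),
  query1    = c(0.5, 0.1, 0.2, 0.1, 0.2, 0.4, 0.7, 0.9, 0.4),
  query2    = c(6.4, 5.2, 4.3, 2.4, 4.5, 4.2, 8.3, 0.5, 9.2),
  query3    = c(1.2, 2.1, 2.2, 1.2, 1.4, 0.5, 6.7, 4.4, 1.2)
)

# One data frame per iteration
results <- split(all_results, all_results$iteration)

# Add the data frames element-wise, then divide by the number of iterations.
# `+` on data frames stops with an error if their dimensions differ.
avg <- Reduce(`+`, results) / length(results)

# The averaged iteration number carries no meaning, so drop that column
avg$iteration <- NULL
print(avg)
```

This works for any number of iterations, but it fails as soon as one iteration has a different number of rows.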
What can I do in cases where I have incomplete results, e.g., when one iteration has fewer rows than the others?
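One possible direction for the incomplete case, as a sketch only: number the rows within each iteration and let aggregate compute the mean per row position, so a position missing from one iteration is averaged over the remaining iterations. This assumes that rows appear in the same order in every iteration and that any missing rows are at the end. The column name `run` is a helper invented for illustration, and the sample data is re-created inline with the last row of iteration 3 dropped.

```r
# Sample data from above, built inline so the sketch is self-contained
all_results <- data.frame(
  iteration = rep(1:3, each = 3),
  lines     = rep(c(100000, 100000, 50000), times = 3),
  loadTime  = c(120.4, 110.1, 130.3, 120.4, 300.2, 235.3, 233.5, 300.1, 100.2),
  query1    = c(0.5, 0.1, 0.2, 0.1, 0.2, 0.4, 0.7, 0.9, 0.4),
  query2    = c(6.4, 5.2, 4.3, 2.4, 4.5, 4.2, 8.3, 0.5, 9.2),
  query3    = c(1.2, 2.1, 2.2, 1.2, 1.4, 0.5, 6.7, 4.4, 1.2)
)

# Simulate an incomplete result: iteration 3 is missing its last row
incomplete <- all_results[-9, ]

# Hypothetical helper column: position of each row within its iteration
incomplete$run <- ave(seq_len(nrow(incomplete)), incomplete$iteration,
                      FUN = seq_along)

# Mean of every measurement per row position, over the iterations that have it
avg <- aggregate(cbind(lines, loadTime, query1, query2, query3) ~ run,
                 data = incomplete, FUN = mean)
print(avg)
```

Here the third row position is averaged over iterations 1 and 2 only, while the first two positions still use all three iterations.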
Thanks!