I have a table of results from an optimization algorithm, covering 100 runs. x represents the time, and a row is only stored when an improvement is found, so each run is missing some x-values.
x1; y1 ; x2 ; y2
1 ; 100 ; 1 ; 150
4 ; 90 ; 2 ; 85
7 ; 85 ; 10 ; 60
10; 80 ;
This is just a CSV file. I am looking for an easy way to process it, because I want to calculate the average at each x-value. For example, the average at x = 4 needs to take into account that for run 2, y at x = 4 is still 85 (the last value stored, at x = 2).
Is there an easy way to do this in Excel, or by reading it into Java or R? (I will be plotting the average with R's ggplot.)
So the expected output would look like this:
x1; y1 ; x2 ; y2
1 ; 100 ; 1 ; 150
2 ; 100 ; 2 ; 85
4 ; 90 ; 4 ; 85
7 ; 85 ; 7 ; 85
10; 80 ; 10 ; 60
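To make the intended computation concrete, here is a minimal sketch in base R that reproduces the table above for the two example runs and then averages them. The names `run1`, `run2`, `grid`, and `fill_run` are hypothetical and chosen only for illustration. It assumes each run's x-values are sorted in increasing order.

```r
# Two example runs; each stores (x, y) only when y improves.
run1 <- data.frame(x = c(1, 4, 7, 10), y = c(100, 90, 85, 80))
run2 <- data.frame(x = c(1, 2, 10),    y = c(150, 85, 60))

# Union of all x-values that occur in any run.
grid <- sort(unique(c(run1$x, run2$x)))

# Hypothetical helper: the value of a run at each grid point, carrying the
# last stored improvement forward. Assumes run$x is sorted increasingly.
fill_run <- function(run, grid) {
  idx <- findInterval(grid, run$x)  # index of the last stored x <= grid point
  idx[idx == 0] <- NA               # no value stored yet before the first x
  run$y[idx]
}

filled <- data.frame(
  x  = grid,
  y1 = fill_run(run1, grid),
  y2 = fill_run(run2, grid)
)

# Average over the runs at each x-value.
filled$avg <- rowMeans(filled[, c("y1", "y2")])
print(filled)
```

With 100 runs, the same helper could be applied to a list of runs with `sapply`, and `filled$avg` plotted against `filled$x`.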
--UPDATE
I have applied agstudy's answer below. This is my script:
library(ggplot2)
library(zoo)
# Read the run data.
data1 <- read.table("rundata1", sep = " ", col.names = c("tm1", "score1", "current1"))
data2 <- read.table("rundata1", sep = " ", col.names = c("tm2", "score2", "current2"))
# Outer join on the time column, keeping times that occur in either run.
newdata <- merge(data1[, 1:2], data2[, 1:2], by = 1, all = TRUE)
newdata <- newdata[!is.na(newdata$tm1), ]
# Carry the last observed score forward to fill the gaps.
newdata$score1 <- zoo::na.locf(newdata$score1)
newdata$score2 <- zoo::na.locf(newdata$score2)
This is almost working now, but I get one error:
newdata$score2 <- zoo::na.locf(newdata$score2)
Error in `$<-.data.frame`(`*tmp*`, "score2", value = c(40152.6, 40152.6, :
replacement has 11767 rows, data has 11768
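One possible cause, assuming the default arguments are in use: `zoo::na.locf` has `na.rm = TRUE` by default, which drops leading NAs. If the first row of `score2` is NA after the merge, the result is one element shorter than the column. The sketch below reproduces that on a small made-up vector (`score` is illustrative, not taken from the real data) and shows that `na.rm = FALSE` keeps the length unchanged.

```r
library(zoo)

# Illustrative vector with a leading NA, as can happen after an outer merge
# when the second run has no value at the earliest time.
score <- c(NA, 150, NA, 85, NA)

# Default na.rm = TRUE: the leading NA is dropped, so the result is shorter.
short <- na.locf(score)
length(short)

# na.rm = FALSE keeps the leading NA, so the length matches the input and
# the result can be assigned back to a data frame column.
same_len <- na.locf(score, na.rm = FALSE)
length(same_len)
```

With `na.rm = FALSE` the leading entry stays NA, so any average over runs at that time would need `na.rm = TRUE` in `mean` or `rowMeans`, or some other explicit choice for times before a run's first stored value.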