
I have a table with results from an optimization algorithm, covering 100 runs. x represents time and is only stored when an improvement is found, so each run has values at different x and the others are missing.

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;

This is just a CSV file. I am looking for an easy way to process it, because I want to calculate the average y at each x value. For example, the average at x = 4 needs to take into account that for run 2, y at x = 4 is still 85 (the value recorded at x = 2).

Is there an easy way to do this in Excel, or by reading the file into Java or R? (I will be plotting the average with R's ggplot2.)

So the expected output would look like this:

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
2 ; 100 ; 2  ; 85
4 ; 90  ; 4  ; 85
7 ; 85  ; 7  ; 85
10; 80  ;10 ; 60

--UPDATE

I have applied agstudy's answer below. This is my script:

library(ggplot2)
library(zoo)

data1 = read.table("rundata1", sep= " ", col.names=c("tm1","score1","current1"))
data2 = read.table("rundata1", sep= " ", col.names=c("tm2","score2","current2"))

newdata<- merge(data1[,1:2],data2[,1:2],by=1,all=T)
newdata <- newdata[!is.na(newdata$tm1),]
newdata$score1 <- zoo::na.locf(newdata$score1)
newdata$score2 <- zoo::na.locf(newdata$score2)

It is almost working now. I only get this error:

newdata$score2 <- zoo::na.locf(newdata$score2)
Error in `$<-.data.frame`(`*tmp*`, "score2", value = c(40152.6, 40152.6,  : 
  replacement has 11767 rows, data has 11768
edited by pnuts
asked by dorien

2 Answers


For example, in R you can do this in 2 steps. First you merge your 2 runs, then you fill each missing value with the last non-missing value before it (last observation carried forward). I am using na.locf from the zoo package for this.

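xx <- read.table(text='x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;',sep=';',fill=TRUE,header=TRUE)

# Full outer join of run 1 (x1, y1) and run 2 (x2, y2) on the time column
dm <- merge(xx[,1:2],xx[,3:4],by=1,all=TRUE)
# Drop the row whose key is NA (from the blank x2 cell that fill=TRUE pads in on the last line)
dm <- dm[!is.na(dm$x1),]
# Carry the last observed value forward; na.rm = FALSE keeps any leading NAs,
# so the filled vector always has as many rows as dm
dm$y1 <- zoo::na.locf(dm$y1, na.rm = FALSE)
dm$y2 <- zoo::na.locf(dm$y2, na.rm = FALSE)
dm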
dm
  x1  y1  y2
1  1 100 150
2  2 100  85
3  4  90  85
4  7  85  85
5 10  80  60
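To go from two runs to all 100 and compute the average, one option is a different technique from the merge() + na.locf() approach: base R's approx() with method = "constant", which evaluates each run's step function at every time point directly. This is a sketch under assumptions: the list name `runs` and the column names `x` and `y` are hypothetical, and the two toy runs are the ones from the question.

```r
# Sketch (assumption): each run is a data frame with columns x (time) and y
# (best score so far), recorded only when the score improved.
runs <- list(
  data.frame(x = c(1, 4, 7, 10), y = c(100, 90, 85, 80)),  # run 1 from the question
  data.frame(x = c(1, 2, 10),    y = c(150, 85, 60))       # run 2 from the question
)

# Every time point at which at least one run recorded an improvement
all_x <- sort(unique(unlist(lapply(runs, `[[`, "x"))))

# Evaluate each run's step function at all_x:
#   method = "constant", f = 0 -> the last recorded y at or before each x
#   rule = 1:2                 -> NA before a run's first record, last value
#                                 carried forward after its final record
filled <- sapply(runs, function(r)
  approx(r$x, r$y, xout = all_x, method = "constant", f = 0, rule = 1:2)$y)
colnames(filled) <- paste0("y", seq_along(runs))

# Average across runs at each x; runs with no record yet (NA) are ignored
result <- data.frame(x = all_x, filled, mean_y = rowMeans(filled, na.rm = TRUE))
print(result)
```

For the toy data this gives mean_y values of 125, 92.5, 87.5, 85 and 70, matching the expected output in the question. With the real data, `runs` would be built by reading each run file with read.table() and naming the first two columns x and y. Using na.rm = TRUE in rowMeans() means that, before a run's first record, the average is taken only over runs that have started; drop it if you would rather get NA there.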
answered by agstudy
  • I really like your approach, but some of the x2 data points seem to be missing. Could that be? I will put my code in the question. – dorien Feb 25 '14 at 16:14
  • @dorien I don't think this can happen since I am using `merge` with the `all=TRUE` option. – agstudy Feb 25 '14 at 16:17
  • Curious, I guess I am missing something simple in the script above. Your example works perfectly in contrast. – dorien Feb 25 '14 at 16:19
  • OK, I had a very silly input error: I was reading the same file twice! Now I only have this error: newdata$score2 <- zoo::na.locf(newdata$score2) Error in `$<-.data.frame`(`*tmp*`, "score2", value = c(40152.6, 40152.6, : replacement has 11767 rows, data has 11768 – dorien Feb 25 '14 at 16:34
  • Perhaps because there is an empty line at the end of the file? – dorien Feb 25 '14 at 16:37
  • I have moved the error discussion to a new question. Your suggestion of using this helped me though, and I am sure it will work once I get through the errors: http://stackoverflow.com/questions/22038887/r-merging-data-error-in-fix-byby-x-x – dorien Feb 26 '14 at 10:43
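The "replacement has 11767 rows, data has 11768" error is consistent with zoo::na.locf()'s default na.rm = TRUE, which drops leading NAs: if the earliest merged time has no score2 (run 2 starts later than run 1), the filled vector comes back one element shorter than the data frame. This is a likely cause, not one confirmed in the thread; calling zoo::na.locf(newdata$score2, na.rm = FALSE) keeps the leading NA and therefore the length. The base-R sketch below shows the length-preserving behaviour without needing zoo; `locf_keep_length` is a hypothetical helper and the scores are made up.

```r
# Last observation carried forward that never changes the vector's length.
# locf_keep_length is a hypothetical helper; for a plain vector,
# zoo::na.locf(v, na.rm = FALSE) should give the same result.
locf_keep_length <- function(v) {
  seen <- cumsum(!is.na(v))  # number of non-NA values up to each position
  vals <- v[!is.na(v)]       # the non-NA values, in order
  out <- v                   # positions before the first value (seen == 0) stay NA
  out[seen > 0] <- vals[seen[seen > 0]]
  out
}

# Made-up scores: run 2 has nothing at the earliest merged time
score2 <- c(NA, 150, NA, 85, NA)
filled <- locf_keep_length(score2)
print(filled)                     # NA 150 150 85 85
length(filled) == length(score2)  # TRUE, so assigning back into the data frame works
```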

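With Excel you might use VLOOKUP. First build a column of all the unique x values from both runs, sorted ascending (column F here; a single x column seems sufficient, since x1 and x2 would be identical in the output). Then for y1 use something like:

=VLOOKUP($F2,A:B,2)  

and =VLOOKUP($F2,C:D,2) for y2, each copied down as far as needed. Because the fourth argument (range_lookup) is omitted, VLOOKUP does an approximate match: it returns the y for the largest x that is less than or equal to the value in column F, i.e. the last recorded improvement. This requires each run's x column to be sorted ascending, which it is here.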

[Screenshot: SO21912834 first example]
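For comparison, the approximate-match lookup these formulas rely on can be sketched in R with base findInterval(), which for each query returns the position of the last element of a sorted vector that is less than or equal to it (0 if there is none). The helper name `vlookup_approx` is hypothetical, and the data are the two runs from the question.

```r
# Emulates Excel's VLOOKUP(lookup, range, 2) with range_lookup omitted (approximate match).
# vlookup_approx is a hypothetical helper, not an existing R function.
vlookup_approx <- function(lookup, keys, values) {
  stopifnot(!is.unsorted(keys))          # approximate match also expects ascending keys
  idx <- findInterval(lookup, keys)      # position of the largest key <= lookup (0 if none)
  out <- rep(NA_real_, length(lookup))   # where Excel would show #N/A
  out[idx > 0] <- values[idx[idx > 0]]
  out
}

all_x <- c(1, 2, 4, 7, 10)  # the column of unique, sorted x values (column F above)
y1 <- vlookup_approx(all_x, keys = c(1, 4, 7, 10), values = c(100, 90, 85, 80))
y2 <- vlookup_approx(all_x, keys = c(1, 2, 10),    values = c(150, 85, 60))
print(data.frame(x = all_x, y1, y2))
```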

Alternatively, if you are prepared to change your source data layout along the lines shown*, you might use a PivotTable: copy it with Paste Special…, Values; put something in the cell immediately to the right of Values; select the y1 and y2 columns; then use Go To Special, Blanks, type =, press Up and press Ctrl+Enter, so that each blank is filled with the value above it.

* or remove the numbers from the x labels and use multiple consolidation ranges instead.

[Screenshot: SO21912834 second example]

answered by pnuts