I'm using data in the format shown in the reproducible example below; the actual data set is much longer. The column labels are: Date | Variable 1 | Variable 2 | Failed?

I'm sorting the data into date order. Some dates may be missing, but an ordering function should take care of that. From there, I'm trying to split the data into sets, where the boundary between sets is marked by the far-right column registering a 1. I then want to plot these sets on a single graph, with the number of days passed on the x-axis. I've looked into using ggplot, but it seems to require data frames in which the length of each vector is already known. I tried creating a matrix sized to the maximum number of days passed across all sets and filling the spare cells with NaN before plotting, but this took ages because my data set is quite large. Is there a more elegant way of plotting the values against days passed for all sets on a single graph, and then repeating the process for additional variables?
Any help would be much appreciated.

Code for a reproducible example is included here:

test <-matrix(c(
"01/03/1997",   0.521583294,    0.315170092,    0,
"02/03/1997",   0.63946859, 0.270870821,    0,
"03/03/1997",   0.698687101,    0.253495021,    0,
"04/03/1997",   0.828754157,    0.233024574,    0,
"05/03/1997",   0.87078867, 0.214507537,    0,
"06/03/1997",   0.883279874,    0.212268627,    0,
"07/03/1997",   0.952083969,    0.062663598,    0,
"08/03/1997",   0.991100195,    0.054875256,    0,
"09/03/1997",   0.992490126,    0.026610776,    1,
"10/03/1997",   0.020707391,    0.866874513,    0,
"11/03/1997",   0.32405139, 0.778696984,    0,
"12/03/1997",   0.32665243, 0.703234151,    0,
"13/03/1997",   0.603941956,    0.362869647,    0,
"14/03/1997",   0.944046386,    0.026992527,    1,
"15/03/1997",   0.108246142,    0.939363715,    0,
"16/03/1997",   0.152195386,    0.907458966,    0,
"17/03/1997",   0.285748169,    0.765212667,    0), ncol = 4, byrow=TRUE)
colnames(test) <- c("Date", "Variable 1", "Variable 2", "Failed")
test <-as.table(test)
test
AlwaysInTheDark
  • Please copy an example of your data, not an image. See [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – S Rivero Jul 25 '17 at 14:33
  • My bad. Is what I've edited in sufficient? – AlwaysInTheDark Jul 25 '17 at 14:48

1 Answer

I've managed to hash together a solution, but it looks very messy. I'm convinced that there is a far more elegant way of solving this.

z <- as.data.frame.matrix(test)

# Convert the Failed flag to numeric, then take a running count of failures
x <- as.numeric(as.character(z$Failed))
x <- cumsum(x)  # Variable names recycled

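Subtracting the failure flag from the cumulative failure sum corrects it, so that each row is labelled with the number of preceding failures. A failing row therefore stays in the set that it ends.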

# Number of failures strictly before each row; this identifies the set
z$acc_sum <- x - as.numeric(as.character(z$Failed))

# Running row count within each set, restarting at 1 after every failure
z <- data.frame(z, Group_Index = ave(z$acc_sum == z$acc_sum, z$acc_sum, FUN = cumsum))
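As a quick check of what these two columns contain, here is a minimal sketch on a made-up six-row failure vector. The names `failed`, `set_id` and `row_in_set` are illustrative only and are not part of the code above; `seq_along` is used here as an assumed equivalent of the cumulative-sum trick.

```r
# Made-up failure flags: a 1 marks the last row of a set
failed <- c(0, 0, 1, 0, 1, 0)

# Failures strictly before each row, so a failing row stays in the set it ends
set_id <- cumsum(failed) - failed

# Position of each row within its set, restarting at 1 after every failure
row_in_set <- ave(set_id, set_id, FUN = seq_along)

print(data.frame(failed, set_id, row_in_set))
```

The three sets here have lengths 3, 2 and 1, so `row_in_set` restarts twice.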

An extra column, Group_Index, is created that holds the number of days passed since the start of each set. It counts rows within a set, so it matches the number of days only when no dates are missing. Keeping named columns makes the code easier to read than indexing by position.
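If some dates are missing, elapsed days can instead be computed from the dates themselves. The following is only a sketch, assuming the dates are day/month/year strings as in the example data; the small data frame `d` and the `days_passed` column are made up for illustration. Note that `days_passed` starts at 0 for the first day of a set, whereas Group_Index starts at 1.

```r
# Made-up data, unsorted and with a gap: 03/03/1997 is missing
d <- data.frame(
  Date   = c("05/03/1997", "01/03/1997", "02/03/1997", "04/03/1997", "06/03/1997"),
  Failed = c(0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Parse as day/month/year and sort chronologically
d$Date <- as.Date(d$Date, format = "%d/%m/%Y")
d <- d[order(d$Date), ]

# Set id: number of failures strictly before each row
d$acc_sum <- cumsum(d$Failed) - d$Failed

# Days elapsed since the first date of each set (0 on the first day)
day_num <- as.numeric(d$Date)
d$days_passed <- day_num - ave(day_num, d$acc_sum, FUN = min)

print(d)
```

Using `days_passed` on the x-axis keeps the gap visible in the plot instead of closing it up.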

attach(z) 
x = (max(acc_sum)+1) #This is the number of sets of variable results

Current columns read: Date|Variable.1|Variable.2|Failed|acc_sum|Group_Index

library(ggplot2)

n = data.frame(acc_sum, Group_Index)    

This initialises the frame once, which should make the loop faster because Group_Index and acc_sum aren't read in again on each iteration.

for (j in 1:(ncol(z) - 4)) {  # Iterate over the variable columns; -4 excludes Date, Failed, Group_Index and acc_sum
  n$Variable <- z[, j + 1]  # Read in the next variable; requires the variable columns to be next to each other
  n[] <- lapply(n, function(x) as.numeric(as.character(x)))  # Ensure all the values are numeric for plotting

  p <- ggplot(n, aes(x = Group_Index, y = Variable, colour = acc_sum)) +
    theme_bw() +
    geom_line(aes(group = acc_sum))  # linetype = "dotted"
  print(p)  # Ensure the graph is drawn on every iteration

  # Wait for user input before moving to the next variable
  cat("Press [enter] to continue")
  line <- readline()
}

One of the outputs is shown here. The graph could be improved by making the y-axis label change to the name of the variable being plotted. This could be done by setting a y label inside the for loop.
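A minimal sketch of that idea, assuming ggplot2 is installed: the column name is looked up from the data frame and passed to `ylab()`. The tiny `z_demo` data frame and the `plot_variable()` helper are made up here for illustration and are not part of the code above.

```r
library(ggplot2)

# Made-up stand-in for z with the same kind of columns
z_demo <- data.frame(
  Variable.1  = c(0.52, 0.64, 0.99, 0.02, 0.32),
  Variable.2  = c(0.32, 0.27, 0.03, 0.87, 0.78),
  acc_sum     = c(0, 0, 0, 1, 1),
  Group_Index = c(1, 2, 3, 1, 2)
)

# Hypothetical helper: one plot per variable, labelled with the column name
plot_variable <- function(data, var_name) {
  ggplot(data, aes(x = Group_Index, y = .data[[var_name]], colour = acc_sum)) +
    theme_bw() +
    geom_line(aes(group = acc_sum)) +
    ylab(var_name)
}

# Every column that is not a grouping or index column is treated as a variable
var_names <- setdiff(names(z_demo), c("acc_sum", "Group_Index"))
plots <- lapply(var_names, plot_variable, data = z_demo)
```

Each plot can then be shown with `print(plots[[1]])`, and selecting columns by name removes the need for the variables to sit next to each other.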

AlwaysInTheDark