
My data looks like this example:

    dataExample<-data.frame(Time=seq(1:10),
        Data1=runif(10,5.3,7.5),
        Data2=runif(10,4.3,6.5),
        Application=c("Substance1","Substance1","Substance1",
        "Substance1","Substance2","Substance2","Substance2",
        "Substance2","Substance1","Substance1"))
        dataExample

           Time    Data1    Data2 Application
        1     1 6.511573 5.385265  Substance1
        2     2 5.870173 4.512775  Substance1
        3     3 6.822132 5.109790  Substance1
        4     4 5.940528 6.281412  Substance1
        5     5 7.269394 4.680380  Substance2
        6     6 6.122454 6.015899  Substance2
        7     7 5.660429 6.113362  Substance2
        8     8 6.649749 4.344978  Substance2
        9     9 7.252656 4.764667  Substance1
        10   10 7.204440 5.835590  Substance1

I would like to indicate the times at which a substance other than `dataExample$Application[1]` was applied.

Here is how I currently get this plotted, but I assume there is a much easier way to do it with ggplot.

    library(reshape2)
    library(ggplot2)

    plotDataExample<-function(DataFrame){
      # Long format: one row per Time/variable combination
      longDF<-melt(DataFrame,id.vars=c("Time","Application"))
      p=ggplot(longDF,aes(Time,value,color=variable))+geom_line()

      maxValue=max(longDF$value)
      minValue=min(longDF$value)

      # Horizontal line slightly above the data, spanning the first to the last
      # time at which Application differs from its first value
      yAppLine=maxValue+((maxValue-minValue)/20)
      xAppLine1=min(longDF$Time[which(longDF$Application!=longDF$Application[1])])
      xAppLine2=max(longDF$Time[which(longDF$Application!=longDF$Application[1])])
      lineData=data.frame(x=c(xAppLine1,xAppLine2),y=c(yAppLine,yAppLine))

      # Label centred above the line
      xAppText=xAppLine1+(xAppLine2-xAppLine1)/2
      yAppText=yAppLine+((maxValue-minValue)/20)
      appText=longDF$Application[which(longDF$Application!=longDF$Application[1])[1]]
      textData=data.frame(x=xAppText,y=yAppText,appText=appText)

      p=p+geom_line(data=lineData,aes(x=x, y=y),color="black")
      p=p+geom_text(data=textData,aes(x=x,y=y,label = appText),color="black")
      return(p)
    }
    plotDataExample(dataExample)


Question: Is there a better way to get a similar result, one that could also indicate more than one factor (e.g. Substance3, Substance4, ...)?

new2R
  • Surely you mean Substance3, Substance4...? Care to include more substances in your example? – Roman Luštrik Apr 17 '13 at 11:26
  • In your data, is it possible that the substance switches between Substance1 and Substance2 more than once? – Didzis Elferts Apr 17 '13 at 12:16
  • `dataExample<-data.frame(Time=seq(1:10), Data1=runif(10,5.3,7.5), Data2=runif(10,4.3,6.5), Application=c("Substance1","Substance2","Substance2", "Substance1","Substance1","Substance2","Substance2", "Substance2","Substance1","Substance1"))` – new2R Apr 17 '13 at 12:18
  • Yes, it should be able to change more than once, but my example does not work that way. I would like to see more than one line if it changes more than once. – new2R Apr 17 '13 at 12:23
  • @Roman Luštrik to add more substances:`dataExample<-data.frame(Time=seq(1:21), Data1=runif(21,5.3,7.5), Data2=runif(21,4.3,6.5), Application=c("Substance1","Substance1","Substance1", "Substance2","Substance2","Substance2", "Substance1","Substance1","Substance1", "Substance3","Substance3","Substance3", "Substance1","Substance1","Substance1", "Substance4","Substance4","Substance4", "Substance1","Substance1","Substance1")) dataExample` – new2R Apr 17 '13 at 12:42

1 Answer


First, I made new sample data with more than two levels and with Substance2 occurring in two separate runs.

    dataExample<-data.frame(Time=seq(1:10),
                            Data1=runif(10,5.3,7.5),
                            Data2=runif(10,4.3,6.5),
                            Application=c("Substance1","Substance1","Substance2",
                                          "Substance2","Substance1","Substance1","Substance2",
                                          "Substance2","Substance3","Substance3"))

I did not wrap this in a function, so that each step can be shown.

Add a new column, groups, to the original data frame. It holds an identifier for grouping the applications: whenever the substance changes, a new group starts.

    dataExample$groups<-c(cumsum(c(1,tail(dataExample$Application,n=-1)!=head(dataExample$Application,n=-1))))
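To make the run-numbering idiom easier to follow, here is a minimal sketch on a small toy vector. The names `app`, `changed`, `groups` and `groups_rle` are illustrative only and do not appear in the answer; the `rle()` variant is an alternative formulation, not the method used above.

```r
# Hypothetical toy vector, used only to illustrate the idiom
app <- c("Substance1", "Substance1", "Substance2",
         "Substance2", "Substance1", "Substance3")

# Compare each element with its predecessor; TRUE marks the start of a new run
changed <- tail(app, n = -1) != head(app, n = -1)

# Prepend 1 for the first element, then accumulate to get one id per run
groups <- cumsum(c(1, changed))
print(groups)
# → 1 1 2 2 3 4

# Alternative with rle(): repeat each run's index by the run's length
runs <- rle(app)
groups_rle <- rep(seq_along(runs$lengths), runs$lengths)
print(all(groups == groups_rle))
# → TRUE
```

Because the id increases at every change, the second occurrence of Substance1 gets its own group (3) rather than being merged with the first one.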

Convert the data to long format for drawing the data lines.

    longDF<-melt(dataExample,id.vars=c("Time","Application","groups"))

Calculate the positions of the substance identifiers with ddply() from the plyr package. Only rows whose Application differs from the first Application value are used (that is what subset() does). Application and groups then serve as the grouping variables. For each group, the start, middle, and end positions on the x axis are calculated, and the y position is set to the maximum value plus 0.3.

    library(plyr)
    lineData<-ddply(subset(dataExample,Application != dataExample$Application[1]),
                    .(Application,groups),
                    summarise,minT=min(Time),maxT=max(Time),
                    meanT=mean(Time),ypos=max(longDF$value)+0.3)
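To see what this step computes, the same per-run summary can be sketched in base R on toy data that is easy to check by hand. This is an illustration rather than a replacement for the ddply() call: the data frame `d` and the names `other` and `pos` are hypothetical, and the ypos column is left out.

```r
# Toy data with the same Time/Application/groups columns as in the answer
# (hypothetical values, chosen so the result can be verified by hand)
d <- data.frame(Time = 1:6,
                Application = c("Substance1", "Substance2", "Substance2",
                                "Substance1", "Substance3", "Substance3"),
                groups = c(1, 2, 2, 3, 4, 4))

# Keep only rows that differ from the first Application value
other <- d[d$Application != d$Application[1], ]

# One summary row per run: start, end and midpoint on the x axis
pos <- do.call(rbind, lapply(split(other, other$groups), function(g) {
  data.frame(Application = g$Application[1], groups = g$groups[1],
             minT = min(g$Time), maxT = max(g$Time), meanT = mean(g$Time))
}))
print(pos)
```

Here the Substance2 run spans times 2 to 3 (midpoint 2.5) and the Substance3 run spans times 5 to 6 (midpoint 5.5), so `pos` has one row for each of these two runs.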

Now plot the longDF data with ggplot() and geom_line(), then add the segments above the plot with geom_segment() and the labels with annotate(), both using the new data frame lineData.

    ggplot(longDF,aes(Time,value,color=variable))+geom_line()+
      geom_segment(data=lineData,aes(x=minT,xend=maxT,y=ypos,yend=ypos),inherit.aes=FALSE)+
      annotate("text",x=lineData$meanT,y=lineData$ypos+0.1,label=lineData$Application)


Didzis Elferts
  • Thank you, this code is exactly what I was looking for. But when I turn it into a function, R says `Error in eval(expr, envir, enclos) : object 'longDF' not found`. Do you know what I am doing wrong? – new2R Apr 17 '13 at 15:25
  • See this [question on SO](http://stackoverflow.com/questions/10659133/local-variables-within-aes/10662937#10662937) – Didzis Elferts Apr 17 '13 at 15:28
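
One likely cause of the `object 'longDF' not found` error is that the expression `max(longDF$value)` inside the ddply() call is evaluated where variables local to the wrapping function are not visible. A possible workaround, sketched below under that assumption, is to let summarise reference only columns of the data frame and to add ypos afterwards. The function name `plotApplications` and the `offset` parameter are hypothetical, and the sketch assumes that at least one row differs from the first Application value.

```r
library(ggplot2)
library(reshape2)
library(plyr)

# plotApplications is a hypothetical name; the steps mirror the answer above.
# Assumes at least one row differs from the first Application value.
plotApplications <- function(df, offset = 0.3) {
  # Number consecutive runs of identical Application values
  app <- as.character(df$Application)
  df$groups <- cumsum(c(1, tail(app, n = -1) != head(app, n = -1)))

  longDF <- melt(df, id.vars = c("Time", "Application", "groups"))

  # summarise refers only to columns of the data here, so ddply() never
  # has to look up a variable that is local to this function
  lineData <- ddply(df[app != app[1], ], .(Application, groups), summarise,
                    minT = min(Time), maxT = max(Time), meanT = mean(Time))

  # The y position depends on longDF, so it is added outside ddply()
  lineData$ypos <- max(longDF$value) + offset

  ggplot(longDF, aes(Time, value, color = variable)) +
    geom_line() +
    geom_segment(data = lineData,
                 aes(x = minT, xend = maxT, y = ypos, yend = ypos),
                 inherit.aes = FALSE) +
    annotate("text", x = lineData$meanT, y = lineData$ypos + 0.1,
             label = as.character(lineData$Application))
}

set.seed(1)  # only to make the random example data reproducible
dataExample <- data.frame(Time = 1:10,
                          Data1 = runif(10, 5.3, 7.5),
                          Data2 = runif(10, 4.3, 6.5),
                          Application = c("Substance1", "Substance1", "Substance2",
                                          "Substance2", "Substance1", "Substance1",
                                          "Substance2", "Substance2", "Substance3",
                                          "Substance3"))
p <- plotApplications(dataExample)
print(p)
```

Computing ypos outside ddply() keeps every variable lookup inside the function's own environment, which avoids the scoping issue without changing the resulting plot.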