-1

I am trying to turn this piece of code into an R function

separea=quantile(foo6$area,seq(0,1,0.001),na.rm=T)
nb=length(separea)[1]-1
resultats=matrix(NA,nb,8)
for (count in 1:nb){
  print(c("area: ",separea[count] ))
  b=foo6[foo6$area >= separea[max(1,count-20)] & foo6$area <= separea[min(count+20,nb+1)],]
  q01 = quantile( b$nq , 0.01,na.rm=T)  
  q10 = quantile( b$nq , 0.10,na.rm=T)
  q25 = quantile( b$nq , 0.25,na.rm=T)
  q50 = quantile( b$nq , 0.50,na.rm=T)
  q75 = quantile( b$nq , 0.75,na.rm=T)
  q90 = quantile( b$nq , 0.90,na.rm=T)
  q99 = quantile( b$nq , 0.99,na.rm=T)  
  if(dim(b)[1]>100){
    resultats[count,]=cbind(separea[count],q01,q10,q25,q50,q75,q90,q99)
  }
}
resultats=resultats[!is.na(resultats[,1]),]
dim1=dim(resultats)[1]

And I wrote this function:

quantile.prep<-function(dframe,xvar,yvar){
  separea=quantile(dframe$xvar,seq(0,1,0.001),na.rm=T)
  nb=length(separea)[1]-1
  resultats=matrix(NA,nb,8)
  for (count in 1:nb){
    print(c("area: ",separea[count] ))
    b=dframe[dframe$xvar >= separea[max(1,count-20)] & dframe$area <= separea[min(count+20,nb+1)],]
    q01 = quantile( b$yvar , 0.01,na.rm=T)  
    q10 = quantile( b$yvar , 0.10,na.rm=T)
    q25 = quantile( b$yvar , 0.25,na.rm=T)
    q50 = quantile( b$yvar , 0.50,na.rm=T)
    q75 = quantile( b$yvar , 0.75,na.rm=T)
    q90 = quantile( b$yvar , 0.90,na.rm=T)
    q99 = quantile( b$yvar , 0.99,na.rm=T)  
    if(dim(b)[1]>100){
      resultats[count,]=cbind(separea[count],q01,q10,q25,q50,q75,q90,q99)
    }
  }
  resultats=resultats[!is.na(resultats[,1]),]
  dim1=dim(resultats)[1]
}

But I am getting this error: Error in dframe$xvar : $ operator is invalid for atomic vectors

When I call using quantile.prep(dframe='foo6',xvar='area',yvar='nq')

dput(droplevels(head(foo6)))

structure(list(area = c(162.6513, 162.6513, 162.6513, 162.6513, 
162.6513, 162.6513), nq = c(0.140843018162167, 0.152855833307204, 
0.193245919337872, 0.156860105022216, 0.171658019333384, 0.18628194179819
)), .Names = c("area", "nq"), row.names = c(NA, 6L), class = "data.frame")

Could you please help?

The proposed outputenter image description here

WORKING SOLUTION

quantile.prep<-function(dframe,xvar,yvar){
  separea=quantile(dframe[,xvar],seq(0,1,0.001),na.rm=T)
  nb=length(separea)[1]-1
  resultats=matrix(NA,nb,8)
  for (count in 1:nb){
    print(c("area: ",separea[count] ))
    b=dframe[dframe[,xvar]>= separea[max(1,count-20)] & dframe[,'xvar']<= separea[min(count+20,nb+1)],]
    q01 = quantile( b[,yvar] , 0.01,na.rm=T)  
    q10 = quantile( b[,yvar] , 0.10,na.rm=T)
    q25 = quantile( b[,yvar] , 0.25,na.rm=T)
    q50 = quantile( b[,yvar] , 0.50,na.rm=T)
    q75 = quantile( b[,yvar] , 0.75,na.rm=T)
    q90 = quantile( b[,yvar] , 0.90,na.rm=T)
    q99 = quantile( b[,yvar] , 0.99,na.rm=T)  
    if(dim(b)[1]>100){
      resultats[count,]=cbind(separea[count],q01,q10,q25,q50,q75,q90,q99)
    }
  }
  resultats=resultats[!is.na(resultats[,1]),]
  dim1=dim(resultats)[1]
}
Geekuna Matata
  • 1,349
  • 5
  • 19
  • 38
  • 3
    Don't pass your data.frame as a string: `quantile.prep(dframe=foo6,xvar='area',yvar='nq')` and you can't use "$" with strings either, use standard indexing "[,]". (I always forget the canonical duplicate for this one) – MrFlick Jun 16 '14 at 19:31
  • @MrFlick [This](http://stackoverflow.com/q/2641653/324364) is the closest I can find at the moment. – joran Jun 16 '14 at 19:35
  • Can you explain the purpose of your code in words? I think there might be a simpler way to code this, but I'm not sure what you want to do. – kdauria Jun 16 '14 at 19:35
  • @kdauria I have attached the proposed output. Please let me know if there is an easier way to do this. This is the complete code: http://pastebin.com/Ejs4A4GC – Geekuna Matata Jun 16 '14 at 19:39
  • @MrFlick Just clarifying: This is how I pass? separea=quantile(dframe[,xvar],seq(0,1,0.001),na.rm=T) I haven't use this notation before. – Geekuna Matata Jun 16 '14 at 19:42
  • @GeekunaMatata Yes. If `dframe` is the actual data.frame and `xvar` is a character vector with the name of the column in X that will work. If you've never seen that style of indexing you should really review an introduction to R. That's basic `[row,column]` indexing used for matrices and data.frames among other objects. – MrFlick Jun 16 '14 at 19:45
  • I have added the proposed solution. But I am getting this error now: Error in `[.data.frame`(dframe, , "xvar") : undefined columns selected What could be the problem? – Geekuna Matata Jun 16 '14 at 19:51
  • 1
    Once you figure out your problem, can you change the title of your post to better reflect the real issue? "Why is my function throwing an error" isn't very helpful to people that might have the same problem. – kdauria Jun 16 '14 at 20:35
  • Yes sure. Thank you. I'll redraft accordingly. – Geekuna Matata Jun 16 '14 at 21:07

1 Answers1

1

Does this do what you want?

# An example data set
area = seq( log10(10), log10(10000), length.out=5000 )
discharge = seq(1,0.08,length.out=5000) + rnorm(5000, 0, 0.2)
df = data.frame( area=area, nq=discharge )

# Quick peak at the data
plot( df )

# Bin the data into 1,000 bins
# If I read your code right, you are essentially looking at 
# a window of your data that has
# a width of of 40/1000*range(data[,"area"])
# You are looking at that window at 1,000 evenly spaced points
# along the x-axis (here saved as xout)
xout = seq( min(df[,"area"]), max(df[,"area"]), length.out=1000)
window.size = 40/1000*diff(range(df[,"area"]))
results = matrix(NA,nrow=length(xout),ncol=8) # allocate matrix that stores quantiles
for( i in seq_along(xout) ) {

  window = df[,"area"] < (xout[i] + window.size/2) &
           df[,"area"] > (xout[i] - window.size/2)
  values = df[window,"nq"]
  quantiles = quantile(values, probs=c(0.01,0.1,0.25,0.5,0.75,0.9,0.90), na.rm=TRUE )
  results[i,] = c(xout[i],quantiles)
}

# Now plot the results
library(reshape)
library(ggplot2)
colnames(results) = c("area","q01","q10","q25","q50","q75","q90","q99")
results = as.data.frame(results)
melted.results = melt(results, id.vars="area")
ggplot() + 
  geom_point(data=df, aes(x=area,y=nq), alpha=0.15) +
  geom_line(data=melted.results,aes(x=area,y=value,group=variable,color=variable),size=2)

enter image description here

kdauria
  • 6,300
  • 4
  • 34
  • 53
  • Thank you. This looks beautiful. And I wanted it in ggplot. Trying it now. – Geekuna Matata Jun 16 '14 at 20:26
  • One problem: The x-axis and y-axis of your figure is different. How can I use the log of values in the figures like mine? Thank you so much for this code. The original dataframe has no log values in mine and I take log later as shown in my code. – Geekuna Matata Jun 16 '14 at 21:06
  • Also, I would like to have the differential shading like the proposed figure. Thank you so much! – Geekuna Matata Jun 16 '14 at 21:56
  • Take a look at the examples here: http://docs.ggplot2.org/current/scale_continuous.html. You can probably use `scale_x_log10()` and `scale_y_log10()` to change the axes. To make shading I'd recommend `geom_ribbon(data=melted.results,aes(x=area,ymin=q25,ymax=q75),alpha=0.3) + geom_ribbon(data=melted.results,aes(x=area,ymin=q10,ymax=q90),alpha=0.3) geom_ribbon(data=melted.results,aes(x=area,ymin=q01,ymax=q99),alpha=0.3) + geom_line(data=melted.results,aes(x=area,y=q50))`. This is completely untested so you may have to fiddle with it. If you get it working, make sure to upload it. – kdauria Jun 17 '14 at 13:55
  • Feel free to edit my answer or add an answer that copies+adds to mine. I don't mind at all. – kdauria Jun 17 '14 at 14:02
  • Thank you. I am adding it now. And will edit and accept your answer with what works, – Geekuna Matata Jun 18 '14 at 16:21