0

I am drawing quantile-quantile plots for some data that I have. I would like to print only those panels that satisfy a condition that I put in the panel function around panel.qq(x,y,...).

Let me give you an example. The following is my code,

qq(y ~ x | cond, data = test.df, panel = function(x, y, subscripts, ...) {
    # draw only when this panel's rows have more than 3 unique x values
    if (length(unique(test.df[subscripts, 2])) > 3) panel.qq(x, y, ...)
})

Here y is the two-level factor and x is the numeric variable whose quantiles are plotted on the x and y axes; cond is the conditioning variable. What I would like is for only those panels to be printed that pass the condition in the panel function, which is

if(length(unique(test.df[subscripts,2])) > 3). 

I hope this information helps. Thanks in advance.

Added Sample data,

        y      x cond
1       1      6 125
2       2      5 125
3       1      5 125
4       2      6 125
5       1      3 125
6       2      8 125
7       1      8 125
8       2      3 125
9       1      5 125
10      2      6 125
11      1      5 124
12      2      6 124
13      1      6 124
14      2      5 124
15      1      5 124
16      2      6 124
17      1      4 124
18      2      7 124
19      1      0 123
20      2     11 123
21      1      0 123
22      2     11 123
23      1      0 123
24      2     11 123
25      1      0 123
26      2     11 123
27      1      0 123
28      2      2 123 

So this is the sample data. What I would like is to not have a panel for 123, as the number of unique x values for 123 is 3, while for the others it is 4. Thanks again.

MrFlick
Bartha
  • 3
    Why don't you just subset the data and pass it to the plotting function? – PavoDive Apr 20 '15 at 13:31
  • @PavoDive, That is something I would love to try, but right now I do not fully understand how the values in the data behave. If I did, I could easily subset the data. As of now, all I understand is that the data is of two types: one that does not have multiple values and one that does. This is not the intrinsic behaviour of the values of my data but the data itself. Because I cannot differentiate between the two at this point, I am not subsetting the data. – Bartha Apr 20 '15 at 13:37
  • Bartha, I'm not yet sure why @PavoDive's suggestion isn't a good option. I haven't been working with panel functions lately, and I'm not sure what `subscripts` refers to, but it's possible to subset a dataframe using arbitrarily complex conditions. So you can use something roughly like this: `data=test.df[length(unique(test.df[subscripts,2])) > 3,]`. (Note the final comma.) You would need to replace `subscripts` with something else. I believe that subscripts refers to row indexes, and I'm not sure how to replace it. So this is not an answer, but it should point toward an answer. – Mars Apr 20 '15 at 14:00
  • 2
    Once you've made it to the `panel=` function, a panel will be drawn. You cannot filter panels at that point. You need to subset your data, as already described, before going to the plotting function. This shouldn't be that hard. It would be helpful if you provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can test possible solutions. – MrFlick Apr 20 '15 at 16:41
  • @MrFlick I did realize that a panel call would be made for every subset that is passed based on the conditioning. I was hoping there would be a way to prevent the panel from being printed. I have added a sample; I hope this helps. dput would probably fail, as the data is too large. – Bartha Apr 21 '15 at 22:08

2 Answers

3

Yeah, I think it is a subset problem, not a lattice one. You don't include an example, but it looks like you want to keep only rows where there are more than 3 rows for each value of whatever is in column 2 of your data frame. If so, here is a data.table solution.

library(data.table)
test.dt <- as.data.table(test.df)
test.dt.subset <- test.dt[,N:=.N,by=c2][N>3]

Where c2 is that variable in the second column. The last line of code first adds a variable, N, for the count of rows (.N) for each value of c2, then subsets for N>3.

UPDATE: And since a data table is also a data frame, you can use test.dt.subset directly as the data source in the call to qq (or other lattice function).

UPDATE 2: Here is one way to do the same thing without data.table:

d <- data.frame(x=1:15,y=1:15%%2,  # example data frame
       c2=c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))

d$N <- 1 # create a column for count
split(d$N,d$c2) <- lapply(split(d$x,d$c2),length) # populate with count
d
d[d$N>3,] # subset
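
The question's criterion is the number of distinct x values per cond rather than the row count, so a base R variant of the same idea might look like the sketch below. The data frame is reconstructed from the sample in the question, and the names n.unique and final.test.df are only illustrative.

```r
# Sample data reconstructed from the question (an assumption about its exact types)
test.df <- data.frame(
  y    = factor(rep(1:2, times = 14)),
  x    = c(6, 5, 5, 6, 3, 8, 8, 3, 5, 6,       # cond 125
           5, 6, 6, 5, 5, 6, 4, 7,             # cond 124
           0, 11, 0, 11, 0, 11, 0, 11, 0, 2),  # cond 123
  cond = rep(c(125, 124, 123), times = c(10, 8, 10))
)

# For every row, the number of distinct x values within that row's cond group
n.unique <- ave(test.df$x, test.df$cond, FUN = function(v) length(unique(v)))

# Keep only rows whose group has more than 3 distinct x values (drops cond 123)
final.test.df <- test.df[n.unique > 3, ]
unique(final.test.df$cond)  # 125 124
```

ave() returns one value per row, so the comparison can index the data frame directly without an intermediate count column.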
DaveTurek
0

I did something very similar to DaveTurek. My sample dataframe above is test.df

test.df.list <- split(test.df, test.df$cond, drop = FALSE)
final.test.df <- do.call("rbind", lapply(test.df.list, function(r) {
    # keep a group only if it has more than 3 unique x values; otherwise return NULL
    if (length(unique(r$x)) > 3) r
}))

So, here I split test.df into a list of data frames by the conditioning variable. Next, inside the lapply, I check the number of unique x values in each subset data frame. If this number is greater than 3, the data frame is returned; if not, it is dropped (the function returns NULL). Finally, do.call binds the remaining data frames back into one big data frame on which to run the quantile-quantile plot. In case anyone wants to see the qq call on the filtered data, it is:

trellis.device(postscript, file = "test.ps", color = FALSE, horizontal = TRUE, paper = "legal")
qq(y ~ x | cond, data = final.test.df, layout = c(1, 1), pch = ".", cex = 3)
dev.off()
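
A possible alternative, sketched here under the assumption that the sample data from the question is representative, is to leave test.df untouched and pass the filter through the subset argument of the lattice call. The names n.unique and keep are hypothetical helpers, not part of the original code.

```r
library(lattice)

# Sample data reconstructed from the question (an assumption about its exact types)
test.df <- data.frame(
  y    = factor(rep(1:2, times = 14)),
  x    = c(6, 5, 5, 6, 3, 8, 8, 3, 5, 6,       # cond 125
           5, 6, 6, 5, 5, 6, 4, 7,             # cond 124
           0, 11, 0, 11, 0, 11, 0, 11, 0, 2),  # cond 123
  cond = rep(c(125, 124, 123), times = c(10, 8, 10))
)

# Number of distinct x values for each value of cond
n.unique <- tapply(test.df$x, test.df$cond, function(v) length(unique(v)))

# Values of cond that pass the "more than 3 unique values" condition
keep <- as.numeric(names(n.unique)[n.unique > 3])  # 124 125

# Rows are filtered before any panel is set up, so cond 123 gets no panel
p <- qq(y ~ x | cond, data = test.df, subset = cond %in% keep)
```

Printing p (or calling it inside the trellis.device(...) / dev.off() pair above) draws the plot; the filtering happens before the panel function is ever called, which is the point made in the comments.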

Hope this helps.

Bartha