I'm looking for a way to remove rows in a data frame with fewer than 3 observations. Let me explain more precisely. I have a data frame with 6 independent variables and 1 dependent variable. Because I'm drawing a density plot in ggplot2 with faceting, variable combinations with fewer than 3 observations are not plotted (obviously). I'm looking for a way to delete the rows that belong to combinations with fewer than 3 observations. This is an example of the data:

'data.frame':   432 obs. of  6 variables:
$ ID        : Factor w/ 439 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Forno     : Factor w/ 8 levels "Micro","Macro",..: 1 1 1 6 6 6 4 4 4 5 ...
$ Varieta: Factor w/ 11 levels "cc","dd",..: 11 11 11 6 6 6 1 1 1 6 ...
$ Impiego: Factor w/ 5 levels "aperto","chiuso",..: 2 2 2 3 3 3 2 2 2 5 ...
$ MediaL    : num  60.7 58.9 60.5 55.9 56.1 ...
$ MediaL.sd : num  4.81 4.79 4.84 5.27 5.64 ...

ggplot code:

ggplot(d1,aes(MediaL))+geom_density(aes(fill=Varieta),colour=NA,alpha=0.5)+
    scale_fill_brewer(palette="Set1")+facet_grid(Forno~Impiego)+
    theme(axis.text.x=element_text(angle=90,hjust=1))+theme_mio +xlim(45,65)+
    stat_bin(geom="text",aes(y=0,label=..count..),size=2,binwidth=2)

I would like to remove all the interactions with fewer than 3 observations.

Sven Hohenstein
Spigonico

1 Answer


Providing your actual sample data would be useful. You can do that with dput(yourObject) rather than the text representation you posted. That said, the same basic approach shown below works equally well with a matrix, a data.frame, and a table.

#Matrix
x <- matrix(c(5,4,4,3,1,5,1,8,2), ncol = 3, byrow = TRUE)
x[x < 3] <- NA
#----
     [,1] [,2] [,3]
[1,]    5    4    4
[2,]    3   NA    5
[3,]   NA    8   NA

#data.frame
xd <- as.data.frame(matrix(c(5,4,4,3,1,5,1,8,2), ncol = 3, byrow = TRUE))
xd[xd < 3] <- NA
#----
  V1 V2 V3
1  5  4  4
2  3 NA  5
3 NA  8 NA

#Table. Simulate some data first
set.seed(1)
samp <- data.frame(x1 = sample(c("acqua", "fango", "neve"), 20, TRUE),
                   x2 = sample(c("pippo", "pluto", "paperino"), 20, TRUE))
x2 <- table(samp)
x2[x2 < 3] <- NA
#----
       x2
x1      paperino pippo pluto
  acqua                    3
  fango        3            
  neve               3     3

ggplot generally likes data to be in long format, most often achieved via the melt() function in reshape2. If you provide your plotting code, that may suggest a better way to remove the data you don't want to plot.
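If the goal is to drop rows of the data frame itself rather than blank out cells of a table, one possible approach is to count the rows in each factor combination and keep only the rows whose combination is large enough. The sketch below is not taken from the question: the small d1 is a made-up stand-in that reuses the column names from the str() output, and grouping by Forno, Impiego, and Varieta together is an assumption based on the faceting and fill in the plotting code.

```r
# Hypothetical stand-in for the asker's d1; only the column names
# (Forno, Impiego, Varieta, MediaL) come from the question.
d1 <- data.frame(
  Forno   = c("Micro", "Micro", "Micro", "Macro", "Macro", "Micro"),
  Impiego = c("chiuso", "chiuso", "chiuso", "aperto", "aperto", "aperto"),
  Varieta = c("cc", "cc", "cc", "dd", "dd", "cc"),
  MediaL  = c(60.7, 58.9, 60.5, 55.9, 56.1, 57.2)
)

# Count the rows in each Forno x Impiego x Varieta combination.
# ave() returns one value per row, aligned with the rows of d1.
n_per_group <- ave(d1$MediaL, d1$Forno, d1$Impiego, d1$Varieta, FUN = length)

# Keep only the rows whose combination has at least 3 observations,
# then drop factor levels that no longer occur (relevant if the
# grouping columns are factors, as in the question).
d1_kept <- droplevels(d1[n_per_group >= 3, ])
```

The filtered d1_kept can then be passed to ggplot() in place of d1. If the densities should be judged per facet only, leaving Varieta out of the ave() call would count rows per Forno x Impiego cell instead.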

Chase
  • Now you have the ggplot script. I would like to remove the rows from the data frame. – Spigonico Nov 30 '12 at 14:13
  • You're still not quite at a reproducible example. You would need to add the data you use to make your plot. Please see this question for details: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. My guess is that you should be able to put 2 and 2 together with what I have above and your code. If not, come back with a specific question on where you are stuck. – Chase Nov 30 '12 at 15:49