
I'm new to R and ggplot2. I have a data frame and would like to plot a histogram of one of its variables together with the histogram of a subset of that same variable. Basically, what I want to do is the following:

ggplot(df, aes(x = w, fill = area)) +
  geom_histogram(binwidth = 1, position = "dodge")

where the fill would distinguish all the data points in my df from the points with area > 0. I cannot find the correct way to format my data frame to make this happen. At the moment this only gives the distribution for area > 0 vs. the distribution for area = 0.

Thanks.

EDIT: How it works now

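library(ggplot2)

# 50 values of w; the first 25 rows have area = 0, the last 25 have area > 0
w <- runif(50, min = 1, max = 5)
area <- c(rep(0, 25), runif(25))
df <- data.frame(w, area)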

### Wrong
# Label each row as "big" (area > 0) or "small" (area = 0)
df$size <- ifelse(df$area > 0, "big", "small")

ggplot(df, aes(x = w, fill = size)) +
  geom_histogram(binwidth = 1, position = "dodge")

How can I partition the data frame in a way that lets me plot the distribution of all data points vs the big ones?

johnblund
  • Can you make your example [reproducible?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. – Heroka Jan 18 '16 at 10:09
  • Create a new dataset with only the big ones and overlay two histograms: geom_histogram(data=all, ....) + geom_histogram(data=bigone....) – MLavoie Jan 18 '16 at 10:41
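The overlay suggested in the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's exact code: the name `big`, the `set.seed(1)` call, the constant fill labels, and the `boundary = 0` argument (used here so both layers share the same bin edges) are assumptions added for the example.

```r
library(ggplot2)

# Example data from the question (seed added only so the sketch is reproducible)
set.seed(1)
w    <- runif(50, min = 1, max = 5)
area <- c(rep(0, 25), runif(25))
df   <- data.frame(w, area)

# Hypothetical helper data frame: only the rows with area > 0
big <- df[df$area > 0, ]

# Two histogram layers, each with its own data; the constant strings inside
# aes() become the legend entries. boundary = 0 is an assumption made here
# to keep the bin edges of both layers aligned.
p <- ggplot(mapping = aes(x = w)) +
  geom_histogram(data = df,  aes(fill = "all"), binwidth = 1, boundary = 0) +
  geom_histogram(data = big, aes(fill = "big"), binwidth = 1, boundary = 0) +
  labs(fill = "group")
p
```

Because the "big" rows are a subset of all rows, their bars can never be taller than the "all" bars in the same bin, so drawing them second leaves both distributions visible. Unlike position = "dodge", the bars here are drawn on top of each other rather than side by side.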

1 Answer


One way is to append a copy of your subset to the data frame and create a new factor column that identifies the "all" rows and the "subset" rows. Then plot using the new column as the fill.

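# Append a copy of the "big" data points (rows 26 to 50 in the example data)
# to the end of the data frame
dfSub <- rbind(df, df[26:nrow(df), ])
# Create a factor column: the first 50 rows are "all", the 25 appended rows are "subset"
dfSub$group <- as.factor(c(rep("all", 50), rep("subset", 25)))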

ggplot(dfSub, aes(x = w, fill = group)) +
  geom_histogram(binwidth = 1, position="dodge")
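The row indices above depend on the "big" points sitting in rows 26 to 50. If that ordering cannot be relied on, a variant that selects the subset by condition might look like the sketch below. The name `big` and the `set.seed(1)` call are assumptions added for illustration; the rest follows the example data from the question.

```r
library(ggplot2)

# Example data from the question (seed added only so the sketch is reproducible)
set.seed(1)
w    <- runif(50, min = 1, max = 5)
area <- c(rep(0, 25), runif(25))
df   <- data.frame(w, area)

# Select the subset by condition instead of by row position
big <- df[df$area > 0, ]

# Stack all rows and the subset rows, labelling each block
dfSub <- rbind(
  cbind(df,  group = "all"),
  cbind(big, group = "subset")
)
dfSub$group <- factor(dfSub$group)

p <- ggplot(dfSub, aes(x = w, fill = group)) +
  geom_histogram(binwidth = 1, position = "dodge")
p
```

The number of rows in each group is then derived from the data itself, so the same code should keep working if the data frame is reordered or the number of points with area > 0 changes.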


Branden Murray