
I have two data sets of fish lengths and would like to create side-by-side histograms to compare them. My problem is scaling the y-axis and the bin sizes so that the two plots are comparable: instead of counts, I want the y-axis to show percent frequency. I'm also having trouble plotting them side by side because the data come from two different data frames. Can facet_grid or facet_wrap be used for this?

Any help would be much appreciated!

EDIT

I used this code, which just gives a basic histogram of the counts:

```r
library(ggplot2)
ggplot(snook, aes(sl)) +
  geom_histogram(binwidth = 20, color = "black", fill = "lightblue") +
  ggtitle("All Snook") +
  labs(x = "Standard Length (mm)", y = "Counts") +
  theme_bw() +
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
```

This is the plot I get from the ggplot code above (image not available).
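Here is a possible ggplot2 sketch of the facet approach asked about above. It stacks the two data sets into one data frame with a column naming the source, facets on that column, and puts the bar heights on a percentage scale.

Several parts are assumptions for illustration: the stand-in data, the names `both` and `source`, and the choice of `after_stat(density * 20 * 100)`. Because `density` is computed separately within each panel, `density * binwidth` is the share of that panel's fish in each bin. A plain `count / sum(count)` would instead divide by the total across both panels. The sketch assumes ggplot2 3.3.0 or later for `after_stat()`.

```r
library(ggplot2)

# Hypothetical stand-ins for the two data frames in the question
set.seed(1)
snook     <- data.frame(sl = rnorm(300, 500, 150))
gut_Snook <- data.frame(SL = rnorm(80, 450, 120))

# Stack both sources into one data frame; 'source' becomes the facet variable
both <- rbind(
  data.frame(source = "All Snook",    sl = snook$sl),
  data.frame(source = "Culled Snook", sl = gut_Snook$SL)
)

p <- ggplot(both, aes(sl)) +
  # density is computed within each panel, so density * 20 * 100 is the
  # percentage of that panel's fish in each bin (20 must match binwidth)
  geom_histogram(aes(y = after_stat(density * 20 * 100)), binwidth = 20,
                 boundary = 0, color = "black", fill = "lightblue") +
  facet_wrap(~ source) +   # side-by-side panels sharing one y-axis scale
  labs(x = "Standard Length (mm)", y = "% Frequency") +
  theme_bw()
print(p)
```

Because `facet_wrap()` uses fixed scales by default, both panels share the same bins and the same percentage axis.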

Below is the code offered by SimeonL, applied to my data:

```r
opar <- par(mfrow = c(1, 2))
hist(snook$sl, breaks = seq(0, 1000, length.out = 50), freq = TRUE, main = "All Snook",
     xlab = "Length (mm)", ylim = c(0, 50), las = 1)
hist(gut_Snook$SL, breaks = seq(0, 1000, length.out = 50), freq = TRUE, main = "Culled Snook",
     xlab = "Length (mm)", ylim = c(0, 50), las = 1)
par(opar)
```

This is close; however, the y-axis still seems to show counts rather than percent frequency.
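One way to get percentages out of `hist()` directly is to compute the histogram with `plot = FALSE`, rescale its counts to percentages, and then plot the modified object. `plot()` on a histogram object draws its `counts`.

This is a sketch only. The stand-in data and the `pct_hist()` helper are hypothetical; with the real data, `snook$sl` and `gut_Snook$SL` would take their place. Sharing the breaks and the y-limits keeps the two panels comparable.

```r
# Hypothetical stand-ins for snook$sl and gut_Snook$SL
set.seed(1)
all_sl    <- rnorm(300, 500, 150)
culled_sl <- rnorm(80, 450, 120)

# One set of 20 mm breaks covering both samples, shared by both panels
breaks <- seq(floor(min(all_sl, culled_sl) / 20) * 20,
              ceiling(max(all_sl, culled_sl) / 20) * 20, by = 20)

# Hypothetical helper: bin without drawing, then turn counts into percentages
pct_hist <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  h$counts <- h$counts / sum(h$counts) * 100
  h
}
h_all    <- pct_hist(all_sl, breaks)
h_culled <- pct_hist(culled_sl, breaks)

# plot() on a histogram object draws its (now percentage) counts
ylim <- c(0, max(h_all$counts, h_culled$counts))
opar <- par(mfrow = c(1, 2))
plot(h_all, main = "All Snook", xlab = "Length (mm)", ylab = "% Frequency",
     ylim = ylim, las = 1)
plot(h_culled, main = "Culled Snook", xlab = "Length (mm)", ylab = "% Frequency",
     ylim = ylim, las = 1)
par(opar)
```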


    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 14 '20 at 20:43
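As an example of such a reproducible example, a few values typed in by hand can be shared with `dput()`, which prints R code that recreates the object exactly. The lengths below are made up. Pasting that output into the question lets others run the same plotting code.

```r
# Hypothetical stand-in for the real data, typed in by hand
snook <- data.frame(sl = c(312, 455, 520, 610, 288))
# dput() prints code that rebuilds the object; paste its output into the question
dput(snook)
# For a large data set, a subset is usually enough, e.g. dput(head(snook, 20))
```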

1 Answer


Two options in base R:

1. Use hist and relabel the y-axis to show percentages:
```r
set.seed(23)
df1 <- data.frame(f_size = rnorm(120, 20, 15))
# maps a percentage (0-100) to the equivalent count for df1
x.1 <- approxfun(c(0, 100), c(0, nrow(df1)))
df2 <- data.frame(f_size = rnorm(70, 5, 5))
# same mapping for df2
x.2 <- approxfun(c(0, 100), c(0, nrow(df2)))

opar <- par(mfrow = c(1, 2))
# bars are still counts, but the y-limits and tick positions are the counts that
# correspond to 0-25 %, and the ticks are labelled with those percentages
hist(df1$f_size, breaks = seq(-20, 70, length.out = 40), freq = TRUE, main = "",
     xlab = "df1_size", ylim = x.1(c(0, 25)), las = 1, yaxt = "n", ylab = "% Cases")
axis(2, at = x.1(seq(0, 25, 5)), labels = seq(0, 25, 5), las = 1)
hist(df2$f_size, breaks = seq(-20, 70, length.out = 40), freq = TRUE, main = "",
     xlab = "df2_size", ylim = x.2(c(0, 25)), las = 1, yaxt = "n", ylab = "")
axis(2, at = x.2(seq(0, 25, 5)), labels = seq(0, 25, 5), las = 1)
par(opar)
```
2. Calculate the percentages first and use barplot:
```r
breaks <- seq(-20, 70, length.out = 40)
nbins  <- length(breaks) - 1   # 40 break points give 39 bins
# percentage of each data set falling into each bin (empty bins are absent)
df1.perc <- aggregate(df1$f_size, by = list(cut(df1$f_size, breaks, labels = FALSE)),
                      FUN = function(x) length(x) / nrow(df1) * 100)
df2.perc <- aggregate(df2$f_size, by = list(cut(df2$f_size, breaks, labels = FALSE)),
                      FUN = function(x) length(x) / nrow(df2) * 100)

opar <- par(mfrow = c(1, 2))
# merge() gives every bin a slot (NA for empty bins); space = 0 makes bin i
# span [i - 1, i] on the x-axis, so break k sits at position k - 1
barplot(height = merge(data.frame(Group.1 = seq_len(nbins)), df1.perc, all.x = TRUE)$x,
        space = 0, xlab = "df1_size", ylab = "% Cases", ylim = c(0, 25), las = 1)
axis(1, at = approx(breaks, seq_along(breaks) - 1, xout = seq(-20, 70, by = 10))$y,
     labels = seq(-20, 70, by = 10))
barplot(height = merge(data.frame(Group.1 = seq_len(nbins)), df2.perc, all.x = TRUE)$x,
        space = 0, xlab = "df2_size", ylab = "", ylim = c(0, 25), las = 1)
axis(1, at = approx(breaks, seq_along(breaks) - 1, xout = seq(-20, 70, by = 10))$y,
     labels = seq(-20, 70, by = 10))
par(opar)
```
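The two ideas behind these options can be checked on a tiny sample. The lengths below are invented for illustration. First, `approxfun(c(0, 100), c(0, n))` is just the linear map from a percentage p to the count p * n / 100. Second, `cut()` with the same breaks uses the same right-closed intervals as `hist()`, so either one can supply the per-bin percentages.

```r
# Invented lengths (mm), purely for illustration
len    <- c(105, 130, 150, 180, 210, 240, 250, 290, 310, 390)
breaks <- seq(100, 400, by = 50)   # 7 break points -> 6 bins of width 50

# Option 1 idea: a percentage p corresponds to p * n / 100 observations
to_count <- approxfun(c(0, 100), c(0, length(len)))
to_count(c(0, 10, 50))   # 0 1 5

# Option 2 idea: count per bin with cut(), then divide by n
pct <- as.vector(table(cut(len, breaks))) / length(len) * 100
pct                      # 30 10 30 10 10 10

# hist() bins the same way (right-closed), so its counts give the same percentages
hist_pct <- hist(len, breaks = breaks, plot = FALSE)$counts / length(len) * 100
all.equal(pct, hist_pct) # TRUE
```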
SimeonL
    This is great; however, I edited my original post to show the results with my data, and the y-axis still seems to show counts rather than percent frequency. Also, is there additional code I can use to change the number of bins? –  Apr 15 '20 at 16:07
    I edited the answer. If you want to change the number of bins, simply change the breaks, e.g. `breaks = seq(-20, 70, length.out = 20)`. This gives 20 break points, i.e. 19 bins, along the sequence from -20 to 70. – SimeonL Apr 16 '20 at 07:12
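In general, k bins need k + 1 break points, so `length.out` should be one more than the number of bins wanted. Here is a quick check using the range from the answer; the value of `k` is an arbitrary choice:

```r
k      <- 20                                 # number of bins wanted
breaks <- seq(-20, 70, length.out = k + 1)   # k + 1 break points
length(breaks) - 1                           # 20
unique(round(diff(breaks), 10))              # 4.5, i.e. (70 - (-20)) / 20
```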
    Thank you so much!! This is exactly what I needed :) –  Apr 16 '20 at 12:50