
I'm experimenting with data mining with the rOpenSci network. I'm using the rFisheries package to compare landing data between two species of fish.

I have the species data in two data frames:

mako.landings <- structure(list(year = 1950:1959, mako_catch = c(187255L, 220140L, 
232274L, 229993L, 194596L, 222927L, 303772L, 654384L, 1110352L, 
2213202L)), .Names = c("year", "mako_catch"), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

and

cod.landings <- structure(list(year = 1950:1959, cod_catch = c(77878, 96995, 
198061, 225742, 237730, 230289, 245971, 300765, 311501, 409395
)), .Names = c("year", "cod_catch"), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

The full data frames both have 65 rows, with years running through 2014 (only the first ten rows are shown above). I'm trying to produce a line plot with the year on the x-axis and the catch on the y-axis, with two series, one for each species.

I've made multiple attempts using ggplot, including joining the data frames, but they all produce figures in which the cod data are very compressed and look almost like a flat line compared to the mako data. Something seems wrong, because when I plot the data from cod.landings on its own, it looks very different.

library(dplyr)
library(ggplot2)

combined.landings <- inner_join(mako.landings, cod.landings, by = "year")

#plotting data from joined tables
ggplot() + 
    geom_line(data = combined.landings, aes(x = year, y = mako_catch), colour = "dodgerblue") + 
    geom_line(data = combined.landings, aes(x = year, y = cod_catch), colour = "red")


#plotting the cod and mako data separately 
p <- ggplot(mako.landings, aes(year, mako_catch)) + 
         geom_line(colour = "dodgerblue") + labs(y = "Catch (tonnes)") + labs(x = "Year")
p


p <- p + geom_line(data = cod.landings, aes(year, cod_catch), colour = 
"red")
p


#a different attempt at plotting cod and mako data separately 
ggplot() + 
    geom_line(data = mako.landings, aes(year, mako_catch, color = mako_catch)) + 
    geom_line(data = cod.landings, aes(year, cod_catch, color = cod_catch))


Is there something I've done wrong in the above code? Or a different method to produce the desired graph? I found a similar question elsewhere, but the solution involved writing the data into a new dataframe, which I would prefer to avoid as there are 65 observations of each species that would need to be rewritten.

Thank you

www
  • I have helped you by updating the code to generate your example data frames, the code to load packages, and the resulting plots. Basically, I think your code works as expected. It is a little bit unclear to me what you are looking for. Have you noticed that the numbers in the two data frames are not in the same magnitude? – www Nov 11 '17 at 20:13

2 Answers


When you post your data, it is better to provide it in a format that is easy to read for the people who want to help; please have a look at how to write a reproducible example. I have not tried to use your data, so I will only give you untested code.

In ggplot, it is best if your data are in long format. Here is an example in which the years are repeated once per species, and a column gives the species.

mako.landings$species <- "mako"
cod.landings$species <- "cod"
# rbind() needs identical column names, so rename both catch columns to "catch"
combined_landings <- rbind(
    setNames(mako.landings, c("year", "catch", "species")),
    setNames(cod.landings, c("year", "catch", "species"))
)
# plotting data from the stacked tables; species is unquoted so it maps to the column
ggplot() + geom_line(data = combined_landings, aes(x = year, y = catch, colour = species))

On its own this will not solve your problem, because the landings are not on the same scale.

You could get two separate panels with different scales like this:

ggplot() + geom_line(data = combined_landings, aes(x = year, y = catch)) +
    facet_wrap(~species, scales = "free")
Cedric

After some thought, I think what Cedric proposed is probably what you are looking for: a facet plot with a free y-axis. That way you can see the trend of each species. PoGibas also provided a good way to organize your data frame, which is a more common approach than the way your data are currently organized.

Here I will provide my approach. I want to show you how to convert your wide-format data frame, combined.landings, to long format and use it for plotting.

# Load packages
library(dplyr)
library(tidyr)
library(ggplot2)

# Join two data frames
combined.landings <- inner_join(mako.landings, cod.landings, by = "year")

combined.landings2 <- combined.landings %>% 
  # Convert the data frame from wide format to long format
  gather(Species, Catch, -year) %>%
  # Further clean the species column by removing "_catch"
  mutate(Species = sub("_catch", "", Species))

Take a look at combined.landings2. This is the format that is suitable for plotting in ggplot2.

combined.landings2
# # A tibble: 20 x 3
#     year Species   Catch
#     <int>   <chr>   <dbl>
#  1  1950    mako  187255
#  2  1951    mako  220140
#  3  1952    mako  232274
#  4  1953    mako  229993
#  5  1954    mako  194596
#  6  1955    mako  222927
#  7  1956    mako  303772
#  8  1957    mako  654384
#  9  1958    mako 1110352
# 10  1959    mako 2213202
# 11  1950     cod   77878
# 12  1951     cod   96995
# 13  1952     cod  198061
# 14  1953     cod  225742
# 15  1954     cod  237730
# 16  1955     cod  230289
# 17  1956     cod  245971
# 18  1957     cod  300765
# 19  1958     cod  311501
# 20  1959     cod  409395
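
In more recent versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshaping, assuming such a version is installed. The data frames are rebuilt from the values in the question so the block runs on its own, and the name combined.long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (1950-1959 only)
mako.landings <- data.frame(
  year = 1950:1959,
  mako_catch = c(187255L, 220140L, 232274L, 229993L, 194596L,
                 222927L, 303772L, 654384L, 1110352L, 2213202L)
)
cod.landings <- data.frame(
  year = 1950:1959,
  cod_catch = c(77878, 96995, 198061, 225742, 237730,
                230289, 245971, 300765, 311501, 409395)
)

# Join the two data frames on year (wide format)
combined.landings <- inner_join(mako.landings, cod.landings, by = "year")

# Reshape every column except year into Species/Catch pairs,
# then strip the "_catch" suffix from the species names
combined.long <- combined.landings %>%
  pivot_longer(cols = -year, names_to = "Species", values_to = "Catch") %>%
  mutate(Species = sub("_catch", "", Species))

combined.long
```

The rows come out in a different order than with gather() (both species for 1950, then both for 1951, and so on), but that should not matter for plotting, since ggplot2 groups the lines by Species.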

Now we can draw the facet plot. ~Species means that the faceting is based on the Species column. scales = "free_y" is necessary; otherwise all panels share the same fixed y-axis.

ggplot(combined.landings2, aes(x = year, y = Catch, color = Species)) +
  geom_line(size = 2) +
  facet_wrap(~Species, scales = "free_y")

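
If you would rather keep both species in a single panel, one possible alternative (not something proposed in the answers above) is a logarithmic y-axis, which keeps the smaller series from being flattened by the larger one. This is only a sketch: the long-format data frame landings is rebuilt here from the values in the question, and the axis label assumes the unit is tonnes, as in the question's own plot.

```r
library(ggplot2)

# Long-format sample data, values copied from the question (1950-1959 only)
landings <- data.frame(
  year = rep(1950:1959, times = 2),
  Species = rep(c("mako", "cod"), each = 10),
  Catch = c(187255, 220140, 232274, 229993, 194596,
            222927, 303772, 654384, 1110352, 2213202,
            77878, 96995, 198061, 225742, 237730,
            230289, 245971, 300765, 311501, 409395)
)

# One panel, one line per species, log10-scaled y-axis
p_log <- ggplot(landings, aes(x = year, y = Catch, colour = Species)) +
  geom_line() +
  scale_y_log10() +
  labs(x = "Year", y = "Catch (tonnes, log scale)")
p_log
```

The trade-off is that a log axis shows relative rather than absolute change, so the facet plot may still be easier to read if absolute catch is what matters.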

www
    Yes is pretty much exactly what I had in mind. Thank you that's amazing! I saw your initial comment about the respective magnitudes... I did notice it. When I had plotted the entire data set (1950-2014), it just did not look as though the values in the series matched with the values in the data frame. I could have maybe gotten that point across if I had properly posted this as a reproducible question (which I now see the importance of). But these figures should work, they're great. Thank you again for everyone's help! – LHeartstone Nov 11 '17 at 21:24