I am attempting to make a violin plot of the percentage cover of different animals across a rocky shore. I've never used a violin plot before, so I'm finding it difficult to understand how the plots are supposed to look or work.

enter image description here

enter image description here

Obviously, something is wrong. I think that the simplest issue is my Excel formatting, but I have no idea how to fix that or where to even start.

My current code:

kitep <- ggplot(data = kite, aes(x = meters, y = percentage)) + geom_violin()

I am aiming to get one violin plot for each of the four species, stacked one above the other, showing their percentage cover over the meters.

Phil
eiion
  • Can you please edit your question to include a sample of data using `dput()`? It makes it harder for others to help if you share data as images. Can you also please edit your question to include the code you have tried? It's hard to know how to help you fix your issue if you don't share the code. – nrennie Jul 10 '23 at 14:11
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064). Violin plots are used to estimate the density function, but you seem to have summarized data, which makes it impossible to use normal density estimation techniques. – MrFlick Jul 10 '23 at 14:35

2 Answers

You need to ensure that your data is imported in the correct format first. R data frames cannot have nested headings the way your Excel sheet does. The following data frame reproduces your Excel data in an R-friendly format:

df <- data.frame(
  meters = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30),
  `castle barnacles` = c(0, 0, 0, 0, 0, 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
  `diamond barnacles` = c(6, 14, 28, 53, 39, 44, 56, 0, 0, 0, 0, 19, 22, 11, 42, 0),
  `toothed wrack` = c(94, 25, 53, 14, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  `bladder wrack` = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0),
  check.names = FALSE
)

Now the object df looks like this:

df
#>    meters castle barnacles diamond barnacles toothed wrack bladder wrack
#> 1       0                0                 6            94             0
#> 2       2                0                14            25            28
#> 3       4                0                28            53             6
#> 4       6                0                53            14            22
#> 5       8                0                39             6            19
#> 6      10                3                44             0            42
#> 7      12                0                56             0            19
#> 8      14               39                 0             0            36
#> 9      16               25                 0             0            56
#> 10     18               39                 0             0            39
#> 11     20               50                 0             0            28
#> 12     22               19                19             0            17
#> 13     24               36                22             0            31
#> 14     26               25                11             0            14
#> 15     28               31                42             0             3
#> 16     30                0                 0             0             0
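If you would rather import the sheet than retype it, one option is to skip the nested header rows and supply flat column names yourself. The sketch below is only an illustration under assumptions: a temporary CSV stands in for the Excel file, the sheet is assumed to have exactly two header rows, and the object name `shore` is made up. `readxl::read_excel()` has `skip` and `col_names` arguments that can be used in a similar way.

```r
# Stand-in for the spreadsheet (assumed layout): two header rows, then data
csv_lines <- c(
  "meters,percentage,,,",
  ",castle barnacles,diamond barnacles,toothed wrack,bladder wrack",
  "0,0,6,94,0",
  "2,0,14,25,28",
  "4,0,28,53,6"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Skip both header rows and name the columns explicitly
shore <- read.csv(
  path,
  skip = 2,
  header = FALSE,
  col.names = c("meters", "castle barnacles", "diamond barnacles",
                "toothed wrack", "bladder wrack"),
  check.names = FALSE   # keep the spaces in the column names
)

str(shore)
```

The result has the same flat, one-header-row shape as `df` above.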

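To plot it using ggplot, it is best to pivot the data into long format using pivot_longer from the tidyr package, and then "uncount" it using uncount, also from tidyr. A violin plot estimates a density from individual observations, so uncount repeats each row once per percentage point to turn the summarized percentages into pseudo-observations. Both ggplot2 and tidyr are loaded when you call library(tidyverse).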

library(tidyverse)

df %>%
  # one row per meters/species combination
  pivot_longer(-meters, names_to = 'Species', values_to = 'Count') %>%
  # repeat each row Count times so the density reflects percentage cover
  uncount(Count) %>%
  ggplot(aes(x = meters, y = Species, color = Species)) +
  geom_violin(aes(fill = after_scale(alpha(color, 0.6))),
              width = 1.8, position = 'identity', trim = FALSE) +
  scale_color_brewer(palette = 'Set1', guide = 'none') +
  theme_minimal(base_size = 16) +
  labs(x = 'Meters from shore', title = 'Species distribution', y = NULL) +
  coord_cartesian(xlim = c(0, 30), expand = FALSE) +
  theme(plot.title.position = 'plot')

enter image description here
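To see what the reshaping steps do, here is a minimal sketch on a made-up three-row data frame (`toy` and its values are invented for illustration). Each percentage point becomes one row at that distance, and rows with a count of zero disappear:

```r
library(tidyr)

# Invented toy data: percentage cover of one species at three distances
toy <- data.frame(meters = c(0, 2, 4),
                  `toothed wrack` = c(3, 1, 0),
                  check.names = FALSE)

toy_long <- toy %>%
  pivot_longer(-meters, names_to = "Species", values_to = "Count") %>%
  uncount(Count)   # repeat each row Count times; the Count column is dropped

toy_long$meters
#> [1] 0 0 0 2
```

The density of `meters` in the expanded data is what `geom_violin()` then draws for each species.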

Since your numbers are percentages and you wish to show relative abundance at each distance from the shore, an alternative approach might be a smoothed area plot:

df %>%
  # interpolate each species onto 1000 points with a cubic spline;
  # reframe() needs dplyr >= 1.1.0 (older versions: use summarize() here)
  reframe(across(-meters, ~ spline(meters, .x, n = 1000)$y)) %>%
  mutate(meters = seq(0, 30, length.out = 1000)) %>%
  pivot_longer(-meters, names_to = 'Species', values_to = 'Count') %>%
  # the spline can dip below zero, so clip negative cover to 0
  mutate(Count = ifelse(Count < 0, 0, Count)) %>%
  ggplot(aes(x = meters, y = Count, colour = Species)) +
  geom_area(aes(fill = after_scale(alpha(colour, 0.5))), position = 'fill') +
  scale_colour_brewer(palette = 'Set1') +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal(base_size = 16) +
  labs(x = 'Meters from shore', title = 'Species distribution', y = NULL) +
  coord_cartesian(xlim = c(0, 30), expand = FALSE) +
  theme(plot.title.position = 'plot',
        legend.position = 'bottom')

enter image description here
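The smoothing in the area plot comes from base R's `spline()`, which fits a cubic spline through the observed points. A cubic spline can overshoot between points, including below zero, which is why the pipeline clips negative values. A minimal sketch using the toothed wrack values from `df` (the names `cover`, `sm` and `clipped` are only for illustration):

```r
meters <- seq(0, 30, by = 2)
cover  <- c(94, 25, 53, 14, 6, rep(0, 11))   # toothed wrack percentages

# Interpolate 1000 equally spaced points between min(meters) and max(meters)
sm <- spline(meters, cover, n = 1000)

range(sm$x)   # spans the sampled distances, 0 to 30
range(sm$y)   # may extend below 0 where the spline overshoots

# Clip negative interpolated cover to zero, as in the pipeline above
clipped <- pmax(sm$y, 0)
```

Because `spline()` returns its `x` values equally spaced over the input range, they line up with `seq(0, 30, ...)` of the same length in the pipeline.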

Created on 2023-07-10 with reprex v2.0.2

Allan Cameron

I think the issue is that row 1 holds "meters" and "percentage", and row 2 then starts with the numeric meter values but holds character values under "percentage". My advice would be to remove the "percentage" cell so that all the headers are on the same level. Once you have done that, you need to reshape the data so that it has only three columns: "meters", "variable" (castle, diamond, ...), and "percentage". Then you should be able to make the plot. Here is how I would do it.

df <- data.frame(meters = seq(0,30,2),
             castle = c(rep(0, 5), 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
             diamond = c(6, 14, 28, 53, 39, 44, 56, rep(0,4), 19, 22, 11, 42, 0),
             tooth = c(94, 25, 53, 14, 6, rep(0, 11)),
             bladder = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0))

Convert the data frame to long format

df_long <- tidyr::gather(df, key = "variable", value = "value", -meters)
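Note that `gather()` is superseded in current tidyr, and `pivot_longer()` is the recommended replacement. A sketch of the equivalent call, assuming the same `df` as above (repeated here so the block runs on its own); it produces the same rows, although in a different order:

```r
library(tidyr)

df <- data.frame(meters = seq(0, 30, 2),
                 castle = c(rep(0, 5), 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
                 diamond = c(6, 14, 28, 53, 39, 44, 56, rep(0, 4), 19, 22, 11, 42, 0),
                 tooth = c(94, 25, 53, 14, 6, rep(0, 11)),
                 bladder = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0))

# Every column except meters is stacked into variable/value pairs
df_long <- pivot_longer(df, -meters, names_to = "variable", values_to = "value")

dim(df_long)
#> [1] 64  3
```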

Create the violin plot

library(ggplot2)

ggplot(df_long, aes(x = meters, y = value, fill = variable)) +
  geom_violin(scale = "width", trim = FALSE) +
  labs(x = "Meters", y = "Percentage", fill = "Variable") +
  scale_fill_manual(values = c("castle" = "red", "diamond" = "blue", "tooth" = "green", "bladder" = "purple")) +
  theme_minimal()

enter image description here

I hope this helps!

Pexav01