
My dataframe only has columns that I want to turn into violin plots, without specifying a y value.

Each column is a different data subset showing an average rate of evolution (so really the y column should be created automatically). Almost all ggplot examples use the car dataset, where the x and y columns you specify already exist in the dataframe.

Example of my dataframe:

Species  Zone1   Zone2   Zone3   Zone4
cf       0.0045  0.040   0.054    0.089
cx       0.12    0.145   0.098    0.095
cy       0.044   0.067   0.051    0.077

I want to make violin plots where the x-axis has Zone1, Zone2, Zone3, and Zone4 and the y-axis shows the evolutionary rate values.

I am able to do this using the vioplot package, but I want to keep my script in tidyverse and ggplot since I like its added features more. However, I cannot figure out how to transform my data to get it to show what I need.

I've tried:

ggplot(my_data, aes(x=c(Zone1, Zone2, Zone3, Zone4), 
        y=c(Zone1, Zone2, Zone3, Zone4)) + geom_violin()

But this has too many arguments... I'm not sure what to use for the y variable.

CuriousDude
  • To use `ggplot` effectively, you need to convert your data into a long format so you can specify both the x and y mappings to columns. [Here's a FAQ on converting data from wide to long](https://stackoverflow.com/q/2185252/903061). – Gregor Thomas Jul 16 '19 at 14:30

1 Answer

You can convert your data from wide to long format (with tidyr::gather()) so that ggplot2 can map one column to x and another to y:

library(tidyverse)

df <- read.table(text = "Species  Zone1   Zone2   Zone3   Zone4
cf       0.0045  0.040   0.054    0.089
cx       0.12    0.145   0.098    0.095
cy       0.044   0.067   0.051    0.077",
                 header = TRUE, stringsAsFactors = FALSE)

df_long <- df %>% 
  gather(key = "Zone", value = "Rate", -Species)
df_long
#>    Species  Zone   Rate
#> 1       cf Zone1 0.0045
#> 2       cx Zone1 0.1200
#> 3       cy Zone1 0.0440
#> 4       cf Zone2 0.0400
#> 5       cx Zone2 0.1450
#> 6       cy Zone2 0.0670
#> 7       cf Zone3 0.0540
#> 8       cx Zone3 0.0980
#> 9       cy Zone3 0.0510
#> 10      cf Zone4 0.0890
#> 11      cx Zone4 0.0950
#> 12      cy Zone4 0.0770

ggplot(df_long, aes(x = Zone, y = Rate)) +
  geom_violin(trim = FALSE) 

ggplot(df_long, aes(x = Zone, y = Rate)) +
  geom_violin(trim = TRUE) 

Created on 2019-07-16 by the reprex package (v0.3.0)
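
As a side note, `gather()` has been superseded in tidyr (1.0.0 and later) by `pivot_longer()`. The following is only a sketch of the same reshape with the newer function, assuming a tidyr version that provides it; the names `df_long2` and `p` are illustrative and not part of the original answer. The rows come out ordered by Species rather than by Zone, which makes no difference to the plot.

```r
library(tidyr)
library(ggplot2)

# Same example data as in the answer above
df <- read.table(text = "Species  Zone1   Zone2   Zone3   Zone4
cf       0.0045  0.040   0.054    0.089
cx       0.12    0.145   0.098    0.095
cy       0.044   0.067   0.051    0.077",
                 header = TRUE, stringsAsFactors = FALSE)

# Reshape every column except Species into Zone/Rate pairs
# (df_long2 is a hypothetical name used only for this sketch)
df_long2 <- pivot_longer(df,
                         cols = -Species,
                         names_to = "Zone",
                         values_to = "Rate")

# Build the violin plot from the long data; print(p) to draw it
p <- ggplot(df_long2, aes(x = Zone, y = Rate)) +
  geom_violin(trim = FALSE)
```
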

Tung
  • Thank you! This did exactly what I needed. Did not know I had to reshape it so I had missed other questions specific to reshaping data to long form. – CuriousDude Jul 16 '19 at 20:00
  • I know I'm coming back weeks later, but one question: how do you show only specific Zones? For example, if I just want to plot Zone1 and Zone2, I tried aes(x=c(Zone$Zone1, Zone$Zone2)... but that obviously did not work. And calling df_long$Zone doesn't let me pick the zones either. Could you help? Thanks! – CuriousDude Jul 30 '19 at 13:49
  • @DNAngel: You can use `dplyr::filter()`. See these examples https://stackoverflow.com/a/55523581/786542 & https://stackoverflow.com/a/50495201/786542 – Tung Jul 30 '19 at 16:28
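
A minimal sketch of the `dplyr::filter()` suggestion in the last comment, assuming the same example data and the `Zone`/`Rate` column names from the answer; `df_subset` and `p_subset` are illustrative names, not from the original thread. The idea is to subset the long data frame before plotting rather than trying to pick zones inside `aes()`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the example data and the long-format table from the answer
df <- read.table(text = "Species  Zone1   Zone2   Zone3   Zone4
cf       0.0045  0.040   0.054    0.089
cx       0.12    0.145   0.098    0.095
cy       0.044   0.067   0.051    0.077",
                 header = TRUE, stringsAsFactors = FALSE)

df_long <- df %>%
  gather(key = "Zone", value = "Rate", -Species)

# Keep only the rows for the zones of interest
df_subset <- df_long %>%
  filter(Zone %in% c("Zone1", "Zone2"))

# Plot the subset exactly as before; print(p_subset) to draw it
p_subset <- ggplot(df_subset, aes(x = Zone, y = Rate)) +
  geom_violin(trim = FALSE)
```

Filtering first keeps the plotting code unchanged, so the same `ggplot()` call works for any selection of zones.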