
I want to recreate the plot below using ggplot2. My data file looks like this:

Cospeciation  Duplications  Host Switches  Losses
3             4             6              1
3             3             7              0
3             3             7              0
5             2             6              3
5             2             6              3
5             2             6              3
5             2             6              3
5             2             6              3

The problem is how to plot it. For example, when I try `ggplot(GGplot_Test, aes(Event, Duplications)) + geom_boxplot()`, it uses Duplications as the y-axis. What I want is for the values in each column to appear on the y-axis, with Event, Duplications, Host Switches, and Losses appearing as separate groups on the x-axis, as in the plot below. Can someone help me with this? Thanks in advance.

[Image: box-and-whisker plot from Shi et al.]

GenomeBio
  • Welcome to SO! Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557 (and https://xkcd.com/2116/). Please just include the code, console output, or data (e.g., `dput(head(x))` or `data.frame(...)`) directly. – r2evans Aug 09 '20 at 03:43
  • My apologies. Updated the post. – GenomeBio Aug 09 '20 at 04:13
  • @GenomeBio: this might be useful too https://stackoverflow.com/questions/49003863/how-to-plot-a-hybrid-boxplot-half-boxplot-with-jitter-points-on-the-other-half – Tung Aug 09 '20 at 04:43

1 Answer


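If you write out the argument names you're passing to ggplot, you'll see why your code doesn't do what you want:

ggplot(data = GGplot_Test, mapping = aes(x = Event, y = Duplications)) + geom_boxplot()

The y aesthetic is mapped to the Duplications column alone, so that is the only column that ever reaches the y-axis.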

To use ggplot here, you'll first need to convert your data into tidy (long) format. Use tidyr::pivot_longer() to get a grouping column. Also, it seems your data covers only one species, e.g. Arenaviridae.

So first, use pivot_longer() to get data that looks like this:

name           value
Cospeciation   3
Cospeciation   3
Cospeciation   3
Cospeciation   5
...
Duplications   4
Duplications   3
...
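A minimal sketch of this reshaping step, assuming the tidyr package is installed and that the question's eight rows are typed in by hand (`check.names = FALSE` keeps the space in `Host Switches`). The name `GGplot_Long` is chosen here for illustration only:

```r
library(tidyr)

# The question's data, typed in by hand; check.names = FALSE keeps the
# space in "Host Switches" instead of turning it into "Host.Switches".
GGplot_Test <- data.frame(
  Cospeciation    = c(3, 3, 3, 5, 5, 5, 5, 5),
  Duplications    = c(4, 3, 3, 2, 2, 2, 2, 2),
  `Host Switches` = c(6, 7, 7, 6, 6, 6, 6, 6),
  Losses          = c(1, 0, 0, 3, 3, 3, 3, 3),
  check.names = FALSE
)

# Stack all four columns into two: `name` holds the original column
# header and `value` holds the number from that column (8 rows x 4 = 32).
GGplot_Long <- pivot_longer(GGplot_Test, cols = everything())

print(GGplot_Long)
```

The rows come out interleaved (the four values of the first row, then the four of the second, and so on) rather than grouped by name, but ggplot does not care about row order.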

Then, with the data in long format, you can use ggplot:

ggplot(data = GGplot_Test, mapping = aes(x = name, y = value)) + geom_boxplot()
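Putting the two steps together, a self-contained sketch (assuming ggplot2 and tidyr are installed, with the question's data typed in by hand; `GGplot_Long` and `p` are illustrative names) might look like this:

```r
library(tidyr)
library(ggplot2)

# The question's data, typed in by hand.
GGplot_Test <- data.frame(
  Cospeciation    = c(3, 3, 3, 5, 5, 5, 5, 5),
  Duplications    = c(4, 3, 3, 2, 2, 2, 2, 2),
  `Host Switches` = c(6, 7, 7, 6, 6, 6, 6, 6),
  Losses          = c(1, 0, 0, 3, 3, 3, 3, 3),
  check.names = FALSE
)

# Reshape to long format: one row per (column name, value) pair.
GGplot_Long <- pivot_longer(GGplot_Test, cols = everything())

# Column names go on x, numbers on y; geom_boxplot() draws one box
# for each distinct value of `name`.
p <- ggplot(data = GGplot_Long, mapping = aes(x = name, y = value)) +
  geom_boxplot()

print(p)
```

Discrete x categories are ordered alphabetically by default, which happens to match the column order here; to force a different order, convert `name` to a factor with the levels you want.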

And if you can combine your data so that it looks like this:

species  name           value
Arena    Cospeciation   3
Arena    Cospeciation   3
Arena    Cospeciation   3
Arena    Cospeciation   5
...
Arena    Duplications   4
Arena    Duplications   3
...
Ateri    Cospeciation   6
Ateri    Cospeciation   5
Ateri    Cospeciation   4
Ateri    Cospeciation   5
...
Ateri    Duplications   6
Ateri    Duplications   5
...

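then you can use facets in ggplot to get all the graphs in one figure (facet_wrap() takes the faceting variables as its first argument; `cols =` is an argument of facet_grid(), not facet_wrap()):

ggplot(data = GGplot_Test, mapping = aes(x = name, y = value)) + geom_boxplot() + facet_wrap(vars(species))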
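A rough end-to-end sketch of the combine-then-facet approach, assuming ggplot2 and tidyr are installed. The Arena numbers are the question's data; the Ateri numbers are HYPOTHETICAL placeholders invented only so there is a second group to facet on, and `arena`, `ateri`, `combined`, and `combined_long` are illustrative names:

```r
library(tidyr)
library(ggplot2)

# Arena: the question's data, typed in by hand.
arena <- data.frame(
  Cospeciation    = c(3, 3, 3, 5, 5, 5, 5, 5),
  Duplications    = c(4, 3, 3, 2, 2, 2, 2, 2),
  `Host Switches` = c(6, 7, 7, 6, 6, 6, 6, 6),
  Losses          = c(1, 0, 0, 3, 3, 3, 3, 3),
  check.names = FALSE
)

# Ateri: HYPOTHETICAL placeholder values for illustration only;
# replace them with the real data.
ateri <- data.frame(
  Cospeciation    = c(6, 5, 4, 5),
  Duplications    = c(6, 5, 5, 4),
  `Host Switches` = c(2, 3, 3, 2),
  Losses          = c(1, 1, 2, 0),
  check.names = FALSE
)

# Tag each data set with its species, then stack the rows.
arena$species <- "Arena"
ateri$species <- "Ateri"
combined <- rbind(arena, ateri)

# Pivot every column except `species` into name/value pairs.
combined_long <- pivot_longer(combined, cols = -species)

# One panel per species, one box per event type within each panel.
p <- ggplot(combined_long, aes(x = name, y = value)) +
  geom_boxplot() +
  facet_wrap(vars(species))

print(p)
```

If you would rather have the panels forced into a single row, `facet_grid(cols = vars(species))` is the facet_grid() equivalent.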

Finally, if you paste in your data (copy and paste the output of dput(head(GGplot_Test)), as @r2evans suggested), we can help much more easily.
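For reference, a base-R sketch of what that looks like, assuming the question's table is typed in by hand (no extra packages needed). dput() prints R code that rebuilds the object exactly, so the printed text can be pasted straight into a post:

```r
# The question's data, typed in by hand; check.names = FALSE keeps the
# space in "Host Switches".
GGplot_Test <- data.frame(
  Cospeciation    = c(3, 3, 3, 5, 5, 5, 5, 5),
  Duplications    = c(4, 3, 3, 2, 2, 2, 2, 2),
  `Host Switches` = c(6, 7, 7, 6, 6, 6, 6, 6),
  Losses          = c(1, 0, 0, 3, 3, 3, 3, 3),
  check.names = FALSE
)

# Print code that recreates the first six rows; non-syntactic names
# such as `Host Switches` are written with backticks automatically.
dput(head(GGplot_Test))
```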

Arthur Yip