I am trying to combine two plots in one figure using ggplot2. I have two data frames that share a common variable (the factor study). They look like this:

tb1:

study       caribbean  south.america
alison_2010 1          0
james_1998  0          1
...

tb2:

study       stage.I stage.II stage.III ...
alison_2010 95.6    93.1     81.3
james_1998  94.2    80.7     74.5
...

I would like one graph that shows both kinds of information: the results in tb2 and the region of origin in tb1. tb1 would be plotted as a bar plot (to create rectangles in the background), and tb2 as a dot plot.

I tried this:

tb1<- melt(tb1, id.vars="study")
tb2<- melt(tb2, id.vars="study")

c<-ggplot()+
    geom_bar(data=tb1, aes(y=tb1$value, x=tb1$study, fill=tb1$variable),
         stat="identity", position_fill(reverse = TRUE))+
    geom_dotplot(data=tb2, aes(x=tb2$study, y=tb2$value, color=tb2$variable, fill=tb2$variable),
         binaxis="y", stackdir="center", binwidth=1, dotsize=1.5, group=1)

I get this:

enter image description here

When I add + scale_y_continuous(labels=scales::percent)

I get this:

enter image description here

I tried using 100 instead of 1 in tb1, and dividing the values in tb2 by 100. Neither worked.

I am not worried about the labels or anything at the moment. I just want the bar chart to plot percentage, not counts. Can anyone help me? Thank you!

camille
lemosl
  • Welcome to SO, lemosl! Please share data in a [reproducible](https://stackoverflow.com/q/5963269/1422451) format, as described in the [`r`](https://stackoverflow.com/tags/r/info) tag description. You can use `dput()`, `reprex::reprex()` or built-in data sets for reproducible data. This was very good for a first question otherwise though. – Hack-R Jul 17 '18 at 21:56

1 Answer

It seems unusual to want bar plots behind your points when the bar values are all zero or one. I have taken two approaches to answering your question.

First, setting up the data:

library(reshape2)  # melt()
library(dplyr)     # inner_join(), filter(), %>%
library(ggplot2)

tb1 = data.frame(study = c("alison_2010", "james_1998"),
                 caribbean = c(1,0),
                 south.america = c(0,1))
tb2 = data.frame(study = c("alison_2010", "james_1998"),
                 stage1 = c(95,94),
                 stage2 = c(93,80),
                 stage3 = c(81,74))

# Reshape both tables to long format, then keep only the region each study belongs to
tb1a = melt(tb1, id.vars = "study")
tb2a = melt(tb2, id.vars = "study")
tba = inner_join(tb1a, tb2a, by = "study") %>% filter(value.x == 1)
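
The plots that follow refer to columns such as value.x and variable.y, so it may help to look at the joined table first. The sketch below rebuilds the same toy data (assuming reshape2 and dplyr are installed) and keeps the unfiltered join in a variable named joined, which is a name introduced here only for illustration. The .x and .y suffixes are the defaults that inner_join() adds when both tables have columns with the same name.

```r
library(reshape2)
library(dplyr)

tb1 <- data.frame(study = c("alison_2010", "james_1998"),
                  caribbean = c(1, 0),
                  south.america = c(0, 1))
tb2 <- data.frame(study = c("alison_2010", "james_1998"),
                  stage1 = c(95, 94),
                  stage2 = c(93, 80),
                  stage3 = c(81, 74))

tb1a <- melt(tb1, id.vars = "study")  # columns: study, variable (region), value (0/1)
tb2a <- melt(tb2, id.vars = "study")  # columns: study, variable (stage), value (result)

# Every region row of a study is paired with every stage row of that study.
# Recent dplyr versions may warn about a many-to-many relationship here;
# that pairing is intended for this data.
joined <- inner_join(tb1a, tb2a, by = "study")

# Keep only the region each study actually belongs to
tba <- filter(joined, value.x == 1)

names(tba)    # "study" "variable.x" "value.x" "variable.y" "value.y"
nrow(joined)  # 12 (2 studies x 2 regions x 3 stages)
nrow(tba)     # 6  (2 studies x 3 stages)
```

Here variable.x and value.x come from tb1 (region and its 0/1 flag), while variable.y and value.y come from tb2 (stage and its result).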

Approach 1: assuming you only want to plot Alison in Caribbean and James in South America, bar plots behind:

ggplot(data = tba) +
       geom_col(aes(x=interaction(study,variable.y),  y=value.x, fill=variable.x)) +
       geom_point(aes(x=interaction(study,variable.y), y=value.y/100, color = variable.y), size = 5) +
       scale_colour_manual(values = c("purple","orange","grey"))

Result from approach 1

Notes:

  • If you also want to plot the rows where the region flag is 0 (e.g. caribbean = 0), remove the filter(value.x == 1) step.

  • The bar plot uses 'fill' and the points use 'color', otherwise it is difficult to get separate colors on them both.
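
Since the question asked for a percentage axis, here is a minimal sketch of Approach 1 with scale_y_continuous(labels = scales::percent) added. It works only because both layers are on a 0 to 1 scale: the bars have height 1 and the points are divided by 100. The tba data frame below is a hand-built stand-in for the joined table, so the example runs on its own; it is an assumption, not the asker's real data.

```r
library(ggplot2)

# Stand-in for the joined, filtered table (values taken from the toy data)
tba <- data.frame(
  study      = rep(c("alison_2010", "james_1998"), each = 3),
  variable.x = rep(c("caribbean", "south.america"), each = 3),
  value.x    = 1,
  variable.y = rep(c("stage1", "stage2", "stage3"), times = 2),
  value.y    = c(95, 93, 81, 94, 80, 74)
)

p <- ggplot(tba, aes(x = interaction(study, variable.y))) +
  # Background bars: height 1, i.e. 100% on the percent axis
  geom_col(aes(y = value.x, fill = variable.x)) +
  # Points rescaled from 0-100 to 0-1 so they share the bars' scale
  geom_point(aes(y = value.y / 100, color = variable.y), size = 5) +
  # Label the shared 0-1 axis as percentages
  scale_y_continuous(labels = scales::percent)

p
```

If the points were left on the 0 to 100 scale, the bars of height 1 would be squashed at the bottom of the panel, which is one plausible reason the attempts in the question looked wrong.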

Approach 2: Anticipating that faceting would be a simpler solution to your question:

ggplot(data=tba) +
       geom_point(aes(x=study, y=value.y, color = variable.y), size = 5) +
       facet_grid(.~variable.x)

Result from approach 2

Simon.S.A.
  • Thank you very much! I need this kind of visualization to help me compare results by regions (and other variables). – lemosl Jul 17 '18 at 22:44
  • Faceting is just what I need! Do you know how to remove observations with NA from each part (e.g. remove james_1998 from the caribbean facet and alison_2010 from the south america facet)? – lemosl Jul 17 '18 at 23:07
  • I got it! I used facet_grid(.~variable.x, scales = "free", space = "free") – lemosl Jul 17 '18 at 23:24
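
For later readers, the fix from the last comment can be sketched as a complete example. It combines Approach 2 with the facet_grid() arguments lemosl reported; the tba data frame is again a hand-built stand-in for the joined table, included only so the code runs on its own.

```r
library(ggplot2)

# Stand-in for the joined, filtered table (values taken from the toy data)
tba <- data.frame(
  study      = rep(c("alison_2010", "james_1998"), each = 3),
  variable.x = rep(c("caribbean", "south.america"), each = 3),
  variable.y = rep(c("stage1", "stage2", "stage3"), times = 2),
  value.y    = c(95, 93, 81, 94, 80, 74)
)

p <- ggplot(tba) +
  geom_point(aes(x = study, y = value.y, color = variable.y), size = 5) +
  # scales = "free": each facet keeps only the studies that occur in it
  # space  = "free": panel widths follow the number of studies shown
  facet_grid(. ~ variable.x, scales = "free", space = "free")

p
```

With a single row of facets, the y axis is still shared across panels, so results remain directly comparable between regions.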