If this question has already been answered, please link to it, as I have not been able to locate a similar question. I have referred to: R bar plot with 3 variables; Bar plot with multiple variables in R; ggplot with 2 y axes on each side and different scales; Bar Plot with 2 y axes and same x axis in R language [duplicate]; and Bar Plot with 2 Y axes and same X- axis.

I have a dataset that includes species, observed value, expected value, and a standardized value from the observed and expected.

data <- structure(list(Species = c("BABO_BW", "BABO_BW", "BABO_BW", "BABO_RC", 
"BABO_RC", "BABO_RC", "BABO_SKS", "BABO_SKS", "BABO_SKS", "BABO_MANG", 
"BABO_MANG", "BABO_MANG", "BW_RC", "BW_RC", "BW_RC", "BW_SKS", 
"BW_SKS", "BW_SKS", "BW_MANG", "BW_MANG", "BW_MANG", "RC_SKS", 
"RC_SKS", "RC_SKS", "RC_MANG", "RC_MANG", "RC_MANG", "SKS_MANG", 
"SKS_MANG", "SKS_MANG"), variable = c("obs.C-score", "exp.C-score", 
"SES_Cscore", "obs.C-score", "exp.C-score", "SES_Cscore", "obs.C-score", 
"exp.C-score", "SES_Cscore", "obs.C-score", "exp.C-score", "SES_Cscore", 
"obs.C-score", "exp.C-score", "SES_Cscore", "obs.C-score", "exp.C-score", 
"SES_Cscore", "obs.C-score", "exp.C-score", "SES_Cscore", "obs.C-score", 
"exp.C-score", "SES_Cscore", "obs.C-score", "exp.C-score", "SES_Cscore", 
"obs.C-score", "exp.C-score", "SES_Cscore"), value = c(328680, 
276507, 6.73358774036271, 408360, 345488, 5.31345024375997, 285090, 
254670, 4.35376633657727, 12474, 12190, 1.24624427424057, 1450800, 
1809738, -11.0195450589776, 1507488, 1361088, 6.15672144449049, 
62706, 65780, -0.495728742814285, 1790156, 1700165, 2.70409191051284, 
45701, 86301, -4.71151949799025, 42240, 62745, -4.52203636797869
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-30L))

Sample Output

    Species    Variable       Value
1   BABO_BW    obs.C-score    328680.0000000
2   BABO_BW    exp.C-score    276507.0000000
3   BABO_BW    SES_Cscore     6.7335877
4   BABO_MANG  obs.C-score    12474.0000000
5   BABO_MANG  exp.C-score    12190.0000000
6   BABO_MANG  SES_Cscore     1.2462443
7   BABO_RC    obs.C-score    408360.0000000
8   BABO_RC    exp.C-score    345488.0000000
9   BABO_RC    SES_Cscore     5.3134502
10  BABO_SKS   obs.C-score    285090.0000000

I am trying to put the SES_Cscore on the x-axis and have the obs.C-score and exp.C-score as bars. The Species column groups the C-scores, so I would like to include the species on the x-axis as well.

I have been able to plot the species on the x-axis and the other variables as bars.

library(ggplot2)
ggplot(data, aes(x = Species, y = value)) + 
    geom_bar(aes(fill = variable), stat = "identity", position = "dodge")

Incorrect graph

I would like to have the continuous variable of SES_Cscore as well on the x-axis. Is there a way to do this?

Thank you in advance and have a lovely day!

  • The issue is not that `SES_Cscore` is not being plotted, it's that all of its values are significantly smaller so are scaled-out. If for instance you started with `data %>% mutate(value = value * ifelse(variable == "SES_Cscore", 50000, 1)) %>% ggplot(...) ...`, it would show. Are you suggesting that you want to change the y-axis scale for that category and not the others? – r2evans Jan 27 '23 at 18:06
  • I'm not sure I understand how the x axis should both reflect the SES_Cscore as well as the species. Would a label with the species, but a mapping based on SES_Cscore, suffice? – Jon Spring Jan 27 '23 at 18:08
  • Yes, I want to take the SES_Cscore from being part of the bar graphs (for the reason you stated, too small to see when plotted with the others) and have it on the x-axis with the species along the x-axis as names. – Marnee Roundtree Jan 27 '23 at 18:08
  • Sorry, by "have it on the X-axis" do you mean appended to the species label, or do you want the x axis spacing to be in any way related to the SES_Cscore? – Jon Spring Jan 27 '23 at 18:15
  • If `species` is a discrete variable (i.e. a factor) and SES_Cscore is continuous, there seems to be a fundamental incompatibility. The x-axis for a bar plot is inherently discrete so unclear how you would even add a continuous variable. You would either a) have to discretize the SES_Cscore or B) use a different aesthetic (e.g. color gradient) to map the Cscore – Marcus Jan 27 '23 at 18:26

2 Answers

This could be done by reshaping the data slightly so that SES_Cscore is recorded as a variable with one value per Species, and not as a variable to be mapped to bar height for each observation. I do that here by reshaping wide (so that the three variables each get their own columns), and then reshaping long again, but only for the variables we want to map to y.

library(tidyverse)
data %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  pivot_longer(2:3) %>%
  mutate(Species2 = paste(Species, round(SES_Cscore,digits = 2), sep = "\n") %>%
           fct_reorder(SES_Cscore)) -> data2

data2
## A tibble: 20 × 5
#   Species   SES_Cscore name          value Species2         
#   <chr>          <dbl> <chr>         <dbl> <fct>            
# 1 BABO_BW        6.73  obs.C-score  328680 "BABO_BW\n6.73"  
# 2 BABO_BW        6.73  exp.C-score  276507 "BABO_BW\n6.73"  
# 3 BABO_RC        5.31  obs.C-score  408360 "BABO_RC\n5.31"  
# 4 BABO_RC        5.31  exp.C-score  345488 "BABO_RC\n5.31"  
# 5 BABO_SKS       4.35  obs.C-score  285090 "BABO_SKS\n4.35" 
# etc.

Alternatively, we could do the reshaping in a way that might be more performant for large data, as a join between the observations we want to map to y and the observations we want to use for each species' x position:

left_join(data %>% filter(variable != "SES_Cscore"),
          data %>% filter(variable == "SES_Cscore") %>%
            transmute(Species, x_val = value,
                      Species_label = paste(Species, sprintf(value, 
                        fmt = "%#.2f"), sep = "\n") %>% fct_reorder(value)),
          by = "Species")  # name the join key explicitly

Once reshaped, it's more straightforward to get a plot that is ordered by the SES_Cscore for each species:

ggplot(data2, aes(Species2, value, fill = name)) +
  geom_col(position = "dodge")
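
The join-based reshaping plots the same way. Here is a minimal sketch, assuming the joined result is stored under the hypothetical name `data3` and using only three species from the question's data to keep it short (`data_small` and `ses` are also illustrative names):

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Three species taken from the question's data, to keep the sketch short.
data_small <- tibble(
  Species  = rep(c("BABO_BW", "BW_RC", "RC_MANG"), each = 3),
  variable = rep(c("obs.C-score", "exp.C-score", "SES_Cscore"), times = 3),
  value    = c(328680, 276507, 6.73358774036271,
               1450800, 1809738, -11.0195450589776,
               45701, 86301, -4.71151949799025)
)

# One row per species: its SES_Cscore and a label ordered by that score.
ses <- data_small %>%
  filter(variable == "SES_Cscore") %>%
  transmute(Species, x_val = value,
            Species_label = paste(Species, sprintf("%.2f", value), sep = "\n") %>%
              fct_reorder(value))

# data3 is a hypothetical name for the joined result.
data3 <- data_small %>%
  filter(variable != "SES_Cscore") %>%
  left_join(ses, by = "Species")

p <- ggplot(data3, aes(Species_label, value, fill = variable)) +
  geom_col(position = "dodge")
p
```

The only differences from the `data2` version are the column names: `Species_label` takes the place of `Species2`, and `variable` takes the place of `name`.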



If you want to plot with a continuous x axis related to SES_Cscore, you may run into some graphic design challenges, since the data might be bunched up in some cases. Note how the default bar width gets quite squished so that ggplot can keep the 2nd and 3rd Species bars from overlapping.

This approach also takes a little more work, since ggplot's axes work for either discrete (categorical) data or continuous data, and there isn't a default designed to manage a combination, where categorical data is positioned continuously. So you'd have to resort to some sort of geom_text to make manual labels, plus some customization if you want them to look more like normal axis labels.

ggplot(data2, aes(SES_Cscore, value, fill = name)) +
  geom_col(position = "dodge") +
  ggrepel::geom_text_repel(aes(y = 0, label = Species), 
                           angle = 90, direction = "x", hjust = 0, lineheight = 0.8, size = 3,
                           data = data2 %>% distinct(Species, .keep_all = TRUE))
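
If the automatically chosen bar width is too thin, one option is to do the dodging by hand. The following is only a sketch under stated assumptions: `bar_width` is an arbitrary value in SES_Cscore units, `wide`, `long` and `x_pos` are illustrative names, and only three species from the question's data are used.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Three species from the question's data, already in wide form.
wide <- tibble(
  Species       = c("BABO_BW", "BW_RC", "RC_MANG"),
  `obs.C-score` = c(328680, 1450800, 45701),
  `exp.C-score` = c(276507, 1809738, 86301),
  SES_Cscore    = c(6.73358774036271, -11.0195450589776, -4.71151949799025)
)

bar_width <- 0.4  # assumed width in SES_Cscore units; tune for your data

# Shift each bar half a width left (obs) or right (exp) of the SES_Cscore.
long <- wide %>%
  pivot_longer(c(`obs.C-score`, `exp.C-score`)) %>%
  mutate(x_pos = SES_Cscore +
           if_else(name == "obs.C-score", -0.5, 0.5) * bar_width)

p <- ggplot(long, aes(x_pos, value, fill = name)) +
  geom_col(width = bar_width) +
  labs(x = "SES_Cscore")
p
```

Each pair of bars then occupies `2 * bar_width` on the x axis, so any two species whose SES_Cscore values are closer together than that will overlap. In the full data that applies to RC_MANG and SKS_MANG, which is the fragility discussed in the comments.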


Jon Spring
  • This is so close! Can you make the spacing on the x-axis as a continuous variable so that the negative values are seen on the left and positive on the right? – Marnee Roundtree Jan 27 '23 at 18:23
  • Do you want them in order of SES_Cscore, or actually scaled to it? When the values are close, it's more challenging to label. – Jon Spring Jan 27 '23 at 18:27
  • I think if I have them in order and the raw value there as well (like you have above), it will be enough. – Marnee Roundtree Jan 27 '23 at 18:28
  • Updated to be in order of that value. – Jon Spring Jan 27 '23 at 18:32
  • Added example of what it could look like to put the bars on a true continuous x axis. It might take an idiosyncratic approach to design label spacings that make sense for your particular data. Depending on the nature of the data, you might also need to "cheat" the x mapping if you want to keep the bar groups from overlapping. – Jon Spring Jan 27 '23 at 18:47

Up front, scaling the data and using a second axis can visually misrepresent the data: it's not hard to look at this plot hastily and infer that the blue bars' values mean the same thing as the red/green bars.

Having said that, try this:

library(ggplot2)
library(dplyr)
fac <- 50000
mycolors <- c("obs.C-score" = "red", "exp.C-score" = "green", "SES_Cscore" = "blue")
data %>%
  mutate(value = value * ifelse(variable == "SES_Cscore", fac, 1)) %>%
  ggplot(aes(x = Species, y = value)) +
  geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
  scale_y_continuous(
    sec.axis = sec_axis(name = "SES_Cscore", ~ . / fac),
    breaks = ~ scales::extended_breaks()(pmax(0, .))
  ) +
  scale_fill_manual(values = mycolors) +  # the bars use the fill aesthetic, not color
  theme(
    axis.title.y.right = element_text(color = mycolors["SES_Cscore"]),
    axis.text.y.right = element_text(color = mycolors["SES_Cscore"]),
    axis.ticks.y.right = element_line(color = mycolors["SES_Cscore"])
  )

ggplot with second axis

I'm using blue on the second (right) axis to visually pair it with the blue bars. I also took the liberty of keeping the primary (left) axis breaks at 0 or more, based on my inference of the data; that is not required at all. I could have omitted scale_fill_manual(.) and just assumed that a hard-coded element_text(color = "blue") would match the bars, but that would fail if/when your data changes to have fewer or more levels within variable, so I control the colors manually ... and I try to assign everything on the second axis the right color :-)
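
The value of `fac` above is hard-coded. As a hedged alternative, it could be derived from the data so that the largest absolute SES_Cscore is drawn as tall as the largest C-score. That rule is my assumption, not part of the answer, and `scores`, `is_ses` and `scaled` are illustrative names; the sketch uses two species from the question's data.

```r
# Two species from the question's data.
scores <- data.frame(
  Species  = rep(c("BABO_BW", "BW_RC"), each = 3),
  variable = rep(c("obs.C-score", "exp.C-score", "SES_Cscore"), times = 2),
  value    = c(328680, 276507, 6.73358774036271,
               1450800, 1809738, -11.0195450589776)
)

is_ses <- scores$variable == "SES_Cscore"

# Assumed rule: make the largest |SES_Cscore| as tall as the largest C-score.
fac <- max(abs(scores$value[!is_ses])) / max(abs(scores$value[is_ses]))

# Scale only the SES_Cscore rows; the secondary axis undoes this with ~ . / fac.
scores$scaled <- scores$value * ifelse(is_ses, fac, 1)
```

A factor derived this way produces irregular numbers on the secondary axis breaks only if the breaks are computed on the primary scale, so a rounded value such as the original 50000 may still be the more readable choice.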

r2evans
  • Thank you for this! It's close and maybe what I'm looking for is not possible. The blue bars are a standardized value from the red and green, yes. With that, I do not want to visualize it as a bar like them. I want to put it on the x-axis, including keeping the negative values to be able to quickly look at the blue and green bars and where they are along a blue continuous variable. – Marnee Roundtree Jan 27 '23 at 18:21
  • You repeatedly say "put it on the x-axis", but what does that mean in the context of a barplot other than adding it as a plot? Do you mean adding the raw value as text, perhaps below the other columns? – r2evans Jan 27 '23 at 18:24
  • I apologize for the confusion. I mean have the x-axis a continuous variable, then have the value of SES_Cscore dictate where along the x-axis the red and green bars fall. – Marnee Roundtree Jan 27 '23 at 18:26
  • If I understand you correctly, the code will have to manually deal with _dodging_, since the other two variables will each vie for the same horizontal space. Is that right? That's not impossible, but it is certainly _fragile_ where if any of your `SES_Cscore` values shift a little closer, you may mask or make indistinguishable the bars. – r2evans Jan 27 '23 at 18:32
  • Ahh okay, that makes sense. Is there a way to exclude the species, so the SES_CSCore is not competing for the space? I can try and add them in manually later. – Marnee Roundtree Jan 27 '23 at 18:35
  • It's not `Species` that is causing the congestion, it's that for every distinct value assigned to `SES_Cscore`, you have two other values that you want as bars. As an example, try to hand-draw data where two `SES` scores are both exactly 1, and think about how you expect easy deconfliction of the plot canvas. Funcs like `geom_bar` handle this automatically with dodging or stacking. This works well there because ggplot has complete control over the spaces between the bars (i.e., a categorical variable `Species`). When dodging is not easily handled, how to do it? – r2evans Jan 27 '23 at 18:38
  • This makes total sense and clears up any residual confusion I had. I think between the answers you and the other generous coder have provided, I will have enough of an output to work with. Thank you so much again! – Marnee Roundtree Jan 27 '23 at 18:49