
I'm trying to fix an issue with my ggballoonplot graph regarding how R orders the axis labels.

By default, R plots the data with the labels ranked in reverse alphabetical order, but to reveal the pattern in the data, the labels need to be plotted in a specific order. The only way I've been able to trick the software is by manually adding a prefix to each label in my .csv table so that R ranks them properly in my output. This is time-consuming, since I need to order the data manually before adding the prefix and then plotting.

I would like to supply a character vector (or something similar) that specifies the order in which the data should be plotted, so that the pattern is revealed without needing a prefix in the label names.

I have made some attempts with "scale_y_discrete" without success. I would also like to do the same thing for the X axis, since I've had to use the same "trick" to display the columns in the proper non-alphabetical order, which offsets the position of the labels. Any idea how to get ggplot2 to display my values as seen in the graph without having to "trick" the software, since this is quite time-consuming?

Data + Code

#Assign data to "Stack_Overflow_DummyData"

Stack_Overflow_DummyData <- structure(list(
  Species = structure(c(
    8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L,
    8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L,
    8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L,
    8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L
  ), .Label = c("Ani", "Cal", "Can", "Cau", "Fis", "Ort", "Sem", "Zan"), class = "factor"),
  Species_prefix = structure(c(
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L,
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L,
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L,
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L
  ), .Label = c("ac.Cau", "ad.Sem", "af.Cal", "ag.Ort", "as.Fis", "at.Ani", "be.Can", "bf.Zan"), class = "factor"),
  Dist = structure(c(
    2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L,
    2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L,
    2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L,
    2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L
  ), .Label = c("End", "Ind", "Pan", "Per", "Wid"), class = "factor"),
  Region = structure(c(
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
  ), .Label = c("Cen", "Col", "Far", "Nor"), class = "factor"),
  Region_prefix = structure(c(
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
  ), .Label = c("a.Far", "b.Nor", "c.Cen", "d.Col"), class = "factor"),
  Frequency = c(
    75, 50, 25, 50, 0, 0, 0, 0,
    11.1, 22.2, 55.6, 55.6, 11.1, 0, 5.6, 0,
    0, 2.7, 36.9, 27.9, 65.8, 54.1, 37.8, 28.8,
    0, 0, 0, 3.1, 34.4, 21.9, 78.1, 81.3
  )
), class = "data.frame", row.names = c(NA, -32L))



# Plot Data With Prefix Trick

library(ggplot2)
library(ggpubr)

# Colour by Dist; point size depends on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region_prefix", y = "Species_prefix", 
              size = "Frequency", size.range = c(1, 9), fill = "Dist") +
  # Use the grey theme and remove the grey background from the legend keys.
  # theme_set() changes the global default and returns the previous theme,
  # so it should not be added to a plot; add the theme objects directly.
  theme_gray() + 
  theme(legend.key = element_blank()) + 
  # Remove both axis titles
  theme(axis.title = element_blank()) +
  # Add the Frequency values next to the circles
  geom_text(aes(label = Frequency), alpha = 1.0, size = 3, nudge_x = 0.4)

# Plot Data Without Prefix Trick

library(ggplot2)
library(ggpubr)

# Colour by Dist; point size depends on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region", y = "Species", 
              size = "Frequency", size.range = c(1, 9), fill = "Dist") +
  # Use the grey theme and remove the grey background from the legend keys.
  # theme_set() changes the global default and returns the previous theme,
  # so it should not be added to a plot; add the theme objects directly.
  theme_gray() + 
  theme(legend.key = element_blank()) + 
  # Remove both axis titles
  theme(axis.title = element_blank()) +
  # Add the Frequency values next to the circles
  geom_text(aes(label = Frequency), alpha = 1.0, size = 3, nudge_x = 0.4)

The graphs are shown below.

Good Graph.

Using the label prefix trick with the visible pattern in the data:

enter image description here

Wrong Graph (R default).

Without the prefix trick, ggplot2 orders the data/labels automatically and the graph makes no sense:

enter image description here

To sum up, I would like the "Good Graph" output without first having to add a prefix to my labels.

Many thanks in advance for your help.

Etienne
  • Hello Etienne, welcome to SO. Sorry to hear that asking a good question gave you a headache. It's just that many people offer help for free, and the least we can expect is that a 'helpee' meets the helpers halfway. I can see that you put everything together; thanks for that. Just two more things: if you have trouble formatting code, there is a help function that explains how to do it, and that usually works for me. And please ask one question at a time, so please split your question. The reason is that we want everyone to benefit from the answers, and mixed questions make that difficult. – Jan Jun 12 '20 at 11:00
  • The code in the link is a docx file. Copy its contents and paste into a text editor. Get rid of the smart quotes, if any. Then it's a matter of copying the code into a SO edit box. To format it as code put 3 backticks before and after the code block or indent 4 spaces each line. – Rui Barradas Jun 12 '20 at 11:18
  • A big goal for Stack Overflow is for questions and answers to provide a resource for future users with the same problems. That's another good reason to have questions be minimal, reproducible, and self-contained. If your question depends on code and data in your personal dropbox, the question begins as less approachable - no one can immediately see your attempt or your data. And should those links ever go stale, all reproducibility is lost. We want to help you - but we want to help more people than just you. So we want you to ask your question in a way that can help more people too. – Gregor Thomas Jun 12 '20 at 13:25
  • On that note - I'd encourage you to share a **minimal** reproducible example. We don't need all your data to adjust the labels and legends - just a few rows. Sharing the data from 3 or 4 species is probably plenty. – Gregor Thomas Jun 12 '20 at 13:27
  • Thanks for all your feedback and for editing the post. I modified the post according to your comments, namely: - Making it one single question. - Sharing minimal reproducible example which doesn't rely on DropBox link so that the question is useful to others in the future. – Etienne Jun 17 '20 at 08:51

1 Answer


For the axis labels, I would first define a function that rewrites the break labels:

shlab <- function(lbl_brk){
  sub("^[a-z]+\\.","",lbl_brk) # strips a leading lowercase prefix such as "a." or "ab."
}

Then, to change the labels, pass this function to scale_x_discrete() and scale_y_discrete() via labels = shlab. The help page for scale_x_discrete lists, as one of the options for labels, "a function that takes the breaks as input and returns labels as output".
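As a quick check of what the helper does, here is a minimal base R sketch that applies shlab to the prefixed species labels from the question's data. The variable names `prefixed` and `cleaned` are only illustrative and do not appear in the original post.

```r
# Same helper as in the answer: drop a leading run of lowercase
# letters followed by a dot (e.g. "a." or "ab.").
shlab <- function(lbl_brk) {
  sub("^[a-z]+\\.", "", lbl_brk)
}

# Prefixed labels copied from the Species_prefix column of the question.
prefixed <- c("ac.Cau", "ad.Sem", "af.Cal", "ag.Ort",
              "as.Fis", "at.Ani", "be.Can", "bf.Zan")

# The prefixes control the sort order; shlab() recovers the display labels.
cleaned <- shlab(prefixed)
print(cleaned)
```

Because the prefixed labels sort alphabetically into the desired order, the cleaned vector is also the order in which the unprefixed labels would need to be listed.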

For the colours, it is enough to set them through values in scale_fill_manual; for the size of the legend keys, use guides, like so:

library(ggplot2)
library(ggpubr)
shlab <- function(lbl_brk){
  sub("^[a-z]+\\.","",lbl_brk)
}
ggballoonplot(Stack_Overflow_DummyData, x = "Region_prefix", y = "Species_prefix", size = "Frequency", size.range = c(1, 9), fill = "Dist") +
  scale_x_discrete(labels = shlab) +
  scale_y_discrete(labels = shlab) +
  scale_fill_manual(values = c("green", "blue", "red", "black", "white")) +
  guides(fill = guide_legend(override.aes = list(size = 8))) +
  theme_gray() + theme(legend.key = element_blank()) +                    # Grey theme; removes the grey background from the legend keys (added directly, not via theme_set())
  theme(axis.title = element_blank()) +                                   # Removes both axis titles
  geom_text(aes(label = Frequency), alpha = 1.0, size = 3, nudge_x = 0.4) # Adds the Frequency values next to the circles

enter image description here

UPDATE:

With the new dataset, passing the desired order as a character vector to limits:

library(ggplot2)
library(ggpubr)

# Colour by Dist; point size depends on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region", y = "Species", 
              size = "Frequency", size.range = c(1, 9), fill = "Dist") +
  scale_y_discrete(limits = c("Cau", "Sem", "Cal", "Ort", "Fis", "Ani", "Can", "Zan")) +
  scale_x_discrete(limits = c("Far", "Nor", "Cen", "Col")) +
  # Use the grey theme and remove the grey background from the legend keys
  # (added directly rather than through theme_set(), which returns the old theme)
  theme_gray() + 
  theme(legend.key = element_blank()) + 
  # Remove both axis titles
  theme(axis.title = element_blank()) +
  # Add the Frequency values next to the circles
  geom_text(aes(label = Frequency), alpha = 1.0, size = 3, nudge_x = 0.4) 

enter image description here
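A common alternative to limits, not used in the answer above, is to store the order in the data by resetting the factor levels. The sketch below uses only base R and a hypothetical toy subset (`toy`, with Frequency values copied from the question's Ani and Fis rows); ggplot2 discrete scales follow factor level order by default, so the unprefixed ggballoonplot call should then pick up this order without any scale_*_discrete(limits = ...) call.

```r
# Toy subset of the question's data (illustrative; two species, three regions).
toy <- data.frame(
  Species   = rep(c("Ani", "Fis"), times = 3),
  Region    = rep(c("Far", "Nor", "Cen"), each = 2),
  Frequency = c(25, 50, 55.6, 55.6, 36.9, 27.9),
  stringsAsFactors = TRUE
)

# By default, factor levels are sorted alphabetically.
default_species <- levels(toy$Species)
default_region  <- levels(toy$Region)

# Desired plotting order, given as plain character vectors.
species_order <- c("Fis", "Ani")
region_order  <- c("Far", "Nor", "Cen")

# Re-create the factors with explicit levels; the values stay the same,
# only the level order (and hence the axis order) changes.
toy$Species <- factor(toy$Species, levels = species_order)
toy$Region  <- factor(toy$Region, levels = region_order)

print(levels(toy$Species))
print(levels(toy$Region))
```

The trade-off is that the order now travels with the data frame, which helps when several plots share it, whereas limits keeps the ordering local to one plot.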

iago
  • This is great for removing the prefix and changing the legend! However, it doesn't solve the issue of needing a prefix in the first place to have the plot come out this way. I would like this same output without needing a prefix at all, simply by providing the desired label order as a character vector (or similar). – Etienne Jun 17 '20 at 08:59
  • You can have a look here: https://stackoverflow.com/questions/3253641/order-discrete-x-scale-by-frequency-value – iago Jun 17 '20 at 14:06
  • Just adding: `scale_y_discrete(limits = c("Cau", "Sem", "Cal", "Ort", "Fis", "Ani", "Can", "Zan")) + scale_x_discrete(limits = c("Far", "Nor", "Cen", "Col")) +` – iago Jun 17 '20 at 14:16
  • Awesome! This is exactly what I was trying to achieve! I was probably just using the function incorrectly when I first tried it. Many thanks. – Etienne Jun 19 '20 at 04:48