I'm creating a bubble plot of species abundance across several areas/times. I need the Y axis to be in species taxonomic order from the top, so I have a separate integer column "Number" that I use to build the plot and keep the right order. But I then have to replace the Y axis labels with the values from the "Species" column.

I'm probably overlooking something simple but I've tried several things and cannot get anything to work.

The reason I cannot make the plot using the Species column directly is that there are more than 100 species in total, and R sorts them alphanumerically, which is even worse. I tried fixing it that way as well.

I've tried conserving the order when importing, the scale_y_discrete function (see below) and several other solutions proposed elsewhere. Str_order is no longer available, nor are one or two other recommended commands that I've come across.

Any advice would be very welcome.

ggplot(data, aes(x = Sample, y = Number)) +
  geom_point(aes(size = ifelse(Value == 0, NA, Value)), alpha = 0.75) +
  scale_size(range = c(0, 5)) +
  scale_y_discrete(limits = c(data$Species))

Sample data (output of dput):

data <- structure(list(Year = c("1984 - 1989", "2017 - 2020", "1984 - 1989", 
"2017 - 2020", "1984 - 1989", "2017 - 2020", "1984 - 1989", "2017 - 2020", 
"1984 - 1989", "2017 - 2020", "1984 - 1989", "2017 - 2020", "1984 - 1989", 
"2017 - 2020", "1984 - 1989", "2017 - 2020", "1984 - 1989", "2017 - 2020", 
"1984 - 1989", "2017 - 2020", "1984 - 1989", "2017 - 2020", "1984 - 1989", 
"2017 - 2020", "1984 - 1989", "2017 - 2020", "1984 - 1989", "2017 - 2020", 
"1984 - 1989", "2017 - 2020"), Sample = c("Developed_zone_1992", 
"Developed_zone_2020", "Paddock_zone_1992", "Paddock_zone_2020", 
"Sanctuary_zone_1992", "Sanctuary_zone_2020", "Developed_zone_1992", 
"Developed_zone_2020", "Paddock_zone_1992", "Paddock_zone_2020", 
"Sanctuary_zone_1992", "Sanctuary_zone_2020", "Developed_zone_1992", 
"Developed_zone_2020", "Paddock_zone_1992", "Paddock_zone_2020", 
"Sanctuary_zone_1992", "Sanctuary_zone_2020", "Developed_zone_1992", 
"Developed_zone_2020", "Paddock_zone_1992", "Paddock_zone_2020", 
"Sanctuary_zone_1992", "Sanctuary_zone_2020", "Developed_zone_1992", 
"Developed_zone_2020", "Paddock_zone_1992", "Paddock_zone_2020", 
"Sanctuary_zone_1992", "Sanctuary_zone_2020"), Colour = c(1L, 
1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 
3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), Value = c(2L, 
5L, 10L, 10L, 0L, 10L, 2L, 0L, 0L, 5L, 0L, 10L, 0L, 0L, 5L, 10L, 
0L, 5L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 1L, 0L, 10L, 0L, 10L), Family = c("Phasianidae", 
"Phasianidae", "Phasianidae", "Phasianidae", "Phasianidae", "Phasianidae", 
"Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", 
"Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", 
"Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", 
"Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae", "Anatidae"
), Species = c("1. Grey francolin (60)", "1. Grey francolin (60)", 
"1. Grey francolin (60)", "1. Grey francolin (60)", "1. Grey francolin (60)", 
"1. Grey francolin (60)", "2. Egyptian goose (55)", "2. Egyptian goose (55)", 
"2. Egyptian goose (55)", "2. Egyptian goose (55)", "2. Egyptian goose (55)", 
"2. Egyptian goose (55)", "3. Garganey (60)", "3. Garganey (60)", 
"3. Garganey (60)", "3. Garganey (60)", "3. Garganey (60)", "3. Garganey (60)", 
"4. Northern shoveler (62)", "4. Northern shoveler (62)", "4. Northern shoveler (62)", 
"4. Northern shoveler (62)", "4. Northern shoveler (62)", "4. Northern shoveler (62)", 
"5. Mallard (67)", "5. Mallard (67)", "5. Mallard (67)", "5. Mallard (67)", 
"5. Mallard (67)", "5. Mallard (67)"), Number = c(142L, 142L, 
142L, 142L, 142L, 142L, 141L, 141L, 141L, 141L, 141L, 141L, 140L, 
140L, 140L, 140L, 140L, 140L, 139L, 139L, 139L, 139L, 139L, 139L, 
138L, 138L, 138L, 138L, 138L, 138L)), class = "data.frame", row.names = c(NA, 
-30L))

Thanks in advance.


tjebo
Lisa B
  • Try something like ```scale_y_discrete(labels = Species)``` ? – geom_na May 19 '20 at 12:39
  • It'd be good to add a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to your question as it will make it easier for others to help you. – anddt May 19 '20 at 12:43
  • you can use dput(data) to share sample data – Harshal Gajare May 19 '20 at 13:08
  • can you turn species into a factor and then use that for the y axis – Mike May 19 '20 at 13:20
  • For guidance on how to place reproducible data into a question, have a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example, in particular the sections "Producing a minimal dataset" and "Copy your data". And think minimal: less is more! – Peter May 19 '20 at 13:43
  • Thanks all. I've tried scale_y_discrete(labels = Species) @Nomad420 but am getting "Error in check_breaks_labels(breaks, labels) : object 'Species' not found". I also tried scale_y_discrete(labels = data$Species), which removes the numbers from the axis but doesn't add the species names. I also tried data$Species <- as.factor(data$Species) and replotting using Species as the y axis. Again I'm finding the same problem with alphanumeric ordering. – Lisa B May 20 '20 at 06:18
  • In the end I added the labels manually using ylabels. Not the best solution but I needed to move on. I'm still interested in learning the proper way of doing it. Thank you. – Lisa B May 20 '20 at 11:35
  • Does this answer your question? [Labeling x-axis with another column from dataframe](https://stackoverflow.com/questions/57626853/labeling-x-axis-with-another-column-from-dataframe) – tjebo Jul 07 '20 at 17:18
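
One likely reason the scale_y_discrete attempts in the comments above leave the axis without species names is that Number is an integer column, which ggplot2 maps to a continuous scale. A minimal sketch of relabelling that continuous axis instead, assuming each Number corresponds to exactly one Species. The small data frame is a made-up stand-in for the real data, and the names key and p are hypothetical:

```r
library(ggplot2)

# Small stand-in for the question's data (values are illustrative only)
data <- data.frame(
  Sample  = rep(c("Developed_zone_1992", "Developed_zone_2020"), times = 3),
  Species = rep(c("1. Grey francolin (60)",
                  "2. Egyptian goose (55)",
                  "3. Garganey (60)"), each = 2),
  Number  = rep(c(142L, 141L, 140L), each = 2),
  Value   = c(2L, 5L, 2L, 0L, 0L, 0L)
)

# One row per species: the numeric y position and the label to show there
key <- unique(data[, c("Number", "Species")])

# Keep Number on the y axis (larger values are drawn higher up),
# but print the matching Species name at each break
p <- ggplot(data, aes(x = Sample, y = Number)) +
  geom_point(aes(size = ifelse(Value == 0, NA, Value)), alpha = 0.75) +
  scale_size(range = c(0, 5)) +
  scale_y_continuous(breaks = key$Number, labels = key$Species) +
  labs(y = NULL, size = "Value")
```

Printing p draws the plot; rows with Value == 0 get an NA size and are dropped with a warning, as in the original code.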

1 Answer

Assuming that you are still interested, and using df as your dataframe, you could use:

sSpeciesLevels <- c("5. Mallard (67)",
                    "4. Northern shoveler (62)",
                    "3. Garganey (60)",
                    "2. Egyptian goose (55)",
                    "1. Grey francolin (60)")

ggplot(df, aes(x = Sample,
               y = factor(Species, levels = sSpeciesLevels))) +
  geom_point(aes(size = ifelse(Value == 0, NA, Value)), alpha = 0.75) +
  scale_size(range = c(0, 5))

and take it from there.

Constantinos
  • Hi. Thanks, I tried but couldn't get it to work. Anyhow I was trying to avoid having to type/format and paste all the names in (my full dataset has 140 species). I thought there would be an easier way to define the order by a specific column. – Lisa B Sep 06 '20 at 10:20
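
The comment above asks for a way to define the order from a column rather than typing every name. A hedged sketch using base R's reorder(), which sorts the factor levels of Species by the matching Number values. It assumes each species has a single Number, and the small data frame is a made-up stand-in for the real data (p is a hypothetical name):

```r
library(ggplot2)

# Small stand-in for the question's data (values are illustrative only)
data <- data.frame(
  Sample  = rep(c("Developed_zone_1992", "Developed_zone_2020"), times = 3),
  Species = rep(c("1. Grey francolin (60)",
                  "2. Egyptian goose (55)",
                  "3. Garganey (60)"), each = 2),
  Number  = rep(c(142L, 141L, 140L), each = 2),
  Value   = c(2L, 5L, 2L, 0L, 0L, 0L)
)

# Sort the Species levels by Number (ascending). ggplot2 draws the first
# level at the bottom of a discrete y axis, so the largest Number ends up on top.
data$Species <- reorder(data$Species, data$Number)

p <- ggplot(data, aes(x = Sample, y = Species)) +
  geom_point(aes(size = ifelse(Value == 0, NA, Value)), alpha = 0.75) +
  scale_size(range = c(0, 5)) +
  labs(y = NULL, size = "Value")
```

Because the order comes from Number rather than from the text of Species, labels such as "10. ..." no longer sort ahead of "2. ...", and nothing has to be typed by hand for the full set of species.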