
I have a dataset of counts for 700+ categories (discrete data), grouped by sex. I would like to display only the top 50 categories, ranked in ascending or descending order (I already know how to do the ranking). Alternatively, categories could be eliminated by a count cut-off (here I used 50,000). The problem is that I cannot set the discrete axis limits based on the categories after ggplot2 has reordered them.
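One way to apply the 50,000 cut-off to the combined (male + female) count of a species, rather than to each sex separately, is to filter inside a per-species group. This is a minimal sketch, assuming a long data frame with columns `species`, `sex` and `counts` as in the plotting code; the toy values and species names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data in the same long format as `df` (one row per species x sex)
df <- data.frame(
  species = rep(c("A", "B", "C", "D"), each = 2),
  sex     = rep(c("F", "M"), times = 4),
  counts  = c(30000, 30000, 10000, 20000, 200000, 150000, 60000, 1000),
  stringsAsFactors = FALSE
)

# Keep both rows of every species whose male + female total reaches the cut-off
cutoff <- 50000
df_cut <- df %>%
  group_by(species) %>%
  filter(sum(counts) >= cutoff) %>%
  ungroup()
```

Because `sum(counts)` is evaluated per group, species "D" is kept even though one of its rows is only 1,000: its combined total passes the cut-off.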

I have already tried to arrange the data with dplyr, but it won't let me arrange by a total aggregated over only one level of the grouping (each species summed across both sexes). I have also tried `coord_cartesian()` and `scale_y_continuous()`.
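Sorting by an aggregate while keeping the per-sex rows usually means computing the aggregate with `mutate()` (not `summarise()`) inside a per-species group, so the total is attached to every row. A sketch under the same assumptions as above (hypothetical toy data, columns `species`, `sex`, `counts`):

```r
library(dplyr)

# Hypothetical toy data (long format: one row per species x sex)
df <- data.frame(
  species = rep(c("A", "B", "C", "D"), each = 2),
  sex     = rep(c("F", "M"), times = 4),
  counts  = c(30000, 30000, 10000, 20000, 200000, 150000, 60000, 1000),
  stringsAsFactors = FALSE
)

# Attach each species' combined total to both of its rows, then sort on it;
# `species` is a tie-breaker that keeps a species' two rows adjacent
df_ranked <- df %>%
  group_by(species) %>%
  mutate(total = sum(counts)) %>%
  ungroup() %>%
  arrange(desc(total), species)
```

Use `arrange(total, species)` instead for ascending order.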

Ideally, I would like code that simply drops the last 600 or so categories of the reordered data.
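Dropping everything below the top N is easiest before plotting: rank species by their combined total, take the top N names, and keep only rows for those species. A minimal sketch, assuming dplyr 1.0 or later (for `slice_max()` and the `.groups` argument) and the same hypothetical toy data; `n_keep` is an illustrative name.

```r
library(dplyr)

# Hypothetical toy data (long format: one row per species x sex)
df <- data.frame(
  species = rep(c("A", "B", "C", "D"), each = 2),
  sex     = rep(c("F", "M"), times = 4),
  counts  = c(30000, 30000, 10000, 20000, 200000, 150000, 60000, 1000),
  stringsAsFactors = FALSE
)

n_keep <- 2  # would be 50 on the real data

# One row per species with its combined (male + female) total, then the top n_keep names
top_species <- df %>%
  group_by(species) %>%
  summarise(total = sum(counts), .groups = "drop") %>%
  slice_max(total, n = n_keep, with_ties = FALSE) %>%
  pull(species)

# Keep both sexes' rows for the top-ranked species only
df_top <- df %>% filter(species %in% top_species)
```

`df_top` can then be passed to `ggplot()` unchanged; only the kept species appear on the x axis.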

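```r
library(ggplot2)
library(scales)  # for the comma() label formatter

# Rank species by their combined (male + female) count; reorder() averages
# by default, so FUN = sum ranks on the total. Set globally so both layers use it.
ggplot(df, aes(x = reorder(species, counts, FUN = sum), y = counts)) +
  # Gray stem from 0 to the count, one per sex, dodged side by side
  geom_linerange(
    aes(ymin = 0, ymax = counts, group = sex),
    color = "lightgray", size = 1.5,
    position = position_dodge(0.3)
  ) +
  # Coloured dot at the top of each stem
  geom_point(
    aes(colour = sex),
    position = position_dodge(0.3), size = 3
  ) +
  theme(axis.text.x = element_text(angle = 90)) +
  # Out-of-range values become NA and are dropped; the x axis keeps every species
  scale_y_continuous(limits = c(50000, 500000), labels = comma) +
  scale_color_manual(values = c("#0080FF", "#FA1212"))
```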

`scale_y_continuous()` only removed the 600+ data points I did not want, but their labels and the space reserved for them on the x axis remained.
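Limits on the y scale only censor y values; the discrete x scale is still trained on every category. A discrete scale also accepts `limits`, as a character vector of the levels to show in the order to show them, so passing the top-ranked species names restricts and orders the axis at once. Rows for other species fall outside the limits and are dropped with a "Removed ... rows" warning. This is a sketch on hypothetical toy data, not a tested answer for the real dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (long format: one row per species x sex)
df <- data.frame(
  species = rep(c("A", "B", "C", "D"), each = 2),
  sex     = rep(c("F", "M"), times = 4),
  counts  = c(30000, 30000, 10000, 20000, 200000, 150000, 60000, 1000),
  stringsAsFactors = FALSE
)

# Species names ordered by combined total, largest first; head() keeps the top n
top_species <- df %>%
  group_by(species) %>%
  summarise(total = sum(counts), .groups = "drop") %>%
  arrange(desc(total)) %>%
  pull(species) %>%
  head(2)  # would be 50 on the real data

p <- ggplot(df, aes(x = species, y = counts, colour = sex)) +
  geom_linerange(aes(ymin = 0, ymax = counts, group = sex),
                 colour = "lightgray", position = position_dodge(0.3)) +
  geom_point(position = position_dodge(0.3), size = 3) +
  # Only these categories get axis positions, in this order
  scale_x_discrete(limits = top_species) +
  theme(axis.text.x = element_text(angle = 90))
```

Reversing `top_species` (for example with `rev()`) flips the axis to ascending order.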

  • It might be easier if you filter the dataset for the top 50 (or whatever number) categories, *before* passing it into `ggplot()`. – Z.Lin Sep 14 '19 at 12:43
  • Thanks for your reply. The dataset has already been partly processed. I tried using the `arrange()` function in dplyr, but I can't get it to give me the top 50 by the combined (male + female) count. Here is the code I used to process the raw data: `library(dplyr); df <- rawdata %>% select(species, Age.Group, Sex, day.counts, sightings) %>% group_by(species, Sex) %>% summarize(counts = sum(day.counts))` – Wong Yide Sep 14 '19 at 13:24
  • @Wong Yide Please **edit** your question with code instead of adding it in the comment. That will make it easier for others to read. A reproducible sample of your dataset would also be useful; see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Z.Lin Sep 14 '19 at 13:39
