How to graph the most important categories among 25 in a stacked bar chart

Question

My problem is a bit more complicated than the one in this question.

I want to stack the 10 most abundant species for each Rot.Herb (18 levels in total) and group the remaining species into two broad categories, Other Monocots and Other Dicots. I think I will need to assign each species to Monocot or Dicot manually. The tricky part is that the set of 10 most abundant species is different for every Rot.Herb.
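Since the Monocot/Dicot split has to be assigned by hand, a small lookup vector is one way to do it. This is a minimal sketch under an assumption: the species names look like Bayer/EPPO codes, and the grass and sedge codes that appear in the posted data (DIGSA, SETFA, SETLU, ECHCG, PANCA, ERBVI, CYPES, plus Unknownmonocot) are treated as monocots while everything else counts as a dicot. Check the list against your own species key. `other_group()` is a hypothetical helper name.

```r
# Hypothetical helper: map species codes to the two "other" categories.
# ASSUMPTION: this monocot list is inferred from the codes in the posted data;
# verify it against your own species key before using it.
monocots <- c("DIGSA", "SETFA", "SETLU", "ECHCG", "PANCA", "ERBVI",
              "CYPES", "Unknownmonocot")

other_group <- function(species) {
  # Any code not listed as a monocot is treated as a dicot
  ifelse(as.character(species) %in% monocots, "Other Monocots", "Other Dicots")
}

other_group(c("SETFA", "AMATA", "Unknowndicot"))
# returns c("Other Monocots", "Other Dicots", "Other Dicots")
```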

Here is the graph of everything stacked:

And here is the code:

library(dplyr)
library(tidyr)
library(ggplot2)

weedweights <- weeds %>%
    select(-ends_with("No")) %>%                                     # drop the columns ending in "No"
    gather(key = species, value = speciesmass, DIGSAWt:POLLAWt) %>%  # wide to long: one row per species
    mutate(realmass = speciesmass * samplearea.m.2.) %>%
    group_by(Rot.Herb, species) %>%
    summarize(avgrealmass = mean(realmass, na.rm = TRUE)) %>%
    filter(!is.nan(avgrealmass)) %>%                                 # mean() of only NAs gives NaN
    ungroup()

ggplot(weedweights, aes(x = Rot.Herb, y = avgrealmass, fill = species)) +
    geom_bar(stat = "identity")

Here is the summarized data (`dput()` output):

structure(list(Rot.Herb = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 
16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 
18L, 18L, 18L, 18L, 18L), .Label = c("A4conv", "A4low", "C2conv", 
"C2low", "C3conv", "C3low", "C4conv", "C4low", "O3conv", "O3low", 
"O4conv", "O4low", "S2conv", "S2low", "S3conv", "S3low", "S4conv", 
"S4low"), class = "factor"), species = c("DIGSA", "SETFA", "SETLU", 
"AMATA", "CHEAL", "PHYSU", "TAROF", "EPHHT", "SONAR", "MORAL", 
"OXAST", "Unknownmonocot", "DIGSA", "SETFA", "SETLU", "AMATA", 
"CHEAL", "SOLPT", "TAROF", "EPHHT", "SONAR", "Unknowndicot", 
"Unknownmonocot", "SETFA", "AMATA", "SOLPT", "PHYSU", "TAROF", 
"EPHHT", "MORAL", "SETFA", "AMATA", "CHEAL", "SOLPT", "PHYSU", 
"POLPY", "ABUTH", "TAROF", "EPHHT", "SONAR", "ASCSY", "SETFA", 
"SETLU", "AMATA", "CHEAL", "SOLPT", "PHYSU", "ABUTH", "TAROF", 
"MORAL", "DIGSA", "SETFA", "SETLU", "ERBVI", "AMATA", "CHEAL", 
"SOLPT", "PHYSU", "ABUTH", "TAROF", "MORAL", "Unknowndicot", 
"Unknownmonocot", "DIGSA", "SETFA", "SETLU", "AMATA", "SOLPT", 
"PHYSU", "ABUTH", "TAROF", "EPHHT", "SONAR", "MORAL", "OXAST", 
"DIGSA", "SETFA", "SETLU", "ECHCG", "AMATA", "CHEAL", "SOLPT", 
"PHYSU", "POLPY", "ABUTH", "TAROF", "EPHHT", "OXAST", "POLLA", 
"SETFA", "SETLU", "AMATA", "CHEAL", "SOLPT", "POLPY", "TAROF", 
"POLAV", "PLAMA", "Unknownmonocot", "SETFA", "SETLU", "AMATA", 
"CHEAL", "SOLPT", "PHYSU", "TAROF", "Unknownmonocot", "DIGSA", 
"SETFA", "SETLU", "PANCA", "CYPES", "AMATA", "CHEAL", "SOLPT", 
"PHYSU", "TAROF", "EPHHT", "CIRAR", "OXAST", "DIGSA", "SETFA", 
"SETLU", "PANCA", "CYPES", "AMATA", "CHEAL", "SOLPT", "PHYSU", 
"TAROF", "EPHHT", "SONAR", "MORAL", "Unknownmonocot", "AMATA", 
"CHEAL", "SOLPT", "PHYSU", "POLPY", "TAROF", "MORAL", "DIGSA", 
"SETFA", "PANCA", "ECHCG", "ERBVI", "AMATA", "CHEAL", "SOLPT", 
"PHYSU", "ABUTH", "TAROF", "MORAL", "AMATA", "CHEAL", "SOLPT", 
"PHYSU", "ABUTH", "TAROF", "MORAL", "Unknowndicot", "SETFA", 
"AMATA", "CHEAL", "SOLPT", "ABUTH", "TAROF", "EPHHT", "SETLU", 
"ECHCG", "AMATA", "CHEAL", "SOLPT", "PHYSU", "ABUTH", "TAROF", 
"EPHHT", "MORAL", "Unknowndicot", "DIGSA", "SETFA", "AMATA", 
"CHEAL", "TAROF"), avgrealmass = c(6.25, 26.35, 58.35, 13.4666666666667, 
17.1, 1.15, 28.75, 0.45, 0, 0.2, 1.2, 0, 6.425, 18.65, 6.63333333333333, 
3.475, 6.11666666666667, 16.1, 41.9625, 0.9, 0, 0, 0, 0.0809410748974746, 
0.237427153032592, 0.0917332182171379, 0.0647528599179797, 0.105223397366717, 
0, 0.0539607165983164, 0.795920569825167, 5.7818907835096, 13.3822577163825, 
1.62151953377941, 3.1099359666163, 0.388517159507878, 0.0539607165983164, 
0.0845384560040291, 0.0701489315778114, 0.0539607165983164, 0.0215842866393266, 
0.0539607165983164, 54.8240880638895, 0, 0.0269803582991582, 
0.102525361536801, 0.0215842866393266, 0.0647528599179797, 0.0404705374487373, 
0.0161882149794949, 0.485646449384848, 11.2103388733002, 86.4990287071012, 
22.9333045542845, 13.9218648823656, 49.6798330815167, 0.0944312540470537, 
0.661018778329376, 0.410101446147205, 0.399309302827542, 0, 1.18173969350313, 
0.0161882149794949, 0.134901791495791, 1.24649255342111, 1.95877401251889, 
0.00269803582991582, 0.364234837038636, 0.555795380962659, 0.356140729548888, 
0.0350744657889057, 0.0944312540470537, 0.00809410748974746, 
0.00539607165983164, 0, 1.42186488236564, 15.7794625512627, 0.584574429815095, 
11.7094755018347, 1.75372328944528, 2.4552126052234, 0.50992877185409, 
0.0863371465573063, 0.221238938053097, 9.53305993236924, 0.106572415281675, 
0.117364558601338, 0.075545003237643, 1.40297863155623, 31.45, 
14.0666666666667, 18.7375, 15.225, 22.3166666666667, 24.05, 8.775, 
1.05, 0.4, 0, 8.55, 35.475, 31.4375, 35.4375, 15.4, 16.55, 7.15, 
0, 105.05, 5.775, 0.8, 0.1, 37.85, 23.3375, 6.35, 97.4, 22.925, 
138.2875, 8.26666666666667, 0.2, 16.25, 8.075, 28.9, 10.1, 1.05, 
8.85, 34.6375, 59.425, 87.7, 4.45, 179.9875, 1.8, 34.45, 0, 0, 
0.585473775091733, 0.0161882149794949, 0, 0, 0.113317504856464, 
0.305777394057126, 0, 1.61342542628966, 1.62961364126916, 2.36887545866609, 
7.94301748327218, 17.4832721778545, 30.8034750701489, 3.40761925318368, 
0.627743003093748, 0.582775739261817, 1.46773149147421, 0.0575580977048708, 
0.00899345276638607, 0.539607165983164, 0.364234837038636, 0.0431685732786531, 
0, 0.407403410317289, 0.0229333045542845, 0, 0, 21.8540902223182, 
43.1591301532484, 57.2172458450248, 0.793222533995251, 1.14396719188431, 
0.215842866393266, 0.113317504856464, 0.0647528599179797, 0.0917332182171379, 
0.453270019425858, 0.0431685732786531, 0.0485646449384848, 0.0161882149794949, 
0.879559680552558, 0.00269803582991582, 0.0161882149794949, 0.0143895244262177, 
0.0215842866393266, 0.075545003237643, 5.71983595942154, 34.9719404273689, 
4.31685732786531, 0)), .Names = c("Rot.Herb", "species", "avgrealmass"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-184L))

I'd be much more inclined to work on an answer if you made a small example dataset that illustrates the problem and doesn't involve downloading some file. Maybe you could `dput()` data for 3 rot.herb levels and 5 species, hoping to plot the top 3? And if you think the Monocot and Dicot categorization matters, maybe do that first? — Gregor Thomas, Jul 23 '15 at 23:50
Yes, please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). I don't want to have to download that file and rerun the code scattered throughout your related questions in order to reconstruct `weeds` every time. How do you know if a species is "abundant"? Does it have the most rows in the data frame? The highest total `realmass`? Etc. — mathematical.coffee, Jul 24 '15 at 00:36
This is more about data management than plotting: try converting your data into the categories you need, then plot them, rather than try doing the conversions by plotting. — Scransom, Jul 24 '15 at 04:19
Sorry for the inconvenience. I've added the data. I can't convert my data as suggested because the number of species and abundance are different for each Rot.Herb. Maybe I have to graph each Rot.Herb individually? — Little Bee, Jul 24 '15 at 16:29
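A different number of species per Rot.Herb need not force separate plots, because the ranking can be done inside each Rot.Herb. Below is a minimal base-R sketch (no packages) of converting the data before plotting, along the lines Scransom suggests. `lump_top_n()` is a hypothetical helper that keeps the `n` heaviest species in each Rot.Herb and sums the rest into "Other Monocots" or "Other Dicots". It assumes the columns `Rot.Herb`, `species` and `avgrealmass` from the posted `dput()`; the monocot codes are an illustrative guess to be checked, and the toy data is made up.

```r
# Sketch: keep the top `n` species within each Rot.Herb and lump the rest
# into "Other Monocots" / "Other Dicots". Ranking happens separately inside
# each Rot.Herb, so every level may keep a different set of species.
# ASSUMPTIONS: columns Rot.Herb, species, avgrealmass; monocot codes are a guess.
lump_top_n <- function(df, n = 10,
                       monocots = c("DIGSA", "SETFA", "SETLU", "ECHCG",
                                    "PANCA", "ERBVI", "CYPES", "Unknownmonocot")) {
  species <- as.character(df$species)
  # Rank by mass within each Rot.Herb (1 = heaviest); ties broken by row order
  rk <- ave(-df$avgrealmass, df$Rot.Herb,
            FUN = function(v) rank(v, ties.method = "first"))
  df$label <- ifelse(rk <= n, species,
                     ifelse(species %in% monocots, "Other Monocots", "Other Dicots"))
  # Sum the lumped species so each Rot.Herb has one segment per label
  aggregate(avgrealmass ~ Rot.Herb + label, data = df, FUN = sum)
}

# Made-up example: two treatments, top 2 species kept per treatment
toy <- data.frame(
  Rot.Herb = rep(c("A4conv", "C2conv"), each = 4),
  species = c("SETFA", "AMATA", "DIGSA", "CHEAL",
              "TAROF", "SETLU", "ECHCG", "PHYSU"),
  avgrealmass = c(30, 20, 5, 1,
                  40, 10, 3, 2),
  stringsAsFactors = FALSE
)
lumped <- lump_top_n(toy, n = 2)
lumped
```

The result has one row per Rot.Herb and label, so it could be plotted with the question's `ggplot()` call using `fill = label`.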

Answer 1 · answered Jul 24 '15 at 17:17

If I'm understanding this right (assuming that the data you posted is the processed `weedweights` frame), then all you need is the `top_n()` function from dplyr:

library(dplyr)
topweights <- weedweights %>% group_by(Rot.Herb) %>% top_n(10, avgrealmass)  # 10 heaviest species per Rot.Herb
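As far as I know, `top_n()` keeps every row that ties at the cutoff, so a Rot.Herb can end up with more than 10 species when values tie there (the posted data contains several exact zeros, for instance). If exactly `n` rows per group are wanted, here is a minimal base-R sketch that breaks ties by row order; `top_n_per_group()` is a hypothetical name and the toy data is made up.

```r
# Keep at most `n` rows per group, largest `value` first. Ties at the cutoff
# are broken by row order, so no group ever gets more than `n` rows.
top_n_per_group <- function(df, group, value, n = 10) {
  rk <- ave(-df[[value]], df[[group]],
            FUN = function(v) rank(v, ties.method = "first"))
  df[rk <= n, , drop = FALSE]
}

# Made-up example: two species tie at 0, only the first one is kept for n = 3
toy <- data.frame(
  Rot.Herb = rep("A4low", 4),
  species = c("DIGSA", "SETFA", "EPHHT", "SONAR"),
  avgrealmass = c(6.4, 18.7, 0, 0),
  stringsAsFactors = FALSE
)
res <- top_n_per_group(toy, "Rot.Herb", "avgrealmass", n = 3)
res
```

In dplyr 1.0.0 and later, `group_by(Rot.Herb) %>% slice_max(avgrealmass, n = 10, with_ties = FALSE)` should behave the same way; `top_n()` is superseded there.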

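In addition, you can clean up the visualization by sorting by abundance with `%>% arrange(-avgrealmass)` before plotting, so that the most abundant species end up at the base of each bar (this relies on ggplot2 stacking bars in row order, as it did at the time of writing).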
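Sorting rows may not be enough with newer ggplot2: as far as I know, from version 2.2.0 the stacking order follows the factor levels of the `fill` variable (first level drawn on top) rather than the row order. A base-R sketch under that assumption: the hypothetical helper `order_for_stacking()` turns `species` into a factor whose levels run from lightest to heaviest total mass, so the heaviest species should sit at the base. Because one factor is shared by all bars, this orders species by their total across all Rot.Herb levels, not separately within each bar. The toy data is made up.

```r
# Hypothetical helper: make `species` a factor with levels ordered from the
# lightest to the heaviest total mass. ASSUMPTION: ggplot2 >= 2.2.0, where the
# first level is stacked on top, so the last (heaviest) level ends up at the base.
order_for_stacking <- function(df) {
  totals <- tapply(df$avgrealmass, df$species, sum)
  df$species <- factor(df$species, levels = names(sort(totals)))
  df
}

toy <- data.frame(
  Rot.Herb = c("A4conv", "A4conv", "A4conv", "C2conv", "C2conv"),
  species = c("SETFA", "AMATA", "CHEAL", "SETFA", "AMATA"),
  avgrealmass = c(30, 20, 1, 40, 5),
  stringsAsFactors = FALSE
)
stacked_df <- order_for_stacking(toy)
levels(stacked_df$species)
# returns c("CHEAL", "AMATA", "SETFA")
```

The reordered frame can then go into the same `ggplot(..., aes(x = Rot.Herb, y = avgrealmass, fill = species)) + geom_bar(stat = "identity")` call as in the question.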

user295691

@user295691 Thank you. You understood it right. I used your code, but R does the opposite: it puts the top weights on top, not at the base. I tried switching between `avgrealmass` and `-avgrealmass`, but it didn't work. I'm using the latest version of R. Do you know why that happened? – Little Bee Jul 27 '15 at 16:32
So if I do `weedweights %>% group_by(Rot.Herb) %>% top_n(10, avgrealmass) %>% arrange(-avgrealmass) %>% ggplot() + geom_bar(aes(x=Rot.Herb, y=avgrealmass, fill=species), stat="identity")`, I get the largest at the base -- maybe you were putting `-avgrealmass` in the `top_n` instead of the `arrange`? – user295691 Jul 27 '15 at 17:51
Yes, I finally get the largest ones at the bottom – Little Bee Oct 15 '15 at 21:40

Answer 2 · answered Oct 15 '15 at 21:41

This is the final code that I found works well.

library(dplyr)
library(tidyr)

weedweights <- weeds %>%
  select(-ends_with("No")) %>%
  gather(key = species, value = speciesmass, DIGSAWt:UnknownmonocotWt) %>%
  mutate(realmass = speciesmass * samplearea.m.2.) %>%
  group_by(Rot.Herb, species) %>%
  summarize(avgrealmass = mean(realmass, na.rm = TRUE)) %>%
  filter(!is.nan(avgrealmass)) %>%   # drop combinations with no non-missing values
  arrange(-avgrealmass) %>%          # heaviest rows first
  ungroup()
