
I am new to R. Could someone explain how to add the absolute values inside the individual segments of a stacked bar chart, in a consistent way, using base R graphics? I tried to plot a stacked bar chart with base R, but the values appear in inconsistent, illogical positions: each village is supposed to total 100%, yet the values shown do not sum to 100%. Here is the data I am working with:

Village       100       200       300       400       500
Male     68.33333  53.33333        70        70  61.66667
Female   31.66667  46.66667        30        30  38.33333

In summary, there are five villages, and the data show the percentage of household heads interviewed, by sex.

I used the following commands to plot the graph:

barplot(mydata, col=c("yellow","green"))
x <- barplot(mydata, col=c("yellow","green"))
text(x, mydata, labels=mydata, pos=3, offset=.5)

Please help me place the correct values in each bar. Thanks.

  • Welcome to the site. As this is more about how to do something in R than the statistics behind it, it is probably better suited to Stack Overflow. In the meantime, how does the Village variable fit in: is it shown in your barplot at all? Secondly, are you interested in possible alternatives to a bar plot (which might keep it more of an on-topic question for Cross Validated)? – Peter Ellis Feb 18 '13 at 07:31
  • hmm, definitely a duplicate, can they be merged? – Peter Ellis Feb 18 '13 at 10:04

1 Answer


This started as a comment, but it seemed unfair not to turn it into an answer. To answer your question properly (even on Stack Overflow), we need to know how "mydata" is structured. I assumed at first that it was a data frame with 5 rows and 2 or 3 columns, but in that case your code makes no sense. However, if that is how it is structured, here is one way to do what I think you want:

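# One row per village, one column per sex (percentages)
mydata <- data.frame(
    row.names = c(100, 200, 300, 400, 500),
    Male = c(68.33333, 53.33333, 70, 70, 61.66667),
    Female = c(31.66667, 46.66667, 30, 30, 38.33333))

# barplot() stacks the rows of a matrix, so transpose to put sex in rows;
# it returns the x position of each bar, which text() needs
x <- barplot(t(as.matrix(mydata)), col=c("yellow", "green"),
    legend.text=TRUE, border=NA, xlim=c(0,8),
    args.legend=list(bty="n", border=NA),
    ylab="Cumulative percentage", xlab="Village number")
# Male label just below the top of the male segment
text(x, mydata$Male-10, labels=round(mydata$Male), col="black")
# Female label just above the bottom of the female segment
text(x, mydata$Male+10, labels=100-round(mydata$Male))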

which produces the following:

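[Image: stacked bar plot of cumulative percentage by village number, with the rounded male and female percentages labelled inside each bar]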
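As a variation on the label placement in the code above, the labels can be centred in each segment rather than offset by a fixed 10 units from the male/female boundary. The following is only a sketch: it assumes the data are held as a numeric matrix (here called `pct`, a made-up name) with one row per sex and one column per village, and `segment_midpoints()` is a hypothetical helper, not part of base R.

```r
# Percentages by sex (rows) and village (columns), as given in the question.
# Assumption: the data are a numeric matrix with one column per bar.
pct <- rbind(
  Male   = c(68.33333, 53.33333, 70, 70, 61.66667),
  Female = c(31.66667, 46.66667, 30, 30, 38.33333)
)
colnames(pct) <- c("100", "200", "300", "400", "500")

# Hypothetical helper: vertical midpoint of every stacked segment.
# The top of each segment is the running total down its column;
# stepping back by half the segment's height gives its centre.
segment_midpoints <- function(m) {
  tops <- apply(m, 2, cumsum)
  tops - m / 2
}

mids <- segment_midpoints(pct)

# barplot() returns the x coordinate of the centre of each bar
x <- barplot(pct, col = c("yellow", "green"), border = NA,
             legend.text = TRUE,
             args.legend = list(bty = "n", border = NA),
             xlim = c(0, 8),
             ylab = "Cumulative percentage", xlab = "Village number")

# Repeat each x once per segment so it lines up with the
# column-major order of the flattened matrices
text(rep(x, each = nrow(pct)), as.vector(mids),
     labels = as.vector(round(pct)))
```

Because the positions are derived from cumulative sums, the same helper should work for bars with more than two segments.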

An alternative would be to set the y value to 40 for all the male text labels, and 80 for all the females - this would have the advantage of less confusing jitter of the labels, and the disadvantage that the text vertical position is no longer notionally attached to data.
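A minimal sketch of that fixed-height alternative, assuming the same data frame layout as above (rebuilt here so the block runs on its own). Note that 40 and 80 only land inside the right segments because every male percentage in this data set lies between 40 and 80; other data would need other heights.

```r
# Same data as above: one row per village, one column per sex
mydata <- data.frame(
  row.names = c("100", "200", "300", "400", "500"),
  Male   = c(68.33333, 53.33333, 70, 70, 61.66667),
  Female = c(31.66667, 46.66667, 30, 30, 38.33333)
)

x <- barplot(t(as.matrix(mydata)), col = c("yellow", "green"),
             legend.text = TRUE, border = NA, xlim = c(0, 8),
             args.legend = list(bty = "n", border = NA),
             ylab = "Cumulative percentage", xlab = "Village number")

# Fixed label heights: the same y for every bar, so the labels
# form two level rows instead of following the data
male_y   <- rep(40, length(x))
female_y <- rep(80, length(x))

text(x, male_y,   labels = round(mydata$Male))
text(x, female_y, labels = round(mydata$Female))
```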

Personally, I don't much like this barplot at all, although there are many far worse crimes against data visualisation than a straightforward bar plot. Numbers on plots are cluttering and detract from the visual impact of the actual mapping of data to colours, shapes and sizes. I'd rather a simple dot plot like:

library(ggplot2)
ggplot(mydata, aes(x=row.names(mydata), y=Male)) +
  geom_point(size=4) +
  coord_flip() +
  labs(x="Village number\n", y="Percentage male") +
  ylim(0,100) +
  geom_hline(yintercept=50, linetype=2)

which gives

[Image: dot plot of percentage male (0 to 100) by village number, with a dashed reference line at 50]
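Since the question asked for base R, a rough equivalent of this dot plot can be sketched with `dotchart()` from the graphics package. This is only an approximation of the ggplot2 version, and the vector name `male_pct` is made up for the example.

```r
# Percentage of male household heads per village (names are village numbers)
male_pct <- c("100" = 68.33333, "200" = 53.33333, "300" = 70,
              "400" = 70, "500" = 61.66667)

# dotchart() uses the vector's names as row labels
dotchart(male_pct, xlim = c(0, 100), pch = 19,
         xlab = "Percentage male", ylab = "Village number")

# Dashed reference line at 50%
abline(v = 50, lty = 2)
```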

There is less redundant clutter in the plot, a higher data-to-ink ratio, etc. However, in the end you need to produce the plot that will mean something to your audience.

Peter Ellis