81

I have a data.frame like this:

df <- read.csv(text = "ONE,TWO,THREE
                       23,234,324
                       34,534,12
                       56,324,124
                       34,234,124
                       123,534,654")

I want to produce a percent bar plot that looks like this (made in LibreOffice Calc; image not included):

Thus, the bars should be standardized so that all stacks have the same height and sum to 100%. So far, all I have been able to get is a stacked barplot (not percent), using:

barplot(as.matrix(df))

Any help?

Henrik
Julio Diaz

5 Answers

133

Here's a solution using the ggplot2 package (version 3.x), as an alternative to the base barplot you have so far.

We use the position argument of geom_bar set to position = "fill". You may also use position = position_fill() if you want to use the arguments of position_fill() (vjust and reverse).

Note that your data is in a 'wide' format, whereas ggplot2 requires it to be in a 'long' format. Thus, we first need to gather the data.

library(ggplot2)
library(dplyr)
library(tidyr)

dat <- read.table(text = "    ONE TWO THREE
1   23  234 324
2   34  534 12
3   56  324 124
4   34  234 124
5   123 534 654",sep = "",header = TRUE)

# Add an id variable for the filled regions and reshape
datm <- dat %>% 
  mutate(ind = factor(row_number())) %>%  
  gather(variable, value, -ind)

ggplot(datm, aes(x = variable, y = value, fill = ind)) + 
    geom_bar(position = "fill",stat = "identity") +
    # or:
    # geom_bar(position = position_fill(), stat = "identity") 
    scale_y_continuous(labels = scales::percent_format())

example figure
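For readers who would rather not load dplyr and tidyr, the reshaping step can be sketched in base R. This is only a sketch under assumptions: the column names `ind`, `variable` and `value` mirror the answer's code, while the `share` column is an illustrative addition that shows the rescaling `position = "fill"` performs for you.

```r
# Same data as in the answer above
dat <- data.frame(ONE   = c(23, 34, 56, 34, 123),
                  TWO   = c(234, 534, 324, 234, 534),
                  THREE = c(324, 12, 124, 124, 654))

# Wide to long with base R only: one row per (original row, column) pair.
# unlist() walks the data frame column by column, so the labels line up.
datm <- data.frame(
  ind      = factor(rep(seq_len(nrow(dat)), times = ncol(dat))),
  variable = factor(rep(names(dat), each = nrow(dat)), levels = names(dat)),
  value    = unlist(dat, use.names = FALSE)
)

# Share of each value within its column (illustrative; not needed for the plot)
datm$share <- ave(datm$value, datm$variable, FUN = function(v) v / sum(v))
```

The resulting `datm` has the columns the `ggplot()` call above expects. Because `variable` is a factor with explicit levels, the bars should keep the original column order rather than being sorted alphabetically.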

MsGISRocker
joran
  • 1
    what package is melt() part of? Is it reshape2? – Julio Diaz Mar 05 '12 at 19:59
  • 3
    Yes; my apologies. For such a long time ggplot2 loaded those packages on its own, I've grown rusty. – joran Mar 05 '12 at 20:09
  • I tried it using melt from the reshape package and I got the following error: "Error in scale$labels(breaks) : unused argument(s) (breaks)" I wonder if it is because I am reading from a csv. – Julio Diaz Mar 05 '12 at 20:12
  • @JulioDiaz Hmmm. Hard to say what's going on, particularly if the data you're working with don't look exactly like the example in your question. I would make sure all packages are up to date, and that you're on R 2.14.2 (I had to upgrade to 2.14.2 to get some stuff in ggplot 0.9.0 to work). – joran Mar 05 '12 at 20:28
  • I updated the way my data actually comes, which should not be different in any way other than sep=",". I also checked and R.version() 2.14.2 and ggplot2_0.9.0 – Julio Diaz Mar 05 '12 at 20:46
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/8551/discussion-between-julio-diaz-and-joran) – Julio Diaz Mar 05 '12 at 20:51
  • I restarted R and it worked. I don't know what restarting did. – Julio Diaz Mar 05 '12 at 22:01
  • When I try to use this method, I get the following error: `Error: Labels can only be specified in conjunction with breaks` which has to do with the `labels = percent_format()` call. Any thoughts on how to fix this error? – KLee1 Jun 15 '12 at 02:48
  • @KLee1 Check your version of ggplot2. Note that my answer refers specifically to version 0.9.0 (or later). – joran Jun 15 '12 at 02:59
  • @joran Thanks. It turned out that I needed to set the formatter parameter in my version to see the same effect in case anyone reads this in the future. – KLee1 Jun 15 '12 at 17:55
  • @Henk Thanks for the correction. Your edit was rejected by people who didn't know R enough to know that you're right. Until you have the ability to edit Q's yourself, you should expect edit reviews to be fairly "cautious" in this regard. – joran May 12 '15 at 16:56
  • 7
    For those coming to this after 2018, replace "labels = percent_format()" with "scales::percent". – Leonhard Euler Feb 23 '18 at 15:38
  • @stuartstevenson it doesn't appear so simple, please elaborate and edit the answer – MichaelChirico Apr 05 '19 at 10:51
  • @stuartstevenson This should not be required. `scales` is imported automatically, you just have to prefix the function and use `scales::percent_format()`. – slhck Apr 16 '19 at 13:46
  • What if we have no variables, just one bar that we want to colour by proportions? – Rafs Aug 14 '20 at 12:15
  • @joran Hi. can you say how to do this data manipulation without dplyr? – Hamed_Gholami Nov 29 '21 at 13:56
20

Chris Beeley is right: you only need the proportions by column. With your data:

 your_matrix<-( 
               rbind(
                       c(23,234,324), 
                       c(34,534,12), 
                       c(56,324,124), 
                       c(34,234,124),
                       c(123,534,654)
                    )
                )

 barplot(prop.table(your_matrix, 2) )

This gives a stacked bar plot in which every bar has the same total height of 1 (image not included).
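If the axis should read in percent rather than proportions, one possible variation is to multiply by 100 and start from the data frame in the question. This is a sketch, not the answerer's code: the `pct` name, the "Row n" legend labels and the widened `xlim` are illustrative choices.

```r
df <- read.csv(text = "ONE,TWO,THREE
23,234,324
34,534,12
56,324,124
34,234,124
123,534,654")

# Column-wise percentages: margin = 2 rescales within each column
pct <- prop.table(as.matrix(df), margin = 2) * 100

barplot(pct,
        ylab = "Percent",
        xlim = c(0, ncol(pct) + 2),                      # leave room for the legend
        legend.text = paste("Row", seq_len(nrow(pct))))  # one entry per stacked segment
```

`as.matrix(df)` keeps the column names, so the bars are labelled ONE, TWO and THREE without extra arguments.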

alemol
14

prop.table is a nice friendly way of obtaining proportions of tables.

m <- matrix(1:4,2)

 m
     [,1] [,2]
[1,]    1    3
[2,]    2    4

Leaving margin blank gives you proportions of the whole table

 prop.table(m, margin=NULL)
     [,1] [,2]
[1,]  0.1  0.3
[2,]  0.2  0.4

Giving it 1 gives you row proportions

 prop.table(m, 1)
      [,1]      [,2]
[1,] 0.2500000 0.7500000
[2,] 0.3333333 0.6666667

And 2 is column proportions

 prop.table(m, 2)
          [,1]      [,2]
[1,] 0.3333333 0.4285714
[2,] 0.6666667 0.5714286
Chris Beeley
5

You just need to divide each element by the sum of the values in its column.

Doing this should suffice:

data.perc <- apply(data, 2, function(x){x/sum(x)})

Note that the second parameter tells apply to apply the provided function to columns (using 1 you would apply it to rows). The anonymous function, then, gets passed each data column, one at a time.
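A quick check of what this produces, using the data from the question (the `data` object here is an assumed stand-in for whatever you pass in): with margin 2 it is the column sums that equal 1, while the row sums generally do not. The result should also agree with `prop.table(..., 2)`.

```r
data <- data.frame(ONE   = c(23, 34, 56, 34, 123),
                   TWO   = c(234, 534, 324, 234, 534),
                   THREE = c(324, 12, 124, 124, 654))

# Divide every element by the sum of its column
data.perc <- apply(data, 2, function(x) x / sum(x))

colSums(data.perc)  # each column sums to 1
rowSums(data.perc)  # rows are not rescaled, so these are not 1 in general

# Compare with prop.table on the same data
all.equal(data.perc, prop.table(as.matrix(data), 2))

# apply() returns a matrix, so it can go straight into barplot()
barplot(data.perc)
```

If you want each row to sum to 1 instead, use margin 1 in `prop.table`. Note that `apply(data, 1, ...)` returns its result transposed, so it would need `t()` before plotting.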

nico
  • Hello, this didn't quite adjust my data: `rowSums(data.perc)` wasn't 1 for each line. Instead I used this: `data.perc <- apply(data, 2, function(x){x/(apply(data,1,sum))})` – 3nrique0 Aug 09 '18 at 11:54
  • did you have NAs or zero-summing lines? Otherwise I don't quite understand why that wouldn't work... – nico Aug 15 '18 at 10:37
1

Another option is the scalesextra package, whose scale_y_pct function can create a percentage scale directly from your data. First, reshape the data to a longer format with pivot_longer and compute a percentage column per group. Here is a reproducible example:

library(ggplot2)
library(dplyr)
library(tidyr)
# remotes::install_github("thomas-neitmann/scalesextra")
library(scalesextra)
df %>%
  pivot_longer(cols=everything()) %>%
  group_by(name) %>%
  mutate(index = factor(row_number()),
         pct = value/sum(value)*100) %>% # Create percentage values
  ggplot(aes(x = factor(name, levels = unique(name)), y = pct, fill = index)) +
  geom_col() + 
  scale_y_pct() +
  labs(x = "Name")

Created on 2022-08-23 with reprex v2.0.2

For some extra info about this package and function check this tutorial.

Quinten