I know this gets asked a lot, but I'm having trouble making a 100% stacked bar plot in R. There are plenty of pages explaining how, but none of the approaches has worked for me, and I suspect the data I'm importing isn't laid out correctly, so I'd like to know what I'm doing wrong in that respect. My data looks like the table in the attached picture. I can create the exact chart I want in Excel (the bar graph on the right of the same picture; I couldn't attach more than one image), but for various reasons I need it in R. Is the way the data is laid out in Excel incorrect, and if so, how do I fix it?

data being used on left, correct excel graph on right

Miha
GRSB.1
    Can you add some code that you tried and where things went wrong? Right now it seems like a duplicate to me, possibly of, e.g., [this question](https://stackoverflow.com/questions/6693257/making-a-stacked-bar-plot-for-multiple-variables-ggplot2-in-r). But there may be subtle differences that we'll be able to see once you have added some code. Read [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for ideas on how to make your question reproducible. – aosmith Aug 07 '18 at 18:01

1 Answer

In ggplot2 at least, you need to convert your data from "wide" to "long" format. Below, I use the tidyr::gather function to "gather" the two data columns ("running" and "jumping") into a single "fraction" column, which you can then color by "activity".

library(magrittr)                       # For pipe (%>%)

dat <- tibble::tibble(
  weeks = 1:15,
  running = runif(15, 0, 1),
  jumping = 1 - running
)

dat
#> # A tibble: 15 x 3
#>    weeks running jumping
#>    <int>   <dbl>   <dbl>
#>  1     1  0.675   0.325 
#>  2     2  0.727   0.273 
#>  3     3  0.430   0.570 
#>  4     4  0.324   0.676 
#>  5     5  0.809   0.191 
#>  6     6  0.260   0.740 
#>  7     7  0.433   0.567 
#>  8     8  0.872   0.128 
#>  9     9  0.0288  0.971 
#> 10    10  0.903   0.0970
#> 11    11  0.295   0.705 
#> 12    12  0.538   0.462 
#> 13    13  0.342   0.658 
#> 14    14  0.291   0.709 
#> 15    15  0.877   0.123

library(ggplot2)

dat_long <- dat %>%
  tidyr::gather(activity, fraction, running, jumping)

dat_long
#> # A tibble: 30 x 3
#>    weeks activity fraction
#>    <int> <chr>       <dbl>
#>  1     1 running    0.675 
#>  2     2 running    0.727 
#>  3     3 running    0.430 
#>  4     4 running    0.324 
#>  5     5 running    0.809 
#>  6     6 running    0.260 
#>  7     7 running    0.433 
#>  8     8 running    0.872 
#>  9     9 running    0.0288
#> 10    10 running    0.903 
#> # ... with 20 more rows

ggplot(dat_long) +
  aes(x = factor(weeks), y = fraction, fill = activity) +
  geom_col()
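
The bars above reach 100% only because running and jumping already sum to 1 in every week. If the imported columns hold raw counts instead, one option is to let ggplot2 normalise each bar with position = "fill". The sketch below assumes made-up counts (the counts_long data frame and its values are hypothetical, not taken from the question):

```r
library(ggplot2)

# Hypothetical raw counts (not from the question), already in long format
counts_long <- data.frame(
  weeks    = rep(1:3, times = 2),
  activity = rep(c("running", "jumping"), each = 3),
  count    = c(12, 30, 7, 4, 10, 21)
)

p <- ggplot(counts_long) +
  aes(x = factor(weeks), y = count, fill = activity) +
  geom_col(position = "fill") +                   # rescale every bar to a total of 1
  scale_y_continuous(labels = scales::percent) +  # show the axis as 0% to 100%
  labs(x = "weeks", y = "share")

p
```

With position = "fill" the y axis shows shares rather than counts, so the original totals are no longer visible in the plot.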

You can also do this in base R by converting the data to a matrix and transposing it, so that each column is one bar and each row is one stacked segment. (Note that I also use [, -1] to drop the first column, weeks.)

dat_tmat <- t(as.matrix(dat[, -1]))
dat_tmat
#>              [,1]      [,2]      [,3]      [,4]       [,5]      [,6]
#> running 0.5227949 0.5352537 0.5879579 0.2678927 0.93068128 0.2948861
#> jumping 0.4772051 0.4647463 0.4120421 0.7321073 0.06931872 0.7051139
#>               [,7]      [,8]      [,9]       [,10]      [,11]     [,12]
#> running 0.07729363 0.8925416 0.5503279 0.007479232 0.02991765 0.5832765
#> jumping 0.92270637 0.1074584 0.4496721 0.992520768 0.97008235 0.4167235
#>             [,13]     [,14]     [,15]
#> running 0.8660134 0.1156794 0.3176998
#> jumping 0.1339866 0.8843206 0.6823002

barplot(dat_tmat, col = c("blue", "red"))
legend("topleft", c("running", "jumping"), col = c("blue", "red"), lwd = 5, bg = "white")
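
As with the ggplot2 version, this gives 100% bars only because each column of the matrix already sums to 1. For raw counts, a minimal sketch is to rescale the columns with prop.table first. The counts matrix below is hypothetical and stands in for the imported data:

```r
# Hypothetical raw counts: one row per activity, one column per week
counts <- rbind(running = c(12, 30, 7), jumping = c(4, 10, 21))
colnames(counts) <- 1:3

# Divide each column by its total so that every bar sums to 1
shares <- prop.table(counts, margin = 2)

barplot(shares, col = c("blue", "red"), legend.text = rownames(shares),
        xlab = "weeks", ylab = "share")
```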

Alexey Shiklomanov
  • Thank you!! Is there a way to do this by importing the data, instead of typing it all up? (for some reason I can't tag you @Alexey) – GRSB.1 Aug 07 '18 at 18:16
  • Of course. R can import a wide variety of data types. I would read through the "Data Import" chapter of "R for Data Science" by Garrett Grolemund and Hadley Wickham (http://r4ds.had.co.nz/data-import.html). There are R packages for reading directly from Excel, but it's probably easier to export to CSV. Also, if this answer works for you, please accept it (click the grey check mark) and upvote it (click the up arrow). – Alexey Shiklomanov Aug 07 '18 at 18:20
  • What I mean is, how do I convert the imported data from wide to long, and then do the same thing? – GRSB.1 Aug 07 '18 at 18:26
  • As I said in my answer, `tidyr::gather` will convert data from wide to long. The code I have above already does this, and you can find more information and examples in the documentation (`?tidyr::gather` at the R prompt). To convert from long to wide, use `tidyr::spread`. – Alexey Shiklomanov Aug 07 '18 at 18:30
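
To tie the comments together, here is a rough sketch of importing a CSV exported from Excel and reshaping it. It assumes the file has the columns weeks, running and jumping. The inline csv_text and the file name "my_data.csv" are hypothetical stand-ins for the real export:

```r
# Stand-in for a CSV exported from Excel; with a real file you would use
# read.csv("my_data.csv"), where "my_data.csv" is a hypothetical file name.
csv_text <- "weeks,running,jumping
1,0.60,0.40
2,0.25,0.75
3,0.80,0.20"

dat <- read.csv(text = csv_text)

# Same reshaping step as in the answer: wide -> long
dat_long <- tidyr::gather(dat, activity, fraction, running, jumping)

# Equivalent call in tidyr >= 1.0.0, where gather() is superseded
dat_long2 <- tidyr::pivot_longer(
  dat,
  cols = c(running, jumping),
  names_to = "activity",
  values_to = "fraction"
)
```

Either long data frame can then be passed to the ggplot() call from the answer unchanged.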