
I have a histogram with some text labels, and I'm trying to center each label within the segment for its corresponding type.

df = read.table(text = "
id   year  type amount
                1  1991  HIIT     22
                2  1991 inter    144
                3  1991  VIIT     98
                4  1992  HIIT     20
                5  1992 inter    136
                6  1992  VIIT    108
                7  1993  HIIT     20
                8  1993 inter    120
                9  1993  VIIT    124
                10 1994  HIIT     26
                11 1994 inter    118
                12 1994  VIIT    120
                13 1995  HIIT     23
                14 1995 inter    101
                15 1995  VIIT    140
                16 1996  HIIT     27
                17 1996 inter    103
                18 1996  VIIT    162
                19 1997  HIIT     24
                20 1997 inter     96
                21 1997  VIIT    172
                22 1998  HIIT     24
                23 1998 inter     92
                24 1998  VIIT    177
                25 1999  HIIT     28
                26 1999 inter     45
                27 1999  VIIT    220
                28 2000  HIIT     26
                29 2000 inter     36
                30 2000  VIIT    231", header = TRUE, sep = "")
library(dplyr);
library(ggplot2);
library(scales);

df %>%
  mutate(type = factor(type, levels = c("inter",  "VIIT", "HIIT"))) %>%
  group_by(year) %>%
  mutate(ratio = amount/sum(amount),
         pos=cumsum(ratio)-ratio/2) %>%
  ggplot(aes(x=factor(year), y=ratio, fill=type)) +
  geom_bar(stat="identity") +
  geom_text(aes(y = pos, label = percent(pos)), size = 4) +
  scale_y_continuous(name="", labels = percent) +
  coord_flip()

My plot looks like this: [image]

Can you help me solve this problem? I have no idea how to fix it with the position parameter.

Thanks

Peter Ellis
Mostafa90
  • this may be a duplicate of this answer: http://stackoverflow.com/questions/6644997/showing-data-values-on-stacked-bar-chart-in-ggplot2 – Sathish Jan 14 '17 at 03:26
    `geom_text(aes(label = percent(pos)), position = position_stack(vjust = 0.5), size = 4)` – Sathish Jan 14 '17 at 03:27
    Possible duplicate of [Showing data values on stacked bar chart in ggplot2](http://stackoverflow.com/questions/6644997/showing-data-values-on-stacked-bar-chart-in-ggplot2) – Sathish Jan 14 '17 at 03:43

2 Answers


I'm not sure exactly what you're trying to do, but I'll assume that you want the text in each colored segment of the bar to

  1. be in the center of that segment; and
  2. show a meaningful number, namely the size of that segment

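To fix the first issue, you need to have the data sorted by type before you calculate the pos value via cumsum. cumsum accumulates in row order, so the rows within each year have to be in the same order in which the segments are stacked.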

To fix the second, you should map ratio to the label aesthetic, not pos. pos is not a meaningful number; it is only the horizontal coordinate at which the label is placed.

df %>%
  mutate(type = factor(type, levels = c("inter",  "VIIT", "HIIT"))) %>%
  group_by(year) %>%
  arrange(desc(type)) %>%
  mutate(ratio = amount / sum(amount),
         pos = cumsum(ratio) - ratio / 2) %>%
  ggplot(aes(x = factor(year), y = ratio, fill = type)) +
  geom_bar(stat = "identity") +
  geom_text(aes(y = pos, label = percent(ratio)), size = 4) +
  scale_y_continuous(name="", labels = percent) +
  coord_flip()
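
To see why the sort order matters, here is a minimal base R sketch for the 1991 rows alone. It assumes the segments are stacked in the order HIIT, VIIT, inter (the order that arrange(desc(type)) produces above); the helper name midpoints is made up for this illustration.

```r
# Amounts for 1991, in the row order of the original data frame
amount <- c(HIIT = 22, inter = 144, VIIT = 98)
ratio  <- amount / sum(amount)

# Hypothetical helper: midpoint of each segment when stacked in the given order
midpoints <- function(r) cumsum(r) - r / 2

# Unsorted: cumsum follows the data order HIIT, inter, VIIT
pos_unsorted <- midpoints(ratio)

# Sorted to match the assumed stacking order HIIT, VIIT, inter
stack_order <- c("HIIT", "VIIT", "inter")
pos_sorted  <- midpoints(ratio[stack_order])

round(pos_unsorted, 3)  # HIIT 0.042, inter 0.356, VIIT 0.814
round(pos_sorted, 3)    # HIIT 0.042, VIIT 0.269, inter 0.727
```

Under that assumption the inter segment spans roughly 0.455 to 1, so the unsorted position of 0.356 puts its label inside the VIIT segment, while the sorted position of 0.727 is its true midpoint.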


By the way, this isn't a histogram, it's a stacked filled bar chart. A histogram is something quite different.

Edit - alternative, easier method

As pointed out in the comments, position_stack with its vjust argument, a fairly recent addition to ggplot2, will calculate the position for you, so you don't need to create a pos variable in your data pipeline. The code below is perhaps a neater way of doing the whole thing (it gives an identical result):

df %>%
  group_by(year) %>%
  mutate(type = factor(type, levels = c("inter",  "VIIT", "HIIT"))) %>%
  mutate(ratio = amount / sum(amount)) %>%
  ggplot(aes(x = factor(year), y = ratio, fill = type)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = percent(ratio)), position = position_stack(vjust = 0.5), size = 4) +
  scale_y_continuous(name="", labels = percent) +
  coord_flip()
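
As a rough sketch of what vjust = 0.5 means, the label is placed a fraction vjust of the way from the lower edge of its segment to the upper edge. The base R snippet below reproduces that arithmetic for 1991. It illustrates the idea and is not ggplot2's actual code; label_y is a hypothetical helper, and the stacking order HIIT, VIIT, inter is assumed.

```r
# Segment boundaries for 1991, assuming stacking order HIIT, VIIT, inter
ratio <- c(HIIT = 22, VIIT = 98, inter = 144) / 264
ymax  <- cumsum(ratio)   # upper edge of each segment
ymin  <- ymax - ratio    # lower edge of each segment

# Hypothetical helper: label coordinate for a given vjust
label_y <- function(vjust) (1 - vjust) * ymin + vjust * ymax

label_y(0.5)  # middle of each segment, equal to cumsum(ratio) - ratio / 2
label_y(1)    # upper edge of each segment
```

With vjust = 0.5 this gives the same numbers as the manual pos column, which is why the two versions of the plot look identical.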
Peter Ellis
  • percent values in the text don't match the plot in the question – Sathish Jan 14 '17 at 03:31
  • I don't understand? I was presuming the OP wants the values to be the size of the bars, not the horizontal coordinate of where the text is placed (which makes no sense). – Peter Ellis Jan 14 '17 at 03:33
  • Hence my problem #2 - I think the plot in the OP has the wrong numbers in the text, as well as the wrong positioning. – Peter Ellis Jan 14 '17 at 03:33
  • OP talks about "I have no idea how to fix it with the position parameter " – Sathish Jan 14 '17 at 03:34
  • Thanks OP for the confirmation. I am upvoting Peter's answer – Sathish Jan 14 '17 at 03:37
    @PeterEllis I am getting the same percentage as your answer, if I dont change anything except introduce `position = position_stack(vjust = 0.5)` inside `geom_text()` – Sathish Jan 14 '17 at 03:41
  • Just one more question: I tried `ggplotly` on this plot, but it seems to have a bug. Can you confirm? Thanks – Mostafa90 Jan 14 '17 at 03:44
  • I am using `library('ggplot2')` and the version is `packageVersion('ggplot2')` `# [1] ‘2.2.1’` – Sathish Jan 14 '17 at 03:46
    @Sathish you are right, `position_stack` is an easier way of doing it; I've added it as an alternative to my answer. – Peter Ellis Jan 14 '17 at 03:49
  • @DimitriPetrenko that (re plotly) sounds like a new question, i can't possibly judge if it's a bug or not without any description of the problem! I suggest you start a new question for that. – Peter Ellis Jan 14 '17 at 03:50
  • Thanks very much guys for help very instructive – Mostafa90 Jan 14 '17 at 03:51

Try the code below:

library(ggplot2)
library(dplyr)
library(formattable)

df <- read.table(text = "
                 id year type  amount
                 1  1991 HIIT  22
                 2  1991 inter 144
                 3  1991 VIIT  98
                 4  1992 HIIT  20
                 5  1992 inter 136
                 6  1992 VIIT  108
                 7  1993 HIIT  20
                 8  1993 inter 120
                 9  1993 VIIT  124
                 10 1994 HIIT  26
                 11 1994 inter 118
                 12 1994 VIIT  120
                 13 1995 HIIT  23
                 14 1995 inter 101
                 15 1995 VIIT  140
                 16 1996 HIIT  27
                 17 1996 inter 103
                 18 1996 VIIT  162
                 19 1997 HIIT  24
                 20 1997 inter 96
                 21 1997 VIIT  172
                 22 1998 HIIT  24
                 23 1998 inter 92
                 24 1998 VIIT  177
                 25 1999 HIIT  28
                 26 1999 inter 45
                 27 1999 VIIT  220
                 28 2000 HIIT  26
                 29 2000 inter 36
                 30 2000 VIIT  231", header = TRUE, sep = "")


pd <- df %>%
  mutate(type = factor(type, levels = c("inter", "VIIT", "HIIT"))) %>%
  group_by(year) %>%
  arrange(year, desc(type)) %>% 
  mutate(ratio = amount/sum(amount),
         pos = cumsum(ratio) - ratio/2)

pd %>% 
  ggplot(aes(x = factor(year), y = ratio, fill = type)) +
  geom_bar(stat = "identity") +
  geom_text(aes(y = pos, label = percent(ratio)), size = 4) +
  scale_y_continuous(name = "", labels = percent) +
  coord_flip()
Lovetoken