
I am trying to plot a waterfall chart using ggplot2. When I add the data labels, they are not placed in the right position.

Below is the code I am using:

dataset <- data.frame(TotalHeadcount = c(-417, -12, 276, -276, 787, 14), Category =  LETTERS[1:6])
dataset$SortedCategory <- factor(dataset$`Category`, levels = dataset$`Category`)
dataset$id <- seq_along(dataset$TotalHeadcount)
dataset$type <- ifelse(dataset$TotalHeadcount > 0, "in",   "out")
dataset[dataset$SortedCategory %in% c("A", "F"), "type"] <- "net"
dataset$type <- factor(dataset$type, levels = c("out", "in", "net"))
dataset$end <- cumsum(dataset$`TotalHeadcount`)
dataset$end <- c(head(dataset$end, -1), 0)
dataset$start <- c(0, head(dataset$end, -1))
dataset$value <-dataset$`TotalHeadcount`
library(ggplot2)
strwr <- function(str) gsub(" ", "\n", str)
ggplot(dataset, aes(fill = type)) +
  geom_rect(aes(x = SortedCategory, xmin = id - 0.45, xmax = id + 0.45,
                ymin = end, ymax = start)) +
  scale_x_discrete("", breaks = levels(dataset$SortedCategory),
                   labels = strwr(levels(dataset$SortedCategory))) +
  theme_bw() +
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        axis.line = element_line(colour = "gray")) +
  guides(fill = FALSE)

And below is the output. I want each data label to sit right at the beginning or at the end of its bar. I am not an expert in R and am just trying to learn, so any help would be really appreciated.

I was following this blog post:

https://learnr.wordpress.com/2010/05/10/ggplot2-waterfall-charts/

but when I write the same geom_text code, it gives me an error. It could be a syntax issue.

Akash
  • Do you have a reproducible example we can use? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – MBorg Mar 30 '18 at 09:22
  • Thank you for the quick response. I am using Power BI as the source of my dataset, and it holds my real-time project data. The link I shared has a good example, but I am stuck at the last step, the data labels: I am not able to use geom_text with the proper syntax. – Akash Mar 30 '18 at 09:26
  • Can you please make a reproducible example then? The point of it is to have a dataset that we can easily use to test your code and more easily find your error. Looking at your link, I still do not know how to make a dataset to test your code with. – MBorg Mar 30 '18 at 09:32
  • Just added it. I was not able to upload the Excel file, so I pasted it as an image. I also added the full code. – Akash Mar 30 '18 at 09:42
  • Can you download R on your computer and put the code into R? The premise behind reproducible example is that someone can copy and paste your data and code from your post into R and run it immediately. Even making a matrix to suit your Excel file, I cannot reproduce your data and code without getting error messages. – MBorg Mar 30 '18 at 09:48
  • Just add `dput(dataset)` to the question (after all those lines of data manipulation), please. – Axeman Mar 30 '18 at 10:00
  • I have updated the question with proper formatting and also added a dataset that can be run directly in R. I have removed the data label syntax, as it was not working. I need your help adding data labels to this code. Basically, I need a conditional data label for each type. – Akash Mar 30 '18 at 13:37
  • Never mind, I was able to figure out the issue with my code. Thank you, everyone. – Akash Mar 31 '18 at 06:03

1 Answer


Here is an approach with ggplot.

First the data:

df1 <- data.frame(z = c(-417, -12, 276, -276, 787, 14),
                  b =  LETTERS[1:6])

library(tidyverse)

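Calculate the cumulative sum and its lag; these give the ymax and ymin of each geom_rect:

df1 %>%
  mutate(val = cumsum(z),
         lag = c(0, lag(val)[-1]),
         # factor() is needed because data.frame() keeps `b` as character since R 4.0
         b1 = as.numeric(factor(b))) -> df1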

ggplot(df1)+
  geom_rect(aes(xmin = b1 - 0.45,
                xmax = b1 + 0.45, ymin = lag, ymax = val)) +
  geom_text(aes(x = b1, y = val, label = val), #or `label = z`
            vjust = ifelse(df1$val < df1$lag, -0.2, 1)) + #geom_text vjust depends on the direction of the value
  scale_x_continuous(breaks = 1:6, labels = df1$b)
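
To make the rectangle coordinates concrete, here is a small base-R sketch of what the cumsum/lag step works out to for this data. It uses head() instead of dplyr's lag() purely so that it runs without any packages; the names `direction` and `vjust` are illustrative and not part of the code above.

```r
z <- c(-417, -12, 276, -276, 787, 14)

val <- cumsum(z)             # where each bar ends
lag <- c(0, head(val, -1))   # where each bar starts (previous end, 0 for the first bar)

# A bar points "down" when it ends below where it starts
direction <- ifelse(val < lag, "down", "up")

# Same rule as in the geom_text() call above
vjust <- ifelse(val < lag, -0.2, 1)

data.frame(z, lag, val, direction, vjust)
```

Each bar starts where the previous one ended, so the last `val` equals `sum(z)`. With `y = val`, a `vjust` of 1 hangs the text below the top of a rising bar, and -0.2 lifts it above the bottom of a falling bar, so in both cases the label sits just inside the end of the bar.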


An easier way is the waterfalls package, although I think the label position cannot be changed at the moment (it is planned). From its documentation:

rect_text_labels_anchor (character) How should rect_text_labels be positioned? In future releases, we might have support for north or south anchors, or for directed positioning (negative down, positive up) etc. For now, only centre is supported.

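library(waterfalls)

df1 <- data.frame(z = c(-417, -12, 276, -276, 787, 14),
                  b =  LETTERS[1:6])

# The call is not shown in the original snippet; this is the package's basic usage,
# with the values in the first column and the labels in the second
waterfall(df1)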


You could also color it the same way in ggplot:

df1 %>%
  mutate(val = cumsum(z),
         lag = c(0, lag(val)[-1]),
         b1 = as.numeric(factor(b)),  # factor() needed: `b` is character since R 4.0
         color = ifelse(val < lag, "down", "up")) -> df1

ggplot(df1)+
  geom_rect(aes(xmin = b1 - 0.45,
                xmax = b1 + 0.45, ymin = lag, ymax = val, fill = color)) +
  geom_text(aes(x = b1, y = val, label = z),
            vjust = ifelse(df1$val < df1$lag, -0.2, 1)) +
  scale_x_continuous(breaks = 1:6, labels = df1$b)


EDIT: answers to the questions in comments.

Filled waterfall:

df1 <- data.frame(z = c(-417, -12, 276, -276, 787, 14),
                  b =  LETTERS[1:6],
                  group = rep(c("AB", "CD", "EF"), each = 2))

df1 %>%
  mutate(val = cumsum(z),
         lag = c(0, lag(val)[-1]),
         # factor() needed: character columns are no longer factors since R 4.0
         b1 = as.numeric(factor(b)),
         g1 = as.numeric(factor(group))) -> df1

ggplot(df1)+
  geom_rect(aes(xmin = g1 - 0.45,
                xmax = g1 + 0.45, ymin = lag, ymax = val, fill = b)) +
  geom_text(aes(x = g1, y = val, label = z),
              vjust = ifelse(df1$val < df1$lag, -0.2, 1)) +
  scale_x_continuous(breaks = 1:3, labels = unique(df1$group))


To answer what went wrong with your geom_text code I would need to see it. Other than that, your code works, but it over-complicates things. I advise you to learn some tidyverse functions; data manipulation will be much cleaner that way.

One more note, adding back-ticks:

dataset$`TotalHeadcount`

is not necessary when your column names do not contain special characters:

dataset$TotalHeadcount
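
As a quick illustration of when back-ticks do matter, here is a minimal sketch with a made-up data frame `d` (hypothetical, not the question's data) in which one column name contains a space:

```r
# Hypothetical data frame, for illustration only
d <- data.frame(TotalHeadcount = 1:3)
d[["Total Headcount"]] <- 4:6   # a column name containing a space

# For a syntactic name, both forms return the same column
identical(d$TotalHeadcount, d$`TotalHeadcount`)

# For a name with a space, the back-ticks are required
d$`Total Headcount`
```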

EDIT2: to change the order on the x axis, first change the levels of the grouping factor, then do the calculation and plotting:

df1 <- data.frame(z = c(-417, -12, 276, -276, 787, 14),
                  b =  LETTERS[1:6],
                  group = rep(c("AB", "CD", "EF"), each = 2))

df1 %>%
  mutate(group = factor(group, levels = c("AB", "EF", "CD"))) %>%
  arrange(group) %>%
  mutate(val = cumsum(z),
         lag = c(0, lag(val)[-1]),
         b1 = as.numeric(factor(b)),  # factor() needed: `b` is character since R 4.0
         g1 = as.numeric(group)) -> df1

ggplot(df1)+
  geom_rect(aes(xmin = g1 - 0.45,
                xmax = g1 + 0.45, ymin = lag, ymax = val, fill = b)) +
  geom_text(aes(x = g1, y = val, label = z),
            vjust = ifelse(df1$val < df1$lag, -0.2, 1)) +
  scale_x_continuous(breaks = 1:3, labels = unique(df1$group))
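
The reordering works because as.numeric() on a factor returns the position of each value in levels(), and sorting a factor follows that position rather than the alphabet. A small base-R sketch of the mechanism (the names `reordered` and `codes` are illustrative):

```r
group <- rep(c("AB", "CD", "EF"), each = 2)

# Default levels are sorted alphabetically: AB = 1, CD = 2, EF = 3
as.numeric(factor(group))

# Custom level order: AB = 1, EF = 2, CD = 3
reordered <- factor(group, levels = c("AB", "EF", "CD"))
codes <- as.numeric(reordered)
codes

# Sorting by the codes gives the custom order AB, EF, CD
levels(reordered)[sort(codes)]
```

Those codes are what `g1` holds, which is why the x positions and the axis labels change together once the levels are changed.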


missuse
  • Thank you for the reply. I have edited the question with proper formatting and a dataset as well. – Akash Mar 30 '18 at 13:35
  • I have tried your approach with the sample dataset and it worked fine. But I would also like to know what is wrong with my code; I think it is something to do with the syntax. Could you help me with that? – Akash Mar 30 '18 at 18:32
  • By any chance, do you know how to create a stacked waterfall chart using the same concept? Suppose A and B are grouped as AB, C and D as CD, and E and F as EF. The x axis should then show AB, CD, EF, and the AB bar should have -417 and -12 stacked. – Akash Mar 30 '18 at 18:47
  • Added code for stacked waterfall. To explain what went wrong with your `geom_text` code you will need to post it. – missuse Mar 31 '18 at 07:12
  • I really appreciate your help; I learned a lot. What if I want to show AB, EF and then CD on the x axis? I mean custom ordering of the x-axis groups. – Akash Mar 31 '18 at 08:42
  • @Akash no problem, added example – missuse Mar 31 '18 at 10:35
  • Thank you so much. – Akash Mar 31 '18 at 13:42