3

I want to transform my dataframe into a format that would be suitable for a waterfall chart.

My dataframe is as follows:

employee <- c('A','B','C','D','E','F', 
              'A','B','C','D','E','F',
              'A','B','C','D','E','F',
              'A','B','C','D','E','F')
revenue <- c(10, 20, 30, 40, 10, 40, 
              8, 10, 20, 50, 20, 10,
              2,  5, 70, 30, 10, 50,
             40,  8, 30, 40, 10, 40)
date <- as.Date(c('2017-03-01','2017-03-01','2017-03-01',
                  '2017-03-01','2017-03-01','2017-03-01',
                  '2017-03-02','2017-03-02','2017-03-02',
                  '2017-03-02','2017-03-02','2017-03-02',
                  '2017-03-03','2017-03-03','2017-03-03',
                  '2017-03-03','2017-03-03','2017-03-03',
                  '2017-03-04','2017-03-04','2017-03-04',
                  '2017-03-04','2017-03-04','2017-03-04'))
df <- data.frame(date,employee,revenue)

         date employee revenue
1  2017-03-01        A      10
2  2017-03-01        B      20
3  2017-03-01        C      30
4  2017-03-01        D      40
5  2017-03-01        E      10
6  2017-03-01        F      40
7  2017-03-02        A       8
8  2017-03-02        B      10
9  2017-03-02        C      20
10 2017-03-02        D      50
11 2017-03-02        E      20
12 2017-03-02        F      10
13 2017-03-03        A       2
14 2017-03-03        B       5
15 2017-03-03        C      70
16 2017-03-03        D      30
17 2017-03-03        E      10
18 2017-03-03        F      50
19 2017-03-04        A      40
20 2017-03-04        B       8
21 2017-03-04        C      30
22 2017-03-04        D      40
23 2017-03-04        E      10
24 2017-03-04        F      40

How do I transform this dataframe so that I can get it into a form for a waterfall chart in ggplot2?

The amount column is the difference from the total day by employee.

The end column is the start column minus the amount column.

The start column is the Total end values from previous day.

Final dataframe should look like this:

         date employee     start    end    amount    total_for_day
1  2017-03-01        A         0     10        10               10
2  2017-03-01        B         0     20        20               20
3  2017-03-01        C         0     30        30               30
4  2017-03-01        D         0     40        40               40
5  2017-03-01        E         0     10        10               10
6  2017-03-01        F         0     40        40               40
7  2017-03-01    Total         0    150       150              150
8  2017-03-02        A       150    148        -2                8
9  2017-03-02        B       150    140       -10               10
10 2017-03-02        C       150    140       -10               20
11 2017-03-02        D       150    160        10               50 
12 2017-03-02        E       150    160        10               20
13 2017-03-02        F       150    120       -30               10  
14 2017-03-02    Total       150    118       -32               98
15 2017-03-03        A       118    112        -6                2                      
16 2017-03-03        B       118    113        -5                5                  
17 2017-03-03        C       118    168        50               70
18 2017-03-03        D       118     98       -20               30  
19 2017-03-03        E       118    108       -10               10  
20 2017-03-03        F       118    158        40               50
21 2017-03-03    Total       118    167        49              170  
22 2017-03-04        A       167    205        38               40
23 2017-03-04        B       167    170         3                8
24 2017-03-04        C       167    127       -40               30
25 2017-03-04        D       167    177        10               40
26 2017-03-04        E       167    167         0               10
27 2017-03-04        F       167    157       -10               40 
28 2017-03-04    Total       167    168         1              168
Adam Smith
  • 2,584
  • 2
  • 20
  • 34
nak5120
  • 4,089
  • 4
  • 35
  • 94
  • Can you explain a bit more what a waterfall chart is, and which code in ggplot one have to use? – Roman Mar 27 '17 at 16:25
  • Sure, I'm basing it off of this tutorial: https://learnr.wordpress.com/2010/05/10/ggplot2-waterfall-charts/ – nak5120 Mar 27 '17 at 16:26
  • So I want to utilize this concept to have each day be the x-axis, the y-axis will be the amount, and I want to include each partner for each day with the total. – nak5120 Mar 27 '17 at 16:26
  • Please add what you have tried so far regarding the plotting. The question is how your expected output will be helpfull for fitting into ggplot? The total and the amount column can be calculated with this: `Total <- aggregate(revenue ~ date, df, sum); End <- lapply(split(df, df$employee), function(x) diff(x$revenue)); c(df$revenue[ df$date == df$date[1]], c(do.call(rbind, End)))` – Roman Mar 27 '17 at 16:33
  • I would have to add an id column but it would be something along these lines: `ggplot(newdf, aes(date, fill = employee)) + geom_rect(aes(x = date, + xmin = id - 0.45, xmax = id + 0.45, ymin = end, + ymax = start))` – nak5120 Mar 27 '17 at 16:47

1 Answers1

6

There are a few steps to get you to this, and I think that the dplyr package will help (used heavily below).

My understanding is that revenue gives the cumulative total revenue, rather than the daily change. If that is wrong, you would need to reverse some of these calculations.

The first step is to create a new data.frame that calculates the daily totals, then bind that back to the data.frame. Then, you can group_by the employees (including "Total") and add columns that will be created separately for each employee (value on the previous day, the change, and then whether it was an increase or a decrease).

toPlot <-
  bind_rows(
    df
    , df %>%
      group_by(date) %>%
      summarise(revenue = sum(revenue)) %>%
      mutate(employee = "Total") 
  ) %>%
  group_by(employee) %>%
  mutate(
    previousDay = lag(revenue, default = 0) 
    , change = revenue - previousDay
    , direction = ifelse(change > 0
                         , "Positive"
                         , "Negative"))

returns:

         date employee revenue previousDay change direction
       <date>    <chr>   <dbl>       <dbl>  <dbl>     <chr>
1  2017-03-01        A      10           0     10  Positive
2  2017-03-01        B      20           0     20  Positive
3  2017-03-01        C      30           0     30  Positive
4  2017-03-01        D      40           0     40  Positive
5  2017-03-01        E      10           0     10  Positive
6  2017-03-01        F      40           0     40  Positive
7  2017-03-02        A       8          10     -2  Negative
8  2017-03-02        B      10          20    -10  Negative
9  2017-03-02        C      20          30    -10  Negative
10 2017-03-02        D      50          40     10  Positive
# ... with 18 more rows

Then, we can plot that using:

toPlot %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee
             , scale = "free_y") +
  scale_fill_brewer(palette = "Set1")

to give

enter image description here

Note that including "Total" throws off the scale (requiring the free scales), so I would prefer to omit it:

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee) +
  scale_fill_brewer(palette = "Set1")

For this to allow direct comparsion between employees

enter image description here

and this for the overall total

toPlot %>%
  filter(employee == "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  scale_fill_brewer(palette = "Set1")

enter image description here

though I still find line graphs to be easier to interpret (especially comparing employees):

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = revenue
             , col = employee)) +
  geom_line() +
  scale_fill_brewer(palette = "Dark2")

enter image description here

If you want to plot the changes themselves by day, you can do:

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = change
             , fill = employee)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Dark2")

to get:

enter image description here

but now you are getting rather far from the "waterfall" plot outputs. If you really, really want to make a waterfall comparable across plots you can, but it is going to be rather ugly (I'd strongly recommend the line plot above instead).

Here, you need to manually move the boxes around, and this will require some tinkering if you change the output aspect ratio (or size) or the number of employees. You also need to include colors for both the employee and the direction of the change, which starts to look rough. This falls into the category of "can, but probably shouldn't" -- there is likely a better way to display these data.

toPlot %>%
  filter(employee != "Total") %>%
  ungroup() %>%
  mutate(empNumber = as.numeric(as.factor(employee))) %>%
  ggplot(aes(xmin = (empNumber) - 0.4
             , xmax = (empNumber) + 0.4
             , ymin = previousDay
             , ymax = revenue
             , col = direction
             , fill = employee)) +
  geom_rect(size = 1.5) +
  facet_grid(~date) +
  scale_fill_brewer(palette = "Dark2") +
  theme(axis.text.x = element_blank()
        , axis.ticks.x = element_blank())

gives

enter image description here

Mark Peterson
  • 9,370
  • 2
  • 25
  • 48
  • this is fantastic thank you. Is it also possible to change the lattice graph all into one graph so that it is easier to see the comparisons by partner? Basically a double-bar graph for the changes (7 bars per day)? – nak5120 Mar 27 '17 at 17:16
  • @NickKnauer: Can? Yes (see edit). Should? ... probably not. – Mark Peterson Mar 27 '17 at 18:20