Main question: can you find a cleaner way to compare multiple observations to one single value in a long-oriented table?
The objective is to create a waterfall chart that displays votes for a party ("A") and compare it to competitors.
The input is a list (tibble) containing three columns: neighbourhood
, party
, and pct_votes
. Example:
prop.long
# A tibble: 304 x 3
neighbourhood party pct_votes
<fct> <chr> <dbl>
1 Region-A A 0.0938
2 Region-A B 0.0672
3 Region-A C 0.0906
4 Region-A D 0.228
5 Region-A E 0.0318
6 Region-B A 0.0932
7 Region-B B 0.118
8 Region-B C 0.0837
9 Region-B D 0.199
10 Region-B E 0.0544
To display whether party A was doing better or worse than competitors the direction
attribute is required. The y-variables will be used to display the bars. If a party did worse than A, y_min
should be set to the party's value, otherwise, party A's pct_votes
value should be taken as y_min
. y_max
should be set party A's pct_votes
value if it performed worse, and otherwise take their own pct_votes
value. The x-variables are used to display the political parties side-by-side in a particular order in the plot.
This is the desired output:
prop.wf
# A tibble: 76 x 10
# Groups: neighbourhood [19]
neighbourhood party pct_votes tmp direction y_min y_max x_min x_max
<fct> <fct> <dbl> <dbl> <chr> <int> <int> <int> <int>
1 Region-A A 0.0938 0.0938 target 0 9 0 1
2 Region-A B 0.0672 0.0938 lower 6 9 1 2
3 Region-A C 0.0906 0.0938 lower 9 9 3 4
4 Region-A D 0.228 0.0938 higher 9 22 4 5
5 Region-B A 0.0932 0.0932 target 0 9 0 1
6 Region-B B 0.118 0.0932 higher 9 11 1 2
7 Region-B C 0.0837 0.0932 lower 8 9 3 4
8 Region-B D 0.199 0.0932 higher 9 19 4 5
# … with 68 more rows
The code that produced the output as desired:
prop.wf <- prop.long %>%
filter(party %in% c('A', 'B', 'C', 'D')) %>%
group_by(neighbourhood) %>%
mutate(tmp = pct_votes[party == 'A']) %>%
mutate(party = factor(party, levels = c('A', 'B', 'C', 'D')),
direction = ifelse(party == 'A', 'target', ifelse(pct_votes > tmp, 'higher', ifelse(pct_votes < tmp, 'lower', 'equal'))),
y_min = as.integer((ifelse(party == 'A', 0, ifelse(direction == 'lower', pct_votes, tmp)) * 100)),
y_max = as.integer((ifelse(party == 'A', pct_votes, ifelse(direction == 'lower', tmp, pct_votes)) * 100)),
x_min = as.integer(ifelse(party == 'A', 0, ifelse(party == 'B', 1, ifelse(party == 'C', 2, ifelse(party == 'D', 3, 4))))),
x_max = as.integer(ifelse(party == 'A', 1, ifelse(party == 'B', 2, ifelse(party == 'C', 3, ifelse(party == 'D', 4, 5)))))) # `x_min + 1` did not yield int, even after casting with `as.integer()
My main question is: can you help me refactor this into more clear/scalable code? (E.g. what if two parties need to be added? Preferable these ifelse()
statements are not chained.) I could not stop thinking: "there should be a way easier way to formulate this", but I could not come up with it.
Example of my final output (using ggplot's geom_rect
):
Leveraged resources: