
I have some athlete physical activity data and am plotting some current data against historical data. The two datasets I'm using are below.

Historical data - quartertwo2017

``` r
Player.Name Date    Distance  HIR      V6
Player 1    10/9/17 7060.621  2506.20  12.50
Player 1    15/7/17 4978.625  1596.19  44.26
Player 1    2/7/17  6787.667  2048.61  39.67
Player 1    22/7/17 6881.126  2065.80  31.48
Player 1    24/6/17 5802.060  2204.87  65.48
Player 1    29/7/17 7035.075  2085.32  22.56
Player 1    3/9/17  7016.175  2659.18  66.14
Player 1    5/8/17  6137.929  2154.36  25.49
Player 1    9/6/17  5515.685  2054.66  189.55
Player 1    9/7/17  6311.515  2144.63  20.54
Player 2    1/4/17  7150.221  2307.78  233.88
Player 2    10/9/17 8115.131  3136.33  217.86
Player 2    13/5/17 6391.008  2325.89  101.85
Player 2    15/7/17 6919.630  2136.40  118.64
Player 2    17/6/17 6366.357  2177.28  189.09
Player 2    19/8/17 7230.393  2530.59  104.58
Player 2    2/7/17  6620.122  1908.88  36.34
Player 2    20/5/17 7335.201  2250.34  152.84
Player 2    22/4/17 6956.030  2483.05  376.06
Player 2    22/7/17 7643.874  2370.89  172.20
Player 2    24/3/17 4258.366  1447.50  195.18
Player 2    24/6/17 7305.026  2771.67  297.99
Player 2    26/8/17 8024.780  2867.62  318.08
Player 2    27/5/17 6714.186  2409.16  125.31
Player 2    28/4/17 7106.519  2832.97  337.05
Player 2    29/7/17 8693.820  1961.28  27.80
Player 2    3/9/17  8005.006  2741.90  139.24
Player 2    5/8/17  7676.653  2475.58  111.07
Player 2    9/6/17  7176.619  2645.06  137.82
Player 2    9/7/17  7946.231  3140.44  126.59
```

Current data - quartertwo2018

``` r
Player.Name   Date   Distance     HIR      V6
Player 1      2/3/18 5234.390     1513.73  41.82
Player 2      2/3/18 6352.987     2054.94  166.72
```

Specifically, I'm plotting the current total distance covered by an athlete using geom_point against the distance they typically cover using geom_boxplot. The code I have so far reads as follows:

plot_TD_Q2 <- ggplot(data = quartertwo2017, aes(x = Player.Name, y = Distance)) +
  geom_boxplot(fill = "light blue") +
  coord_flip() +
  ggtitle("Quarter 2") +
  xlab("Player") +
  ylab("Total Distance") +
  theme_classic()

plot_TD_Q2 <- plot_TD_Q2 + geom_point(data = quartertwo2018, aes(x = Player.Name, y = Distance),
  position = position_jitter(width = 0.5),
  col = "red",
  cex = 3)

I'm really happy with the output this code produces. However, I'm wondering whether it is possible to change the colour of the boxplot based on a z-score calculation.

For example, I'd like the boxplot to turn red if an athlete's 'current' total distance (geom_point) is more than 3 SDs away from the mean of their historical data. Additionally, if an athlete's current total distance falls between 1 and 2.99 SDs from that mean, the boxplot should change to amber, and if it falls within 1 SD, it should be filled green.

My historical data is pulled from the dataset quartertwo2017, while my 'current' data is quartertwo2018. So, x = current total distance drawn from quartertwo2018 versus the mean and SD of quartertwo2017.

I hope my question makes sense. I understand this may be a little advanced, especially as I still consider myself a novice in R. I would appreciate any help; please let me know if more information is required. I'm new to posting on Stack Overflow, so I hope I have put this question together correctly.

Thank you.

  • Please add data to your question – pogibas Mar 17 '18 at 11:10
  • as @PoGibas noted, it is difficult for us to help you with your question without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). If we had one, we might be able to help. :) – bouncyball Mar 17 '18 at 14:19
  • @PoGibas, I have included my data. I hope this is done correctly and of assistance. New to SO, so still learning what is required to format a question optimally. :-) –  Mar 18 '18 at 00:09

1 Answer


Consider calculating a z-score by aggregating the historical data for each player's mean and SD, merging those onto the current data, and then assigning a new column conditionally with ifelse. This new column can then be used in the colour argument of aes():

# Per-player mean and SD of the historical distances
aggdf <- merge(
  setNames(aggregate(Distance ~ Player.Name, quartertwo2017, mean),
           c("Player.Name", "Distance_mean")),
  setNames(aggregate(Distance ~ Player.Name, quartertwo2017, sd),
           c("Player.Name", "Distance_sd")),
  by = "Player.Name")

quartertwo2018 <- merge(quartertwo2018, aggdf, by = "Player.Name")

# Signed z-score of the current distance against the player's own history
# (wrap in abs() to treat values below the mean the same way)
quartertwo2018$z <- (quartertwo2018$Distance - quartertwo2018$Distance_mean) /
  quartertwo2018$Distance_sd

quartertwo2018$z_score <- ifelse(quartertwo2018$z > 3,
                                 'high',
                                 ifelse(quartertwo2018$z > 1,
                                        'med',
                                        'low'))

plot_TD_Q2 <- ggplot(data = quartertwo2017,
                     aes(x = Player.Name, y = Distance)) +
  geom_boxplot(fill = "light blue") +
  coord_flip() + ggtitle("Quarter 2") +
  xlab("Player") + ylab("Total Distance") + theme_classic() +

  geom_point(data = quartertwo2018,
             aes(x = Player.Name, y = Distance, colour = z_score),
             position = position_jitter(width = 0.5),
             cex = 3) +
  # RED, ORANGE/RED, GREEN BY HEX COLOR CODE
  # (named, so each category keeps its colour whatever the level order)
  scale_color_manual(values = c(high = "#FF0000", med = "#FF6600", low = "#339900"))

plot_TD_Q2

Output (with the posted data, both current values sit below the player's historical mean, so both points fall in the 'low' category and are drawn green)

Plot Output
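
The question asked for the boxplot fill, rather than the point colour, to follow the z-score band. The following is only a minimal sketch of that variant, not part of the original answer. The helper name `z_band`, the toy `historical` and `current` data frames, and the exact band boundaries (below 1 green, 1 to below 3 amber, 3 or more red, measured in either direction from the mean) are assumptions made for illustration. The plotting step runs only if ggplot2 is installed.

```r
# Sketch only: z_band() and the toy data are illustrative, not from the thread.
z_band <- function(current, historical) {
  # Per-player mean and SD of the historical distances
  stats <- merge(
    setNames(aggregate(Distance ~ Player.Name, historical, mean),
             c("Player.Name", "hist_mean")),
    setNames(aggregate(Distance ~ Player.Name, historical, sd),
             c("Player.Name", "hist_sd")),
    by = "Player.Name")
  out <- merge(current[, c("Player.Name", "Distance")], stats,
               by = "Player.Name")
  # Absolute z-score: number of historical SDs between current value and mean
  out$abs_z <- abs(out$Distance - out$hist_mean) / out$hist_sd
  # Assumed boundaries: [0, 1) green, [1, 3) amber, [3, Inf] red
  out$band <- as.character(cut(out$abs_z, breaks = c(0, 1, 3, Inf),
                               labels = c("green", "amber", "red"),
                               right = FALSE, include.lowest = TRUE))
  out
}

# Toy data chosen so that each player's mean and SD are easy to check by hand
historical <- data.frame(
  Player.Name = rep(c("Player A", "Player B", "Player C"), each = 3),
  Distance    = c(1, 2, 3, 10, 20, 30, 100, 200, 300))
current <- data.frame(
  Player.Name = c("Player A", "Player B", "Player C"),
  Distance    = c(2.5, 40, 600))

bands <- z_band(current, historical)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Attach each player's band to the historical rows so it can drive the fill
  plot_df <- merge(historical, bands[, c("Player.Name", "band")],
                   by = "Player.Name")
  p <- ggplot(plot_df, aes(x = Player.Name, y = Distance, fill = band)) +
    geom_boxplot() +
    geom_point(data = bands, aes(x = Player.Name, y = Distance),
               inherit.aes = FALSE, size = 3) +
    scale_fill_manual(values = c(green = "#339900", amber = "#FF6600",
                                 red = "#FF0000")) +
    coord_flip() + theme_classic()
}
```

With the real data, the same call would be `z_band(quartertwo2018, quartertwo2017)`, and `quartertwo2017` would take the place of `historical` when building `plot_df`.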

Parfait
  • This is terrific, thanks. If I only wanted to 'flag' red for + 3 SDs, instead of - 3 SDs as well, how would the 'ifelse' code be altered? –  Mar 19 '18 at 03:51
  • Not sure I understand. Red points ("high") are for values greater than 3 SD. See `>` operator. – Parfait Mar 19 '18 at 13:46