
Sample data:

df <- tibble(
 "PLAYER" = c("Corey Kluber", "Clayton Kershaw", "Max Scherzer", "Chris Sale",
           "Corey Kluber", "Jake Arrieta", "Jose Urena", "Yu Darvish"),
 "YEAR" = c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
 "WHIP" = c(1.24, 1.50, 1.70, 1.35, 1.42, 1.33, 1.61, 1.10),
 "ERA" =  c(3.27, 4.0, 2.56, 1.45, 3.87, 4.23, 3.92, 2.0)
)

The data set is much larger, but I have written a function (that does not work) to retrieve the player and desired statistic, and then output a plot using ggplot:

baseball_stats <- function(player, statistic) {

  # Libraries
  library(tidyverse)
  library(rvest)
  library(ggrepel)

  # Function to set YEAR scale to number of seasons played by pitcher
  f <- function(k) {
    step <- k
    function(y) seq(floor(min(y)), ceiling(max(y)), by = step)
  }

  # ggplot of player and chosen statistic
  p <- df %>% 
    group_by(PLAYER) %>% 
    filter(PLAYER == player) %>% 
    ggplot() +
    geom_col(aes(YEAR, statistic), width = .5) +
    scale_x_continuous(breaks = f(1)) +  # Uses the function to set YEAR breaks
    scale_y_continuous(breaks = f(0.1)) +
    theme_bw() +
    coord_flip() +
    labs(
      title = "statistic Statistic: player",
      subtitle = "statistic over seasons played",
      x = "Year",
      y = "statistic Stat",
      caption = "Data from espn.com")

  print(p)
  return(baseball_stats)

}

baseball_stats("Corey Kluber", WHIP)

I get either `Error: Discrete value supplied to continuous scale` or another error about `$` and atomic vectors. (My data set is scraped using rvest and has to be cleaned up, and I tried to include that step in my function.) Thanks.

papelr

1 Answer


I got a plot after changing `aes` to `aes_string`:

geom_col(aes_string("YEAR", statistic), width = .5)

Note that YEAR is quoted. I would then call the function like this:

baseball_stats("Corey Kluber", "WHIP")

Again, WHIP is passed as a quoted string too.

The complete code is here:

baseball_stats <- function(player, statistic) {

  # Libraries
  library(tidyverse)
  library(rvest)
  library(ggrepel)

  # Function to set YEAR scale to number of seasons played by pitcher
  f <- function(k) {
    step <- k
    function(y) seq(floor(min(y)), ceiling(max(y)), by = step)
  }

  # ggplot of player and chosen statistic
  p <- df %>% 
    group_by(PLAYER) %>% 
    filter(PLAYER == player) %>% 
    ggplot() +
    geom_col(aes_string("YEAR", statistic), width = .5) +
    scale_x_continuous(breaks = f(1)) +  # Uses the function to set YEAR breaks
    scale_y_continuous(breaks = f(0.1)) +
    theme_bw() +
    coord_flip() +
    labs(
      title = "statistic Statistic: player",
      subtitle = "statistic over seasons played",
      x = "Year",
      y = "statistic Stat",
      caption = "Data from espn.com")

  print(p)
  return(baseball_stats)

}

baseball_stats("Corey Kluber", "WHIP")
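
As a possible alternative to `aes_string()`, which is marked as soft-deprecated in recent ggplot2 releases, the column can be looked up by name with the `.data` pronoun inside a regular `aes()`. The sketch below is a minimal illustration, not the original function: `plot_stat` is a hypothetical helper name, the two-row `df` is a stand-in for the sample data, and it assumes a ggplot2 version that supports `.data` in `aes()` (3.0.0 or later).

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Two-row stand-in for the question's sample data (assumption)
df <- tibble(
  PLAYER = c("Corey Kluber", "Corey Kluber"),
  YEAR   = c(2016, 2017),
  WHIP   = c(1.24, 1.42)
)

# Hypothetical helper: `statistic` is a column name passed as a string
plot_stat <- function(data, player, statistic) {
  data %>%
    filter(PLAYER == player) %>%
    ggplot() +
    # .data[[statistic]] looks the column up by its name inside aes(),
    # so the string "WHIP" is mapped as a column rather than as a constant
    geom_col(aes(YEAR, .data[[statistic]]), width = .5)
}

p <- plot_stat(df, "Corey Kluber", "WHIP")
print(p)
```

With plain `aes(YEAR, statistic)`, the string itself would be mapped as a constant discrete value, which is one plausible source of the "Discrete value supplied to continuous scale" error.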
  • Awesome, thank you! Would there be a way to pass the function to the `labs()` part of the ggplot? Thanks! – papelr Jul 17 '18 at 03:01
  • I didn't get your question about passing `labs()`. If you need to modify the `labs()` outside the function, you can change `print(p)` to `return(p)` inside the function and store the result in a variable, let's say **myplot**. Then you can do `myplot + labs(title = ..., subtitle = ..., ...)`, and you can delete the `labs()` part inside the function. Also, I don't know why you are returning `baseball_stats` inside the function. Let me know if this answers your question. If so, please feel free to mark the answer as accepted. – see-king_of_knowledge Jul 17 '18 at 10:13
  • Should I not use `return()` at all? This question seems to be split on it: https://stackoverflow.com/questions/11738823/explicitly-calling-return-in-a-function-or-not – papelr Jul 17 '18 at 14:44
  • If you are asking in general, the question has been discussed in depth in the link, and personally I believe it is a matter of clarity. However, in your case `return(baseball_stats)` returns the function itself as an object, and I don't think that is what you intended. You can return `p`, which is a `ggplot` object, explicitly with `return(p)` or implicitly by making `p` the last line of the function. Once it is returned, you can apply further ggplot operations to the returned object, such as `returned_object + labs(...)`. – see-king_of_knowledge Jul 17 '18 at 15:26
  • Using `return(p)` actually sped up the function / graph process by quite a bit. I appreciate it! Thank you sir – papelr Jul 17 '18 at 15:40
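
The two points raised in the comments (returning the plot object, and getting the function arguments into `labs()`) can be sketched together as follows. This is an illustrative sketch under assumptions, not the poster's final code: `baseball_plot` is a hypothetical name, the two-row `df` stands in for the real data, and `.data[[statistic]]` is used here to look up the column by name. The labels are built with `paste()` and `paste0()` because a literal string such as "statistic Stat" is never substituted with the argument's value.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Two-row stand-in for the question's sample data (assumption)
df <- tibble(
  PLAYER = c("Corey Kluber", "Corey Kluber"),
  YEAR   = c(2016, 2017),
  WHIP   = c(1.24, 1.42)
)

# Hypothetical variant of the function that returns the plot itself
baseball_plot <- function(data, player, statistic) {
  p <- data %>%
    filter(PLAYER == player) %>%
    ggplot() +
    geom_col(aes(YEAR, .data[[statistic]]), width = .5) +
    coord_flip() +
    labs(
      # paste0()/paste() insert the argument values into the label text
      title    = paste0(statistic, " Statistic: ", player),
      subtitle = paste(statistic, "over seasons played"),
      x        = "Year",
      y        = paste(statistic, "Stat")
    )
  p  # last expression: the ggplot object is the return value
}

myplot <- baseball_plot(df, "Corey Kluber", "WHIP")

# The returned object can still be modified outside the function
myplot + labs(caption = "Data from espn.com")
```

Returning `p` rather than `baseball_stats` means the caller gets a ggplot object back instead of the function itself, so it can be printed, saved, or extended with further layers.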