I have a dataset containing every football (soccer) player in the top 5 leagues, and I am trying to build a `scout` function that returns a shortlist of players who are above the 85th percentile for the chosen metrics.
To check that it works, I tried calling it with a single metric:

```r
scout(Total_Big_5_new, "Nutmegs")
```

but it returns this error:

```
the condition has length > 1
In addition: Warning message:
In percentile(database$metric) : NAs introduced by coercion
```
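The warning refers to `percentile(database$metric)`, which is not the same as the `database[[metric]]` used in the code below. `$` does not evaluate a variable: `database$metric` looks for a column literally named `metric` (and gets `NULL` if there is none), while `database[[metric]]` uses the column whose name is stored in `metric`. A small illustration with a hypothetical one-column data frame:

```r
# Hypothetical one-column data frame, only for illustration
demo <- data.frame(Nutmegs = c(1L, 19L, 28L))
metric <- "Nutmegs"

demo$metric     # NULL: `$` looks for a column literally called "metric"
demo[[metric]]  # 1 19 28: `[[` uses the string stored in `metric`
```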
Here is the `scout` function:

```r
library(dplyr)  # for select()

scout <- function(database, ...) {
  l <- list(...)
  l2 <- list()
  j <- 1
  for (metric in l) {
    if (metric %in% colnames(database)) {
      l2[[j]] <- percentile(database[[metric]])
      j <- j + 1
    } else {
      print(paste("The stat", metric, "is not recorded"))
    }
  }
  i <- 1
  k <- 1
  shortlist <- list()
  for (player in database) {
    compared <- select(database, unlist(l))
    if (all(compared) > all(unlist(l2))) {
      shortlist[[i]] <- player
      i <- i + 1
    }
  }
  return(shortlist)
}
```
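In `scout`, `for (player in database)` iterates over the columns of the data frame, not its rows, so `player` is a whole column on each pass (and `compared` is the same selection on every pass, since it does not depend on `player`). A small demonstration with a hypothetical two-column data frame:

```r
# Hypothetical two-column data frame, only for illustration
demo <- data.frame(Player = c("A", "B", "C"), Goals = c(23L, 7L, 21L),
                   stringsAsFactors = FALSE)

seen <- list()
for (player in demo) {
  # Each pass receives one whole column, not one row
  seen[[length(seen) + 1]] <- player
}

length(seen)  # 2: one pass per column
seen[[1]]     # "A" "B" "C" (the Player column)
```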
And the `percentile` function:

```r
percentile <- function(metric, value = 0.85) {
  answer <- unname(quantile(metric, c(value)))
  return(as.numeric(paste(answer)))
}
```
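`quantile()` already returns a numeric vector, so the `paste()`/`as.numeric()` round trip in `percentile` is not needed; `unname()` alone drops the `"85%"` label. A minimal sketch of an equivalent helper, where dropping missing values with `na.rm = TRUE` is an assumption not present in the original:

```r
# Sketch: quantile() returns a named numeric vector, so unname() is enough
percentile <- function(metric, value = 0.85) {
  # na.rm = TRUE is an assumption: missing values are dropped first
  unname(quantile(metric, probs = value, na.rm = TRUE))
}

# Goals column from the example table further down
goals <- c(23L, 7L, 21L, 28L, 8L, 17L, 30L, 26L, 18L, 31L, 4L, 1L, 43L)
percentile(goals, 0.65)  # [1] 25.4
```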
Edit: For example, say I make a data frame with random data:

```r
df <- as_tibble(data.frame(
  Player = LETTERS[1:13],
  Goals = sample(1:45, 13, replace = FALSE),
  Assists = sample(1:31, 13, replace = FALSE),
  Nutmegs = sample(1:28, 13, replace = FALSE),
  Dribbles = sample(43:208, 13, replace = FALSE)
))
```
which gives this tibble (the values differ on each run, since the data are random):

```
   Player Goals Assists Nutmegs Dribbles
   <chr>  <int>   <int>   <int>    <int>
 1 A         23      16       1      125
 2 B          7       2      19      195
 3 C         21       4      28      142
 4 D         28      19      23      112
 5 E          8      27      26      152
 6 F         17      23      16       45
 7 G         30       6      25      206
 8 H         26      24       8      136
 9 I         18       3      27       99
10 J         31      25       7      198
11 K          4      21      13       82
12 L          1      13      22       66
13 M         43       7       4      194
```
For this data frame, the 65th percentile of `Goals` is 25.4:

```r
percentile(df$Goals, 0.65)
#> [1] 25.4
```
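For a single metric no loop is needed: comparing the whole column with the threshold gives one `TRUE`/`FALSE` per player, which can index `Player` directly. A minimal sketch, using the `Player` and `Goals` values typed in from the printed table so it does not depend on the random draw:

```r
# Player and Goals values typed in from the printed table
df <- data.frame(
  Player = LETTERS[1:13],
  Goals  = c(23L, 7L, 21L, 28L, 8L, 17L, 30L, 26L, 18L, 31L, 4L, 1L, 43L),
  stringsAsFactors = FALSE
)

threshold <- unname(quantile(df$Goals, 0.65))  # 25.4

# Vectorised comparison: one logical per row (player)
above <- df$Goals > threshold
df$Player[above]  # "D" "G" "H" "J" "M"
```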
The aim of `scout` is to return the names of the players who exceed that value. For example, with the threshold at 0.65 as above,

```r
scout(df, "Goals")
```

should return players D, G, H, J and M.
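A minimal sketch of a vectorised `scout`, assuming a player must be strictly above the `value` quantile on every requested metric (which is what the `all()` in the original seems to aim for). Instead of looping over `database`, it builds one logical vector over the rows and combines metrics with `&`. The `value` and `name_col` arguments are hypothetical additions, and `name_col` assumes the name column is called `Player`:

```r
# Sketch: return the names of players above the `value` quantile on every
# requested metric. `value` and `name_col` are hypothetical additions.
scout <- function(database, ..., value = 0.85, name_col = "Player") {
  metrics <- c(...)

  # Report (and skip) metrics that are not columns of the data
  for (m in setdiff(metrics, colnames(database))) {
    message("The stat ", m, " is not recorded")
  }
  metrics <- intersect(metrics, colnames(database))
  if (length(metrics) == 0) return(character(0))

  # Start with every player selected, then narrow down metric by metric
  keep <- rep(TRUE, nrow(database))
  for (m in metrics) {
    col <- database[[m]]  # [[ ]] uses the column named by the string m
    threshold <- unname(quantile(col, value, na.rm = TRUE))
    keep <- keep & !is.na(col) & col > threshold
  }
  database[[name_col]][keep]
}

# Values typed in from the printed table
df <- data.frame(
  Player   = LETTERS[1:13],
  Goals    = c(23L, 7L, 21L, 28L, 8L, 17L, 30L, 26L, 18L, 31L, 4L, 1L, 43L),
  Assists  = c(16L, 2L, 4L, 19L, 27L, 23L, 6L, 24L, 3L, 25L, 21L, 13L, 7L),
  Nutmegs  = c(1L, 19L, 28L, 23L, 26L, 16L, 25L, 8L, 27L, 7L, 13L, 22L, 4L),
  Dribbles = c(125L, 195L, 142L, 112L, 152L, 45L, 206L, 136L, 99L, 198L, 82L, 66L, 194L),
  stringsAsFactors = FALSE
)

scout(df, "Goals", value = 0.65)             # "D" "G" "H" "J" "M"
scout(df, "Goals", "Dribbles", value = 0.5)  # "G" "J" "M"
```

Because `keep` holds one entry per row, `all()` is never needed: `&` combines the per-metric comparisons element by element, and the result indexes the name column directly.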