I am trying to add a column to a data frame that holds the percentile rank of each record's Temperature, calculated within each Weather Station ID (WSID) and Season group.
## temperatures data frame:
WSID  Season  Date        Temperature
20    Summer  24/01/2020  18
12    Summer  25/01/2020  20
20    Summer  26/01/2020  25
12    Summer  27/01/2020  17
20    Winter  18/10/2020  15
12    Winter  19/10/2020  12
12    Winter  20/10/2020  13
12    Winter  21/10/2020  14
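To make the example reproducible, the table above can be built roughly as follows. This is a minimal sketch: keeping Date as a character string rather than a Date object is an assumption, since the column type is not stated.

```r
# Reproducible copy of the table above.
# Assumption: Date is stored as character, WSID as numeric.
temperatures <- data.frame(
  WSID = c(20, 12, 20, 12, 20, 12, 12, 12),
  Season = c("Summer", "Summer", "Summer", "Summer",
             "Winter", "Winter", "Winter", "Winter"),
  Date = c("24/01/2020", "25/01/2020", "26/01/2020", "27/01/2020",
           "18/10/2020", "19/10/2020", "20/10/2020", "21/10/2020"),
  Temperature = c(18, 20, 25, 17, 15, 12, 13, 14),
  stringsAsFactors = FALSE
)
```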
## Code tried:
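library(dplyr)  # %>% and mutate()
library(purrr)  # map_dfr()

# Truncated rank of each value divided by the number of values
perc.rank <- function(x) trunc(rank(x)) / length(x)

# Add a percentile column to one chunk of the data
rank.perc <- function(mdf) {
  mdf %>%
    mutate(percentile = perc.rank(Temperature))
}

# Splits by WSID only; Season is not taken into account here
temperatures <- temperatures %>%
  split(.$WSID) %>%
  map_dfr(~ rank.perc(.))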
## Expected output:
WSID  Season  Date        Temperature  Percentile
20    Summer  24/01/2020  18           0.333
12    Summer  25/01/2020  20           0.444
20    Summer  26/01/2020  25           0.666
12    Summer  27/01/2020  17           0.333
20    Winter  18/10/2020  15
12    Winter  19/10/2020  12
12    Winter  20/10/2020  13
12    Winter  21/10/2020  14
Is there an elegant way to do this using functions such as group_modify, group_split, map and/or split? I assume there should be, since the same problem would come up with three or more grouping levels.
The code works when I split the data by WSID alone, but I can't get any further when I want to group by both WSID and Season.
(The filled-in Percentile values were calculated with Excel's percentile rank function.)
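For reference, here is a minimal sketch of what grouping on both columns might look like with dplyr, reusing perc.rank from above. It is an illustration under assumptions, not a confirmed solution: the data frame is a cut-down stand-in for the one in the question, the names ranked and ranked2 are made up for this example, and the resulting values are not expected to match the Excel figures, because perc.rank uses a different formula.

```r
library(dplyr)

perc.rank <- function(x) trunc(rank(x)) / length(x)

# Cut-down stand-in for the temperatures data frame (Date omitted)
temperatures <- data.frame(
  WSID = c(20, 12, 20, 12, 20, 12, 12, 12),
  Season = rep(c("Summer", "Winter"), each = 4),
  Temperature = c(18, 20, 25, 17, 15, 12, 13, 14)
)

# group_by() accepts any number of grouping columns;
# mutate() then evaluates perc.rank() separately within each group
ranked <- temperatures %>%
  group_by(WSID, Season) %>%
  mutate(percentile = perc.rank(Temperature)) %>%
  ungroup()

# Same idea with group_modify(), which passes each group's rows as .x
# (rows come back ordered by the grouping keys)
ranked2 <- temperatures %>%
  group_by(WSID, Season) %>%
  group_modify(~ mutate(.x, percentile = perc.rank(Temperature))) %>%
  ungroup()
```

Adding a third grouping level would only mean listing another column in group_by(). With split(), the closest equivalent would probably be splitting on a list of columns, e.g. split(list(.$WSID, .$Season), drop = TRUE).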