I am trying to add a column to a data frame that holds the percentile rank of each record, calculated within each combination of Weather Station ID (WSID) and Season.

## temperatures data frame:

WSID    Season  Date    Temperature
20  Summer  24/01/2020  18
12  Summer  25/01/2020  20
20  Summer  26/01/2020  25
12  Summer  27/01/2020  17
20  Winter  18/10/2020  15
12  Winter  19/10/2020  12
12  Winter  20/10/2020  13
12  Winter  21/10/2020  14

## Code tried:
perc.rank <- function(x) trunc(rank(x))/length(x)

rank.perc = function(mdf) {
  combined1 = mdf %>%
  mutate(percentile = perc.rank(Temperature))
}

temperatures = temperatures %>%
  split(.$WSID) %>%
  map_dfr(~rank.perc(.))

## Expected Output :

WSID    Season  Date    Temperature Percentile
20  Summer  24/01/2020  18  0.333
12  Summer  25/01/2020  20  0.444
20  Summer  26/01/2020  25  0.666
12  Summer  27/01/2020  17  0.333
20  Winter  18/10/2020  15  
12  Winter  19/10/2020  12  
12  Winter  20/10/2020  13  
12  Winter  21/10/2020  14  


Is there an elegant way to do this using functions such as group_modify, group_split, map and/or split? I would expect so, for example for cases with three or more grouping levels.

The code works when I split the data by WSID alone, but I can't get any further when I want to group by both WSID and Season.

(The filled-in Percentile values were calculated with Excel's percentile rank function.)

csharpvsto
  • https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question – StupidWolf Feb 05 '21 at 07:54
  • Sorry it's my first post on Stack Overflow so I'm just getting used to the syntax. I have updated my post now. Hopefully it is more clear and easier to understand now. – csharpvsto Feb 05 '21 at 08:18

1 Answer

You can apply the function directly with group_by instead of splitting. The wrapper function rank.perc also seems unnecessary.

library(dplyr)

perc.rank <- function(x) trunc(rank(x))/length(x)

temperatures %>%
  group_by(WSID) %>%
  mutate(percentile = perc.rank(Temperature))

With group_by it is easy to add more grouping variables later, e.g. group_by(WSID, Season).
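For illustration, here is a minimal sketch of the two-level grouping, assuming the sample data from the question is rebuilt by hand as `temperatures`. The result names `by_station_season` and `via_group_modify` are made up for this example. Note that `perc.rank` as defined here is not guaranteed to reproduce Excel's percentile rank values.

```r
library(dplyr)

# Percentile rank helper taken from the question
perc.rank <- function(x) trunc(rank(x)) / length(x)

# Sample data retyped from the question (assumed to match the real data frame)
temperatures <- data.frame(
  WSID = c(20, 12, 20, 12, 20, 12, 12, 12),
  Season = c("Summer", "Summer", "Summer", "Summer",
             "Winter", "Winter", "Winter", "Winter"),
  Date = c("24/01/2020", "25/01/2020", "26/01/2020", "27/01/2020",
           "18/10/2020", "19/10/2020", "20/10/2020", "21/10/2020"),
  Temperature = c(18, 20, 25, 17, 15, 12, 13, 14)
)

# Rank within each WSID + Season combination; the original row order is kept
by_station_season <- temperatures %>%
  group_by(WSID, Season) %>%
  mutate(percentile = perc.rank(Temperature)) %>%
  ungroup()

# The same calculation with group_modify, as asked about in the question;
# rows come back ordered by the grouping keys
via_group_modify <- temperatures %>%
  group_by(WSID, Season) %>%
  group_modify(~ mutate(.x, percentile = perc.rank(Temperature))) %>%
  ungroup()

print(by_station_season)
```

Adding a third grouping level would only mean listing another column in group_by; the mutate step stays the same.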

Ronak Shah