
I have a function that is meant to return the top 10 customers in a data frame. However, its output is the unaltered data frame (aside from the grouping), with the results out of order and every single row displayed, even though I asked for the top 10.

myData <- data.frame(
  "Platform" = c("Digital", "Digital", "Digital", "Digital", "Digital", "Digital", "Offline", "Offline", "Offline", "Offline", "Offline"),
  "Cust.ID" = c("John", "Josh", "Juan", "Jason", "Jay", "Jorge", "Jurgen", "Julian", "Jules", "James", "Jimmy"),
  "Input.Records" = c(100, 150, 102, 10, 111, 132, 125, 154, 101, 103, 100)
)

getTopTenCustomers <- function(platformFilter = "Digital"){
  filteredDf <- myData %>% group_by(Cust.ID, Platform) %>%
    filter(Platform == platformFilter) %>% summarise(Input.Records = sum(Input.Records)) %>%
    top_n(10) %>% arrange(myData, desc(Input.Records))
  data.frame(filteredDf)
  return(filteredDf)
}

Also, I'm a bit concerned about the arrange(myData, desc(Input.Records)) call. Isn't that just arranging the rows of the original DF? Every other example I saw put arrange first, but that seems concerning too: sorting the values before grouping and filtering might make the filter return inaccurate results, wouldn't it?

  • A reproducible example along with expected output would help, but this `arrange(myData, desc(Input.Records))` doesn't seem correct to use in a pipe. – Ronak Shah Sep 02 '20 at 01:12
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Sep 02 '20 at 01:12
  • I've added code to create a mock df. As for expected output, I really just want it to be sorted and display only the top 10 entries. –  Sep 02 '20 at 01:24
  • Try removing `myData` from `arrange`, i.e. make it `arrange(desc(Input.Records))`. – Ronak Shah Sep 02 '20 at 01:26
  • I am getting an arrangement now, but it's still showing all of my rows rather than just the top 10. –  Sep 02 '20 at 01:33

1 Answer

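  • When you call top_n, your data is still grouped by Cust.ID, because summarise drops only the last grouping variable (Platform). top_n works within each group, and each customer's group holds a single row, so every row is kept. Ungroup the data before taking top_n.

  • There is no need to pass the data frame to arrange when using pipes; the pipe already supplies it as the first argument.

  • It is better to pass the data frame to the function explicitly as an argument.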
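To illustrate the first point, here is a minimal sketch on a hypothetical toy data frame (`toy`, `id`, and `value` are made-up names, not the asker's data). It compares top_n on grouped and ungrouped data:

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the grouping behaviour
toy <- data.frame(
  id    = c("a", "b", "c", "d"),
  value = c(4, 3, 2, 1)
)

# Grouped: every id forms its own one-row group, so "top 2" keeps
# the single row of each group and nothing is dropped
grouped_result <- toy %>%
  group_by(id) %>%
  top_n(2, value)

# Ungrouped: "top 2" is taken across the whole table
ungrouped_result <- toy %>%
  group_by(id) %>%
  ungroup() %>%
  top_n(2, value)

nrow(grouped_result)    # 4
nrow(ungrouped_result)  # 2
```

The grouped version returns all four rows, which matches the "showing all of my rows" symptom described in the comments.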

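Try using this function:

library(dplyr)

getTopTenCustomers <- function(myData, platformFilter = "Digital"){
  filteredDf <- myData %>% 
                 group_by(Cust.ID, Platform) %>%
                 # keep only the rows for the requested platform
                 filter(Platform == platformFilter) %>% 
                 summarise(Input.Records = sum(Input.Records)) %>%
                 # summarise leaves the data grouped by Cust.ID, so drop the grouping
                 ungroup() %>%
                 # rank on Input.Records explicitly rather than on the last column
                 top_n(10, Input.Records) %>% 
                 arrange(desc(Input.Records))
  return(filteredDf)
}

getTopTenCustomers(myData)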
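As a possible alternative, assuming dplyr 1.0.0 or later is installed, the same idea could be sketched with slice_max, which supersedes top_n and returns rows already sorted in descending order. The function name `getTopCustomers` and the `n` parameter are introduced here for illustration only; `myData` is the mock data frame from the question:

```r
library(dplyr)

# Mock data frame from the question
myData <- data.frame(
  Platform = c(rep("Digital", 6), rep("Offline", 5)),
  Cust.ID = c("John", "Josh", "Juan", "Jason", "Jay", "Jorge",
              "Jurgen", "Julian", "Jules", "James", "Jimmy"),
  Input.Records = c(100, 150, 102, 10, 111, 132, 125, 154, 101, 103, 100)
)

# Hypothetical variant of the function above, with the row count as a parameter
getTopCustomers <- function(df, platformFilter = "Digital", n = 10) {
  df %>%
    # filter first, so only the requested platform is aggregated
    filter(Platform == platformFilter) %>%
    group_by(Cust.ID) %>%
    # .groups = "drop" returns an ungrouped result, so no separate ungroup() is needed
    summarise(Input.Records = sum(Input.Records), .groups = "drop") %>%
    # keep the n largest totals; ties are kept by default, as with top_n
    slice_max(Input.Records, n = n)
}

top3 <- getTopCustomers(myData, n = 3)
top3
```

Note that the mock data contains only six Digital customers, so asking for the top 10 returns all six of them; a smaller `n` such as 3 makes the cut-off visible.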
Ronak Shah