How to Sort and provide top 5 values for each month in R

Question

I have created a data frame with three columns: Name, Month and Amount. Each month contains multiple names, and each name/month combination has an amount. I want to find the top 5 users by monthly spending, so the final data frame should contain only the top 5 amounts for each month. This is how I have calculated the data so far:

Extract_Month<- months(Credit$Transaction.Date)
Extract_Month
TopSpend<-aggregate(Credit$Amount, 
                    by=list(Credit$User,Extract_Month)
                    , FUN=mean)

I am stuck beyond this point. Please help.

Here is some sample data

User<-c(6,2,3,4,5,6)
Transaction.Date<-c("11-1-2019","11-2-2019","11-3-2019",
"12-1-2019","12-2-2019","11-1-2019")
Amount<-c(100,200,300,400,500,150)

Credit<-data.frame(User,Transaction.Date,Amount)

This is used to find the average of the amounts by month and user. – S_Gupta, Jan 18 '19 at 20:32
Yes, I meant why are you creating a separate variable `Extract_Month` outside of the data frame and then trying to sort using it? Why not use `Credit$Transaction.Date` inside the by list? – Chabo, Jan 18 '19 at 20:35
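
One way to follow the suggestion above is to keep the month as a column of `Credit` rather than as a separate vector. The following is a minimal sketch, not the asker's final code: it assumes the dates are in month-day-year order, and `Month` is a column name introduced here for illustration. `months()` only has methods for Date and POSIXt objects, so the character dates are converted with `as.Date()` first.

```r
# Sample data from the question
Credit <- data.frame(
  User = c(6, 2, 3, 4, 5, 6),
  Transaction.Date = c("11-1-2019", "11-2-2019", "11-3-2019",
                       "12-1-2019", "12-2-2019", "11-1-2019"),
  Amount = c(100, 200, 300, 400, 500, 150)
)

# Assumption: dates are month-day-year; convert before calling months()
Credit$Transaction.Date <- as.Date(Credit$Transaction.Date, format = "%m-%d-%Y")

# Month name stored inside the data frame (hypothetical column name).
# Month names are locale-dependent and do not distinguish between years.
Credit$Month <- months(Credit$Transaction.Date)

# Average amount per user and month
TopSpend <- aggregate(Amount ~ User + Month, data = Credit, FUN = mean)
print(TopSpend)
```

With the formula interface the result keeps the column names `User`, `Month` and `Amount`, which avoids the `Group.1`, `Group.2` and `x` names produced by the `by = list(...)` form.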

score 1 · Answer 1 · edited Jan 19 '19 at 01:27

Here is a solution:

library(tidyverse)
df <- data.frame(Name = c("A", "B", "C"), Month = as.factor(c(11, 11, 11)), Amount = c(123, 456, 789))
df %>%
  arrange(desc(Amount)) %>%
  top_n(2, Amount) # change 2 to 5

It is best to provide sample data. In the meantime, here is an example with the built-in iris data:

iris %>% 
  group_by(Species) %>% 
  arrange(desc(Sepal.Length)) %>% 
  top_n(5,Sepal.Length)

Or, based on @Chabo's data:

User<-c(6,2,3,4,5,6)
Transaction.Date<-c("11-1-2019","11-2-2019","11-3-2019",
                    "12-1-2019","12-2-2019","11-1-2019")
Amount<-c(100,200,300,400,500,150)
df1<-data.frame(Amount,Transaction.Date,User)
df1 %>% 
  group_by(User,Transaction.Date) %>% 
  arrange(desc(Amount)) %>% 
  top_n(5,Amount) %>% 
  ungroup() %>% 
  top_n(5,Amount)
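
The pipeline above groups by user and exact date, so it does not rank users within a month. A sketch of a per-month ranking follows. It assumes month-day-year dates and dplyr 1.0.0 or later (for `slice_max()` and the `.groups` argument); the `Month` column and the name `top_by_month` are introduced here for illustration only.

```r
library(dplyr)

# Sample data from the thread
df1 <- data.frame(
  User = c(6, 2, 3, 4, 5, 6),
  Transaction.Date = c("11-1-2019", "11-2-2019", "11-3-2019",
                       "12-1-2019", "12-2-2019", "11-1-2019"),
  Amount = c(100, 200, 300, 400, 500, 150)
)

top_by_month <- df1 %>%
  # Assumption: dates are month-day-year; derive a year-month key
  mutate(Month = format(as.Date(Transaction.Date, format = "%m-%d-%Y"), "%Y-%m")) %>%
  # Average amount per user within each month
  group_by(Month, User) %>%
  summarise(Amount = mean(Amount), .groups = "drop") %>%
  # Keep at most the 5 largest averages in each month
  group_by(Month) %>%
  slice_max(Amount, n = 5, with_ties = FALSE) %>%
  ungroup()

print(top_by_month)
```

Months with fewer than five users simply return all of their rows. Setting `with_ties = FALSE` caps each month at five rows even when amounts are tied; drop it if tied users should all be kept.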

answered Jan 18 '19 at 20:34 by NelsonGon · edited Jan 19 '19 at 01:27 by Chabo

Error in arrange_impl(.data, dots) : Evaluation error: `as_dictionary()` is defunct as of rlang 0.3.0. Please use `as_data_pronoun()` instead. – S_Gupta Jan 18 '19 at 20:40
What code are you using? Reinstall tidyverse. Could you also add sample data to your question so that we don't have to make things up? https://stackoverflow.com/questions/52957136/defunct-as-of-rlang-0-3-0-and-mutate-impl – NelsonGon Jan 18 '19 at 20:41
Error in TopSpend %>% group_by(Group.1, Group.2) %>% arrange(desc(x)) %>% : could not find function "%>%" – S_Gupta Jan 18 '19 at 20:47
TopSpend %>% group_by(Group.1,Group.2) %>% arrange(desc(x)) %>% top_n(5,x) – S_Gupta Jan 18 '19 at 20:47
Did you call `library(tidyverse)`? What is x? Use `dput` to add data to your question. – NelsonGon Jan 18 '19 at 20:48
Yes, I called it. I reinstalled the package and restarted the Studio as well. – S_Gupta Jan 18 '19 at 20:49
Hmmm...use `dplyr` directly instead. `library(dplyr)` – NelsonGon Jan 18 '19 at 20:50
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘Rcpp’ – S_Gupta Jan 18 '19 at 20:53
`install.packages("Rcpp",dependencies=T)` or uninstall and reinstall the tidyverse with dep set to T. – NelsonGon Jan 18 '19 at 20:55
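
Load failures like the ones in this exchange usually mean that a package or one of its dependencies is not installed. A small sketch for checking this before calling `library()` follows; the package list and the names `pkgs`, `installed` and `missing_pkgs` are illustrative choices, not something taken from the thread.

```r
# Packages the answers in this thread rely on (illustrative list)
pkgs <- c("Rcpp", "dplyr")

# requireNamespace() returns FALSE instead of raising an error
# when a package cannot be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # To install them, uncomment the next line:
  # install.packages(missing_pkgs, dependencies = TRUE)
} else {
  message("All required packages are available.")
}
```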

Chabo · Answer 2 · 2019-01-18T21:57:51.037

Using made-up data over multiple months. This may not be the best approach, but it works. I would recommend working with @NelsonGon on the tidyverse approach.

Data Creation:

library(dplyr)

User<-c(6,2,3,4,5,6)
Transaction.Date<-c("11-1-2019","11-2-2019","11-3-2019",
"12-1-2019","12-2-2019","11-1-2019")
Amount<-c(100,200,300,400,500,150)

Credit<-data.frame(User,Transaction.Date,Amount)

Aggregate, Arrange and Subset:

# Aggregate the average amount spent by user and date
TopSpend<-aggregate(Credit$Amount,
                    by=list(Credit$User,Credit$Transaction.Date),
                    FUN=mean)

# Reverse the row order (puts the highest amounts first for this sample data)
TopSpend<-arrange(TopSpend, rev(rownames(TopSpend)))
print(TopSpend)

# Rename for clarity
names(TopSpend)<-c("User", "Date","Amount")

# Format date for split
TopSpend$Date<-as.POSIXct(TopSpend$Date, format="%m-%d-%Y")

# Split based on month
TopSpend_Fin<-split(TopSpend, format(TopSpend$Date, "%Y-%m"))

# Get the first 5 rows per month (months with fewer rows won't throw an error)
TopSpend_Fin<-lapply(TopSpend_Fin, head, n = 5L)

$`2019-11`
  User       Date Amount
3    3 2019-11-03    300
4    2 2019-11-02    200
5    6 2019-11-01    125

$`2019-12`
  User       Date Amount
1    5 2019-12-02    500
2    4 2019-12-01    400

What about sorting in decreasing order and then getting the top 5? – S_Gupta, Jan 19 '19 at 05:56
@StutiGupta The program already sorts in decreasing order, and the top 5 is already being pulled. In the example there are not 5 total options per month, so it pulls as many as it can, in decreasing order. If you want increasing order, delete `TopSpend<-arrange(TopSpend, rev(rownames(TopSpend)))`, as this reverses the default, which is increasing. – Chabo, Jan 22 '19 at 15:21
