My objective is to create a user-defined function that takes a data frame, a month or year as x, and a product category as y, and returns a data frame containing the top 10 customers grouped by city.

I don't want to pass city as an argument.

 toptencust <- function(df,x,y){
  library(magrittr)
  library(dplyr)

  ifelse(is.character(x)
    , df %>% 
      select_(City,Amount,Customer,Product,Year,month) %>%
      group_by_(City,Customer) %>%
      filter_(month==x & Product==y) %>% 
      summarise_(Tot_repay=sum(Amount,na.rm=T)) %>% 
      top_n(n=10)
    , df %>% 
      select_(City,Amount,Customer,Product,Year,month) %>%
      group_by_(City,Customer) %>%filter_(Year==x& Product==y) %>%
      summarise_(Tot_repay=sum(Amount,na.rm=T)) %>% 
      top_n(n=10)
    )

}

My dataset looks like this:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Customer    Date        Amount  month     City        Product  Year
A1          12/01/04    495415  January   BANGALORE   Gold     2004
A1          03/01/04    245899  January   BANGALORE   Gold     2004
A1          15/01/04    259490  January   BANGALORE   Gold     2004
A1          25/01/04    437555  January   BANGALORE   Gold     2004
A1          17/01/05    165973  January   BANGALORE   Gold     2005
A1          23/02/05    365367  February  BANGALORE   Gold     2005
A1          01/02/05    14473   February  BANGALORE   Gold     2005
A8          05/02/04    100002  February  PATNA       Silver   2004
A9          28/02/05    100003  February  CHENNAI     Silver   2005
A10         16/02/05    48759   February  CALCUTTA    Gold     2005
A11         23/02/05    208318  February  COCHIN      Gold     2005
A12         03/02/05    150281  February  BOMBAY      Gold     2005
A13         04/02/06    339078  February  BANGALORE   Gold     2006
A14         25/03/06    137835  March     BANGALORE   Gold     2006
A15         31/03/06    437120  March     CALCUTTA    Gold     2006
A16         23/03/06    103924  March     COCHIN      Gold     2006
A17         19/03/04    408467  March     BOMBAY      Gold     2004
A18         05/03/06    100000  March     BANGALORE   Silver   2006
A19         04/04/05    10000   April     BANGALORE   Platinum 2005
A20         30/04/06    10001   April     CALCUTTA    Platinum 2006
A21         25/04/04    10002   April     COCHIN      Platinum 2004
A22         19/04/06    100000  April     BOMBAY      Silver   2006
A23         06/04/04    80346   April     BANGALORE   Silver   2004
A24         27/04/05    100002  April     DELHI       Silver   2005
A25         05/05/04    100003  May       COCHIN      Silver   2004
A26         06/05/06    470982  May       PATNA       Gold     2006
A27         07/05/05    357376  May       CHENNAI     Gold     2005
A28         08/05/06    326050  May       TRIVANDRUM  Gold     2006
A29         09/05/05    215083  May       CALCUTTA    Gold     2005
A30         10/05/06    481343  May       BANGALORE   Gold     2006")

My objective is to get the output shown below:

[Image: Output required]

When I run this function, I get the following error:

toptencust(df,'February',2014)

Error in sum(Amount, na.rm = T) : invalid 'type' (symbol) of argument

I am unable to understand the problem. Please help.

akraf
  • Not a clue. What is in `consolidate`? I suggest you read about [reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), then edit your question. – r2evans Jun 07 '18 at 07:39
  • But some hints: (1) don't use `if_else` here, just `if {...} else {...}`, this is a horribly wrong/inefficient use of a vectorizing conditional function; (2) don't put your entire `magrittr` pipe on one line, it makes reading and debugging much more difficult. – r2evans Jun 07 '18 at 07:41
  • Hi r2evans, thanks for replying. `consolidate` is a data frame. – Anubhav Kukareti Jun 07 '18 at 07:54
  • PLEASE read the link I provided in my first comment. It is relatively apparent that it is a frame or something frame-like, ergo your use in `dplyr`-pipes. If you want somebody to be able to troubleshoot what is going on, you need to provide more. It would also be good to reduce the problem, as I suspect we don't need to deal with all of those columns to be able to resolve your issue. Another good reference: [minimal, verifiable examples](https://stackoverflow.com/help/mcve). – r2evans Jun 07 '18 at 07:58
  • Thanks for sharing. I read it and edited my question to help others understand the problem, and included the dataset and desired output as an image. – Anubhav Kukareti Jun 07 '18 at 08:37
  • The code doesn't work for me because you provided a png but `dplyr` functions require a `data.frame`. (It may be *just as easy* for you to copy the output of `dput(head(mydataset))` as it is to do the screenshot thing, BUT it is 100x easier for us to test your data when we can just copy/paste it into an R session. I am not going to transcribe it.) – r2evans Jun 07 '18 at 14:02
  • Hi Evans, this is my first post, hence I am struggling a bit. I somehow managed to insert a sample data set and the required output. – Anubhav Kukareti Jun 08 '18 at 06:01
  • Anubhav, twice now I've suggested you read the [*reproducible*](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), and you are closer but you did none of the recommended methods. They suggest (very clearly) to provide your data in an *easily consumed format*, with two great examples being `dput(head(x))` and `read.table(text='...')`. To prove my point, try to quickly and easily read the data that you posted into a data.frame, and now do the same thing with what I just edited your question (using `read.table`). It is much easier this way. – r2evans Jun 08 '18 at 13:42
  • Another problem: your code doesn't work, and it errors in ways different than what you suggest. Further, your filtering with `"February"` and `2014` produce no results given your sample data. Your function has the same code on both side of the `ifelse` conditional (which should be `if {...} else {...}` if you really need a conditional), and is mis-using the `select_` and other std-eval forms of the functions. When posting questions, it can be really informative (and helpful to us) if you start a fresh R session and try the data/code you've given us ... you will find it frustrating, too. – r2evans Jun 08 '18 at 13:50
  • All in all, though, what is wrong with `df %>% group_by(City, Customer) %>% filter(month == "February", Year == 2004) %>% summarize(Tot_repay = sum(Amount))`? – r2evans Jun 08 '18 at 13:50

1 Answer

Executing your example, I get another error:

Error in compat_lazy_dots(.dots, caller_env(), ...) : object 'City' not found

This is because you used the "escape hatch" functions select_, filter_ and so forth. You probably did this because you need to use the variable x in filter_(month==x & Product==y). But now the other names like Product, which are meant to be column names inside the data frame, are taken to be variables, too!

Here is a tutorial on the old "escape hatch" functions

Nowadays, this is solved differently using the !! operator. See the vignette Programming with dplyr.

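toptencust <- function(df, x, y) {
    library(magrittr)
    library(dplyr)

    # ifelse() is vectorised and would return only the first element of the
    # resulting data frame, so a plain if/else selects the branch instead.
    if (is.character(x)) {
        # x is a month name
        df %>%
            select(City, Amount, Customer, Product, Year, month) %>%
            group_by(City, Customer) %>%
            filter(month == !!x & Product == !!y) %>%
            summarise(Tot_repay = sum(Amount, na.rm = TRUE)) %>%
            top_n(n = 10, wt = Tot_repay)
    } else {
        # x is a year
        df %>%
            select(City, Amount, Customer, Product, Year, month) %>%
            group_by(City, Customer) %>%
            filter(Year == !!x & Product == !!y) %>%
            summarise(Tot_repay = sum(Amount, na.rm = TRUE)) %>%
            top_n(n = 10, wt = Tot_repay)
    }
}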
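
The two branches of the function above differ only in which column is compared with x, so the pipeline could probably be written once. The following is a minimal sketch of that idea, not part of the original answer: the function name toptencust2, the variable period_col, and the small sample_df are assumptions made for illustration, and the `.data` pronoun requires a reasonably recent dplyr.

```r
library(dplyr)

# Hypothetical variant; the names toptencust2 and period_col are
# assumptions for illustration only.
toptencust2 <- function(df, x, y) {
  # A character x is treated as a month name, anything else as a year.
  period_col <- if (is.character(x)) "month" else "Year"

  df %>%
    filter(.data[[period_col]] == !!x, Product == !!y) %>%
    group_by(City, Customer) %>%
    summarise(Tot_repay = sum(Amount, na.rm = TRUE)) %>%
    top_n(n = 10, wt = Tot_repay)  # top 10 customers within each city
}

# A few rows taken from the question's sample data.
sample_df <- data.frame(
  Customer = c("A1", "A1", "A8", "A10", "A11", "A12"),
  Amount   = c(365367, 14473, 100002, 48759, 208318, 150281),
  month    = "February",
  City     = c("BANGALORE", "BANGALORE", "PATNA", "CALCUTTA", "COCHIN", "BOMBAY"),
  Product  = c("Gold", "Gold", "Silver", "Gold", "Gold", "Gold"),
  Year     = c(2005, 2005, 2004, 2005, 2005, 2005),
  stringsAsFactors = FALSE
)

by_month <- toptencust2(sample_df, "February", "Gold")
by_year  <- toptencust2(sample_df, 2005, "Gold")
```

Filtering before grouping keeps the grouped step small, and because summarise() drops the last grouping level, top_n() ranks customers within each City. Note that y is the product category, so a call such as toptencust2(df, "February", "Gold") matches the function's stated purpose.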
akraf