-2

I am not quite sure why this piece of code isn't working.

Here's how my data looks like:

head(test)
  Fiscal.Year Fiscal.Quarter   Seller Product.Revenue Product.Quantity Product.Family Sales.Level.1          Group Fiscal.Week
1        2015         2015Q3 ABCD1234            4000                4      Paper cup      Americas Paper Division          32
2        2014         2014Q1  DDH1234             300                5   Paper tissue  Asia Pacific Paper Division          33
3        2015         2015Q1  PNS1234             298                6         Spoons          EMEA        Cutlery          34
4        2016         2016Q4  CCC1234             289                7         Knives        Africa        Cutlery          33

Now, my objective is to summarize revenue by year.

Here's the dplyr code I wrote:

test %>% 
  group_by(Fiscal.Year) %>%
  select(Seller,Product.Family,Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)

This doesnt work. I get the error:

Error: object 'Product.Revenue' not found

However, when I get rid of select statement, it works but then I don't get to see the output with Sellers, and Product family.

test %>% 
  group_by(Fiscal.Year) %>%
 # select(Seller,Product.Family,Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)

The output is :

# A tibble: 3 x 2
  Fiscal.Year  Rev1
        <dbl> <dbl>
1        2014   300
2        2015  4298
3        2016   289

This works well.

Any idea what's going on? It's been about 3 weeks since I started programming in R. So, I'd appreciate your thoughts. I am following this guide: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

Also, I looked at similar threads on SO, but I believe they were relating to issues because of "+" sign:Error in dplyr group_by function, object not found

I am looking for the following output:

  Fiscal.Year  Rev1 Product Family Seller
        <dbl> <dbl> ...             ...
1        2014   ...
2        2015   ... 
3        2016   ...

Many thanks

Community
  • 1
  • 1
watchtower
  • 4,140
  • 14
  • 50
  • 92
  • 2
    You specifically omit it in your `select` call. Perhaps you should instead use `select(Seller,Product.Family,Fiscal.Year,Product.Revenue)`? – r2evans Aug 19 '16 at 17:49
  • 1
    After posting the above query, I think the issue is that the columns in "select" are categorical. So, dplyr doesn't know how to sum them. If this is true, how do I display these columns (i.e. the values corresponding to the items)? Any thoughts? – watchtower Aug 19 '16 at 17:49
  • 3
    `select` merely chooses which columns to retain, so after your current call to `select`, the data only has three columns (not `Product.Revenue`). Nothing fancy, your call is analogous to `test[, c("Seller","Product.Family","Fiscal.Year")]` (grouping notwithstanding). – r2evans Aug 19 '16 at 17:50
  • Thanks r2evans. That statement worked, but I am not getting the output I want. The specified output would help me to create a bar graph etc. I'd appreciate your thoughts. – watchtower Aug 19 '16 at 17:52
  • 1
    Don't forget that you can run your code one line at a time and make sure it looks right. It's a very easy and very good debugging technique. I'm pretty sure if you ran just the first couple lines, `test %>% group_by(Fiscal.Year) %>% select(Seller,Product.Family,Fiscal.Year)` you could look at that output and think "Hey, after the `select` there is no `Product.Revenue` column. No wonder I'm getting the error message saying the column isn't found". – Gregor Thomas Aug 19 '16 at 18:25
  • 3
    Your example doesn't look reproducible and you have solved it yourself in the form of code with no explanations, so it doesn't look like it'll be much use to others. I'd suggest editing to clean it up and make this Q&A a useful reference, or deleting it. – Frank Aug 19 '16 at 18:37
  • @Frank, Sure, I can add explanations. That's a great suggestion. I'm curious--what part of the code is not reproducible? – watchtower Aug 19 '16 at 19:09
  • 1
    Oh, I saw `head(test)`, which usually returns only the top six rows of a data set but clearly your data only has four rows. So your data set here *is* complete, but not *easily* reproducible -- like, how would someone copy-paste that into R and try out their answer? Usually `dput` is used, but there are other options. Here's a good resource for advice on that and other issues in writing a good question: http://stackoverflow.com/a/28481250/ – Frank Aug 19 '16 at 19:19
  • Yes, there are only four rows because it's a test data. Thanks for the post. I will keep this in mind. – watchtower Aug 20 '16 at 03:25

1 Answers1

0

Ok. This did the trick:

test %>% 
  group_by(Fiscal.Year, Seller,Product.Family) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)

Output:

Source: local data frame [4 x 4]
Groups: Fiscal.Year, Seller [4]

  Fiscal.Year   Seller Product.Family  Rev1
        <dbl>    <chr>          <chr> <dbl>
1        2014  DDH1234   Paper tissue   300
2        2015 ABCD1234      Paper cup  4000
3        2015  PNS1234         Spoons   298
4        2016  CCC1234         Knives   289
watchtower
  • 4,140
  • 14
  • 50
  • 92