2

I have a very simple data count problem where I am grouping by two fields (MktDate,Indexname) which are date and chr types respectively. I have a few indexes with no data on certain dates and all I'm looking for my code to do is to include a 0 on those dates. I have tried a number of different solutions following dplyr's group_by documentation but I can not get this short piece of code to return a value of 0 when no data is present. Here is my code

IdxStats <- IdxData %>% 
  group_by(MktDate,IndexName,.drop=FALSE) %>% 
  summarize(CountSecurity = length(MktDate)) %>% 
  ungroup
  • 2
    Hi user1105887. Welcome to StackOverflow! Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimale reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610).That way you can help others to help you! – dario Feb 19 '20 at 15:51
  • Sorry, I thought the code would be simple enough as described without an example. I can dummy something up in a couple hours. – user11058887 Feb 19 '20 at 15:57

1 Answers1

3

The .drop is correct, but when you use length, the data.frame doesn't exist at all, so length is it, will be weird. Try this below:

set.seed(100)
IdxData = data.frame(MktDate=sample(1:3,10,replace=TRUE),
                     IndexName=sample(LETTERS[1:3],10,replace=TRUE))

IdxData %>% count(MktDate,IndexName,.drop=FALSE)
# A tibble: 9 x 3
  MktDate IndexName     n
    <int> <fct>     <int>
1       1 A             0
2       1 B             0
3       1 C             1
4       2 A             1
5       2 B             1
6       2 C             4
7       3 A             0
8       3 B             2
9       3 C             1

Or if you need the name "CountSecurity" (thanks to @arg0naut91 ) :

IdxData %>% 
count(MktDate,IndexName,.drop=FALSE,name="CountSecurity")
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • Thanks this answer worked for me with one small change. I had to turn my date and character fields in to factors. I'm not sure why this is necessary, but this was the only difference between the example you provided and my use case. – user11058887 Feb 19 '20 at 17:56
  • Yes, you need it to be factors so that it knows what to tabulate on. See https://www.stat.berkeley.edu/~s133/factors.html, the part about missing months. Cool, great that you found that out :) – StupidWolf Feb 19 '20 at 18:00