Populate all items in grouping R

Question

I'm receiving an error, and I believe the root cause is that not every group has a value within each of my groupings.

Data can be downloaded here: https://opendata.miamidade.gov/311/311-Service-Requests-Miami-Dade-County/dj6j-qg5t

What I want is a function that takes a nested grouping, detects all of the holes, and populates them with zeros. Let's take the following code sample:

d <- rDSamp %>% 
  FilterDateRange("Ticket.Created.Date...Time", "1/1/2013", "12/31/2013") %>%
  group_by(Ticket.Created.Date...Time, Case.Owner) %>%
  summarise(
    count = n()
  ) %>%
  arrange(Ticket.Created.Date...Time)

After the summarise, I need to add a function that goes through every date, and if the case owner does not exist in that date, create the case owner, and add a count of 0.

Here is the code to get to this point:

library("ggvis") 
library("magrittr") 
library("dplyr")
library("tidyr")
library("shiny")
library("checkpoint")

checkpoint("2016-03-29")

rData <- read.csv("C:\\data\\Miami_311.csv", 
                 header=TRUE, 
                 sep=",")
rDSamp <- rData[sample(1:length(rData$Case.Owner), 1000),]
rDSamp = rData %>%
    subset(
      Case.Owner == "Animal_Services" |
        Case.Owner == "Waste_Management" |
        Case.Owner == "Community_Information_and_Outreach" |
        Case.Owner == "Waste_Management")
rDSamp$Case.Owner = factor(rDSamp$Case.Owner)
#Convert to known date time
rDSamp$Ticket.Created.Date...Time <- 
  rDSamp$Ticket.Created.Date...Time %>%
  as.POSIXct(format="%m/%d/%Y") %>%
  as.character()

FilterDateRange = function(data, feature, minDate, maxDate) {
  minDate = minDate %>% 
          as.POSIXct(format="%m/%d/%Y") %>% 
          as.character() 
  maxDate = maxDate %>% 
          as.POSIXct(format="%m/%d/%Y") %>% 
          as.character() 
  result = subset(data, data[feature] <= maxDate)
  subset(result, result[feature] >= minDate)
}

d <- rDSamp %>% 
  FilterDateRange("Ticket.Created.Date...Time", "1/1/2013", "12/31/2013") %>%
  group_by(Ticket.Created.Date...Time, Case.Owner) %>%
  summarise(
    count = n()
  ) %>%
  arrange(Ticket.Created.Date...Time)

As a final piece of information, I'm trying to use ggvis layer_smooths, and it reports NAs introduced by coercion. My assumption is that holes in the data are causing this.

I found one solution, but I'm looking for a more generic one:

FillDataHolesWithZeros = function(input){
  countZero = input %>% 
    group_by(Ticket.Created.Date...Time) %>% 
    summarise(count = n()) %>%
    filter(count < length(levels(input$Case.Owner)))
  for(i in 1:nrow(countZero))
  {
    date = countZero[i,]$Ticket.Created.Date...Time
    departments = input %>% filter(Ticket.Created.Date...Time == date)
    myLevels = levels(input$Case.Owner)
    for(j in 1:nrow(departments))
    {
      owner = departments[j,]$Case.Owner
      myLevels = myLevels[myLevels != owner]
    }
    print(paste(i,":",myLevels))
    for(k in 1:length(myLevels)){
      input = input %>% rbind(data.frame(
        Ticket.Created.Date...Time = date,
        Case.Owner = myLevels[k],
        count = 0
      ))
    }
  }
  return(input)
}

Shiny tag, because it's powering a Shiny visualization that is throwing the error. The smooths plot is dying. — David Crook, Mar 31 '16 at 14:53
For the NA question: my data frame doesn't have NAs until I try to plot. The data frame is complete; however, if you group by date, there simply are no rows for some of the case owners, because no calls were made for that department that day. I need to figure out how to add rows for the days on which a case owner doesn't appear. — David Crook, Mar 31 '16 at 14:56
So, to show only the problem: you have a data.frame with a date column and a list of all possible dates (or an interval), and you want to add rows to the data.frame? — Batanichek, Mar 31 '16 at 15:02
I have the following features/columns: Date, CaseOwner, CallCount. I have grouped by Date, CaseOwner (in that order). When I execute a loess on the dataset it fails, because for some dates there are holes where no row exists. Example: (1/2/2013, AnimalService, 3), (1/3/2013, AnimalService, 4), (1/3/2013, WasteManagement, 4), (1/3/2013, OutReach, 5). I need to add 2 rows for 1/2/2013 with zero for WasteManagement and OutReach, but my factors can change dynamically. — David Crook, Mar 31 '16 at 15:11
One last item: it would be ideal for the row to be added within the grouping. — David Crook, Mar 31 '16 at 15:13

score 1 · Accepted Answer · answered Mar 31 '16 at 16:41

Try complete() from tidyr. For example:

DATA

(In future, please try to show reproducible data and a concrete problem.)

Date=c(rep("2016-01-01",2),rep("2016-01-02",3),rep("2016-01-03",4))
CaseOwner=c(letters[1:2],letters[1:3],letters[1:4])
CallCount=1:9
dat1=data.frame(Date, CaseOwner, CallCount)

group + add row

library(dplyr)
library(tidyr)
dat1 %>%
  group_by(Date, CaseOwner) %>%
  summarize(cnt = max(CallCount)) %>%
  complete(CaseOwner, fill = list(cnt = 0))

result

Source: local data frame [12 x 3]

         Date CaseOwner   cnt
       (fctr)    (fctr) (dbl)
1  2016-01-01         a     1
2  2016-01-01         b     2
3  2016-01-01         c     0
4  2016-01-01         d     0
5  2016-01-02         a     3
6  2016-01-02         b     4
7  2016-01-02         c     5
8  2016-01-02         d     0
9  2016-01-03         a     6
10 2016-01-03         b     7
11 2016-01-03         c     8
12 2016-01-03         d     9
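
If the grouped call above adds no rows on your setup, one possible cause is that CaseOwner is a character column rather than a factor (newer R versions no longer convert strings to factors in data.frame() by default), so complete() may only see the owners present within each Date group. A minimal sketch of a variant that ungroups first and completes over both columns, assuming a reasonably recent dplyr and tidyr; `res` is a name introduced here for illustration only:

```r
library(dplyr)
library(tidyr)

# Same toy data as in the DATA section above
dat1 <- data.frame(
  Date = c(rep("2016-01-01", 2), rep("2016-01-02", 3), rep("2016-01-03", 4)),
  CaseOwner = c(letters[1:2], letters[1:3], letters[1:4]),
  CallCount = 1:9
)

res <- dat1 %>%
  group_by(Date, CaseOwner) %>%
  summarise(cnt = max(CallCount)) %>%
  ungroup() %>%                      # drop the remaining Date grouping
  # every Date x CaseOwner combination seen anywhere in the data;
  # 0L keeps cnt an integer column
  complete(Date, CaseOwner, fill = list(cnt = 0L))
```

Because the combinations are taken from the data itself, this keeps working when the set of case owners changes dynamically.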

additional

1) %in% looks cleaner than chaining several |

rDSamp = rData %>%
    subset(
      Case.Owner == "Animal_Services" |
        Case.Owner == "Waste_Management" |
        Case.Owner == "Community_Information_and_Outreach" |
        Case.Owner == "Waste_Management")

can be changed to

    rDSamp = rData[rData$Case.Owner %in% 
      c("Animal_Services","Waste_Management","Community_Information_and_Outreach"),]
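
A small sketch of the same filter with the owner list held in a variable, which is the form that lets the selection change at run time (for example from a multi-select input). The toy `rData` and the name `owners` are assumptions made for illustration, not the real 311 data:

```r
# Toy stand-in for the real data set (hypothetical rows)
rData <- data.frame(
  Case.Owner = c("Animal_Services", "Waste_Management",
                 "Public_Works", "Community_Information_and_Outreach"),
  stringsAsFactors = FALSE
)

# The set of owners to keep; could be built dynamically
owners <- c("Animal_Services", "Waste_Management",
            "Community_Information_and_Outreach")

# Keep only rows whose Case.Owner is in the set;
# drop = FALSE preserves the data.frame even with a single column
rDSamp <- rData[rData$Case.Owner %in% owners, , drop = FALSE]
```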

2) If you want to compare dates, you do not need to convert them to character:

maxDate = maxDate %>% 
          as.POSIXct(format="%m/%d/%Y") %>% 
          as.character()

and

data[feature] <= maxDate

will be compared as strings, not as dates.
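
A minimal sketch of a filter that compares Date values directly instead of strings. It assumes the column still holds the raw "%m/%d/%Y" text; `FilterDateRangeAsDate`, `tickets`, and `kept` are hypothetical names, not part of the original code:

```r
# Hypothetical variant of FilterDateRange that compares as Date, not as string
FilterDateRangeAsDate <- function(data, feature, minDate, maxDate,
                                  fmt = "%m/%d/%Y") {
  dates <- as.Date(data[[feature]], format = fmt)
  lo <- as.Date(minDate, format = fmt)
  hi <- as.Date(maxDate, format = fmt)
  # Rows whose date cannot be parsed (NA) are dropped
  data[!is.na(dates) & dates >= lo & dates <= hi, , drop = FALSE]
}

# Toy data for illustration only
tickets <- data.frame(
  Ticket.Created.Date...Time = c("1/2/2013", "12/31/2013", "1/1/2014", "10/5/2012"),
  stringsAsFactors = FALSE
)

kept <- FilterDateRangeAsDate(tickets, "Ticket.Created.Date...Time",
                              "1/1/2013", "12/31/2013")
```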

Awesome note on the subsets. On the dates, I was getting issues where converting directly to a date left some weird artifacts. I'll clear out my R session and try it again. — David Crook, Mar 31 '16 at 19:32
Not everyone wants to download your data, and not everyone can (my proxy at work won't let me). Better to use something that anyone can copy and paste. — Batanichek, Mar 31 '16 at 19:51
And "[" is better (safer) than "subset" in R: http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset — Batanichek, Mar 31 '16 at 19:57
One more plus for %in%: you can use a vector as the second argument, as in a %in% b, where b can be dynamic (a multiple select input in Shiny, for example). — Batanichek, Mar 31 '16 at 20:01
