dplyr::tally is faster than dplyr::count. Why doesn't tally see a variable passed in as a function argument?
For this example, take a sample x:
x <- data.frame("PrecinctID" = c(101,102,103,104))
tally(x,PrecinctID == 101)[1,1]
#[1] 919
findy <- function(y) {tally(x,PrecinctID == y)[1,1]}
findy(101)
#Error: object 'y' not found
findy <- function(y) {count(x,PrecinctID == y)[2,2]}
findy(101)
#Source: local data frame [1 x 1]
# n
#1 919
[Self answer:]
I was able to solve my own problem. It appears that tally accepts only tbl data. So whether you use tally or summarise, it works well to send the data through the dplyr pipe (%>%), also called the "then" operator. Once you do that, quite complex fields with embedded conditions can be computed inside a function. Given that x is a large voter database:
tbl_df(x)
Source: local data frame [128,438 x 17] ...
StateVoterID RegistrationNumber LastName FirstName ...
uPID <- sort(unique(x$PrecinctID))
findP <- function(y) {
  x %>%
    summarise(
      Count = sum(PrecinctID == y),
      Good = sum(AVReturnStatus == "Good" & PrecinctID == y),
      Late = sum(AVReturnChallenge == "Too Late" & PrecinctID == y))
}
u1 <- t(sapply(uPID,findP))
u1 <- cbind(uPID,u1)
head(u1)
uPID Count Good Late
[1,] 101 917 476 4
[2,] 102 630 367 8
[3,] 103 687 482 2
[4,] 104 439 312 1
[5,] 105 414 252 0
[6,] 106 778 422 2
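The same per-precinct table can probably also be built without a helper function or sapply, by grouping on PrecinctID and summarising once. What follows is only a minimal sketch, not the method used above: the data frame voters and its values are made up for illustration, and only the column names are taken from the code above.

```r
library(dplyr)

# Hypothetical toy data standing in for the voter database (values are made up).
voters <- data.frame(
  PrecinctID        = c(101, 101, 101, 102, 102, 103),
  AVReturnStatus    = c("Good", "Good", "Challenged", "Good", "Challenged", "Good"),
  AVReturnChallenge = c("", "", "Too Late", "", "Too Late", ""),
  stringsAsFactors  = FALSE
)

# One row per precinct, computed in a single grouped pass
# instead of calling a function once per precinct ID.
by_precinct <- voters %>%
  group_by(PrecinctID) %>%
  summarise(
    Count = n(),                                   # rows in the precinct
    Good  = sum(AVReturnStatus == "Good"),         # ballots marked "Good"
    Late  = sum(AVReturnChallenge == "Too Late")   # ballots challenged as too late
  )

print(by_precinct)
```

Because the grouping supplies the precinct, no function argument has to be looked up inside summarise, so the scoping problem from the question does not come up in this form.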