
Consider the following replicable data frame:

col1 <- c(rep("a", times = 5), rep("b", times = 5), rep("c", times = 5))
col2 <- c(0,0,1,1,0,0,1,1,1,0,0,0,0,0,1)
data <- as.data.frame(cbind(col1, col2))

The data is now a 15x2 data frame. I want to count how many zeros there are in col2, but only for the rows where col1 is "a". I use table():

table <- table(data$col2[data$col1=="a"])
table[names(table)==0]

This works fine and the result is 3.

But my real data has 100,000 observations with 12 different values of col1, so I want to write a function so that I don't have to type the above lines of code 12 times.

countzero <- function(row){
  table <- table(data$col2[data$col1=="row"])
  result <- table[names(table)==0]
  return(result)
}

I expected countzero(row = a) to return 3 as well, but instead it returns 0, and it also returns 0 for b and c.

For my real data, it returns

numeric(0)

and I have no idea why.

Could anyone help me out, please?

EDIT: To everyone showing me how to count the zeros for every value of col1 at once: that all works fine, but my purpose is to build a function that returns the count for one specific col1 value only, e.g. just the a's, because that count will be used later to compute other things (e.g. the percentage of 0's among all a's).

AnodeHorn
  • Hint: Avoid naming objects with names that already exist as functions in R – Sotos Apr 07 '17 at 12:42
  • `tapply(data$col2==0, data$col1, sum)` http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega – jogo Apr 07 '17 at 12:51
  • `countzero <- function(row) sum(data$col2[data$col1==row]==0); countzero(row = "a")` – jogo Apr 07 '17 at 13:13
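
The difference between the quoted "row" in the question's function and the unquoted `row` in the comment above can be sketched as follows. This is only a minimal illustration on the example data; `df` and `count_zero` are hypothetical names, chosen so that the base functions `data` and `table` are not masked, and `data.frame()` is used instead of `cbind()` so that col2 stays numeric.

```r
# Rebuild the example data; data.frame() keeps col2 numeric,
# whereas cbind() on mixed vectors would coerce it to character.
df <- data.frame(
  col1 = rep(c("a", "b", "c"), each = 5),
  col2 = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
)

# "row" in quotes is the literal string "row", which never occurs in col1,
# so no rows are selected whatever argument is passed.
sum(df$col1 == "row")  # 0

# Using the argument itself (no quotes) compares col1 to the value passed in.
count_zero <- function(value, d = df) {
  sum(d$col2[d$col1 == value] == 0)
}

count_zero("a")  # 3
```

The value has to be passed as a string, i.e. `count_zero("a")` rather than `count_zero(a)`, since there is no object called `a` in the workspace.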

2 Answers

2

1) aggregate. Try aggregate:

aggregate(col2 == 0 ~ col1, data, sum)

giving:

  col1 col2 == 0
1    a         3
2    b         2
3    c         4

2) table. Or try table (omit the [, 1] if you want the counts of 1's too):

table(data)[, 1]

giving:

a b c 
3 2 4 
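
If only one group's count is needed, a single cell of the cross-tabulation can be picked out by name. The following is a rough sketch on the example data, assuming a hypothetical data frame `df` built with `data.frame()` so that col2 stays numeric; `counts` is likewise just an illustrative name.

```r
# Example data from the question, with col2 kept numeric.
df <- data.frame(
  col1 = rep(c("a", "b", "c"), each = 5),
  col2 = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
)

# Cross-tabulate col1 (rows) against col2 (columns).
counts <- table(df$col1, df$col2)

# Index by row name and column name; the column names are the
# character labels "0" and "1", not the numbers 0 and 1.
counts["a", "0"]  # 3
```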
G. Grothendieck
  • Hi thanks for answering but this still doesn't solve my problem, the codes work just fine alone but when putting in a function they won't work. Like in my example, I think the problem is because the variable `row` is put in quotation mark inside the function, so something went wrong there. – AnodeHorn Apr 07 '17 at 13:01
  • `f <- function(data) aggregate(col2 == 0 ~ col1, data, sum); f(data)` works for me. – G. Grothendieck Apr 07 '17 at 13:04
  • Oh yes it does but my purpose is to have a function that returns just the count for a's, for example. That's where the problem kicks in cuz I don't know how to extract just the result of a's out of the count table, all inside one `function()`. – AnodeHorn Apr 07 '17 at 13:09
  • The question stated that you wanted to do it for all levels of col1, but to create such a function: `f <- function(data, x) with(data, sum(col1 == x & col2 == 0)); f(data, "a")` – G. Grothendieck Apr 07 '17 at 13:13
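
The single-value function from the last comment could then feed the follow-up calculation mentioned in the question's EDIT (the share of 0's among the a's). This is a minimal sketch under stated assumptions: `df` and `zero_share` are hypothetical names that do not appear in the thread, and the data is rebuilt with `data.frame()` so that col2 is numeric.

```r
# Example data from the question, with col2 kept numeric.
df <- data.frame(
  col1 = rep(c("a", "b", "c"), each = 5),
  col2 = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
)

# Count of zeros for one col1 value, as given in the comment above.
f <- function(data, x) with(data, sum(col1 == x & col2 == 0))

# Hypothetical helper: proportion of zeros among the rows with col1 == x.
zero_share <- function(data, x) f(data, x) / sum(data$col1 == x)

f(df, "a")           # 3
zero_share(df, "a")  # 0.6, i.e. 3 of the 5 "a" rows

# The same function applied to every distinct col1 value,
# instead of typing the call 12 times.
sapply(unique(df$col1), function(x) f(df, x))
```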
0

We can use data.table, which would be efficient:

library(data.table)
setDT(data)[col2==0, .N, col1]
#   col1 N
#1:    a 3
#2:    b 2
#3:    c 4

Or with dplyr:

library(dplyr)
data %>%
    filter(col2==0) %>%
    count(col1)
akrun