Is there a function in R similar to the COUNTIF in Excel?

Question

I have dataset more than 1000000 cells and I only need the data with more than 20 days in 1 year using R based on the site name.

This is the original dataset:

I can use 'count if' function in excel to calculate how many days are in a month, but I found it extremely slow and causing problems. Here is the formula that I used on Excel.

My code on Excel:

I'm sorry for my basic question, but I hope there is someone who can help me with this matters.

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. We don't need all your real data, just something to test with. Please don't post images of data or code. We can't copy/paste those to edit them or import them. — MrFlick, Oct 20 '20 at 03:55
Which version of excel do you use? If you have `Office365` then it can be speedup significantly. — Harun24hr, Oct 20 '20 at 04:27
thank you! its my first time post something here, will definitely include sample next time! — Gerry Andhika, Oct 21 '20 at 06:29

score 2 · Answer 1 · edited Oct 20 '20 at 06:51

If I understand correctly you want the number of days per site. If your data looks like this:

site = c("a", "b", "c", "a", "b", "c", "a", "b", "c")
year = c(1991, 1992, 1993, 1991, 1992, 1993, 1991, 1992, 1993)
month = c(1, 1, 4, 4, 1, 1, 4, 4, 1)
my_data = data.frame(site, year, month)

You can use the package dplyr (install via install.packages(dplyr)):

library(dplyr)
my_data %>% group_by(site) %>% count(year, month)

Output:

# A tibble: 6 x 4
# Groups:   site [3]
  site   year month     n
  <chr> <dbl> <dbl> <int>
1 a      1991     1     1
2 a      1991     4     2
3 b      1992     1     2
4 b      1992     4     1
5 c      1993     1     2
6 c      1993     4     1

You can post a snippet of your code using dput:

dput(my_data)
structure(list(site = c("a", "b", "c", "a", "b", "c", "a", "b", 
"c"), year = c(1991, 1992, 1993, 1991, 1992, 1993, 1991, 1992, 
1993), month = c(1, 1, 4, 4, 1, 1, 4, 4, 1)), class = "data.frame", row.names = c(NA, 
-9L))

Then other people can take that the above code and reproduce your data. It doesn't have to be ALL your data, just the first ~10 rows.

aiatay7n · Accepted Answer · 2020-10-20T08:05:03.770

Agree with comment - you need to create a reproducible example. Anything in Excel can be done better in R (well, almost anything) Many methods possible - try grouping as below

df = data.frame(matrix(rnorm(40), nrow=10))
X0 <-sample(c(rep("A",3),rep("B",3),rep("C",4)), 10)
df$X0 <- x0
df        # this is a simulated dataset
           X1          X2         X3          X4 X0
1  -0.1203975  0.40176671 -0.1441914 -1.27356811  B
2  -1.2945289  0.34813784  0.3233932  0.80881055  B
3   1.0095911  0.69446583 -0.4377096 -1.41026705  A
4  -0.2547187  1.04914805 -0.8120526 -0.58734365  C
5  -0.1732585 -0.16422459 -0.8195833  0.44932410  A
6   0.5599036 -0.79013084 -0.9540998 -0.03861297  C
7  -0.5338999  0.03481498  0.3663478  0.44230402  B
8   0.9837636  0.98619889 -0.4901891  0.45207061  C
9   0.7892718 -0.23403087 -0.1102857  0.79010878  A
10  0.9392468  0.54867442 -0.4180297 -0.40482425  C


# Next, we count/index by X0 ~ you will need to do this by "site" /your choice
library(dplyr)
df1 <- df %>%
  group_by(X0) %>%
  add_count
df1

# A tibble: 10 x 6
# Groups:   X0 [3]
       X1      X2     X3      X4 X0        n
    <dbl>   <dbl>  <dbl>   <dbl> <chr> <int>
 1 -0.120  0.402  -0.144 -1.27   B         3
 2 -1.29   0.348   0.323  0.809  B         3
 3  1.01   0.694  -0.438 -1.41   A         3
 4 -0.255  1.05   -0.812 -0.587  C         4
 5 -0.173 -0.164  -0.820  0.449  A         3
 6  0.560 -0.790  -0.954 -0.0386 C         4
 7 -0.534  0.0348  0.366  0.442  B         3
 8  0.984  0.986  -0.490  0.452  C         4
 9  0.789 -0.234  -0.110  0.790  A         3
10  0.939  0.549  -0.418 -0.405  C         4
# After the above, you can sort and cut OR filter your count variable 
by 20 or any such value! Here, rows with n >3 are subset. Done.     

df2 <- df1[df1$n>3,]
df2
# A tibble: 4 x 6
# Groups:   X0 [1]
      X1     X2     X3      X4 X0        n
   <dbl>  <dbl>  <dbl>   <dbl> <chr> <int>
1 -0.255  1.05  -0.812 -0.587  C         4
2  0.560 -0.790 -0.954 -0.0386 C         4
3  0.984  0.986 -0.490  0.452  C         4
4  0.939  0.549 -0.418 -0.405  C         4

Thank you for your help! Sure, I will include sample next time. Your answer really helped me! — Gerry Andhika, Oct 21 '20 at 06:29
@Gerry Andhika: Glad to hear that! ....@Amar - useful insights in answer, just upvoted! — aiatay7n, Oct 21 '20 at 08:54

score -2 · Answer 3 · answered Oct 20 '20 at 03:53

-2

I will suggest you create an new column to CONCATENATE B, J and K columns, and then paste as value the column. After that, you can simply use countif instead of countifs.

answered Oct 20 '20 at 03:53

Victor19

97
8

1

This is not a answer. This could be a comment. – Harun24hr Oct 20 '20 at 04:27
I think it's a fine answer in excel but OP wants an `r` code – Amar Oct 20 '20 at 06:46

Is there a function in R similar to the COUNTIF in Excel?

3 Answers3