For each value in a column count occurrences of that value in another column

Question

I have a simple but large data frame (lateness_tbl) with three columns (Days, Due_Date, End_Date). For each Due_Date, I need to count how many times that value appears in End_Date. I'm currently doing something like this:

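# Pre-allocate the result instead of growing it inside the loop
x <- integer(nrow(lateness_tbl))
for (i in seq_along(lateness_tbl$Due_Date)) {
    # Count how many End_Date values equal the i-th Due_Date
    x[i] <- sum(lateness_tbl$Due_Date[i] == lateness_tbl$End_Date)
}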

The only problem is I have more than 2 million records to compare and am looking for help from the community to speed this up. Any tips, tricks, or corrections would be awesome. Thanks
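A vectorized sketch in base R, assuming Due_Date and End_Date have the same type (for example, both Date). The small lateness_tbl below is made-up illustration data, and the Days column is omitted because only the two date columns matter here. Rather than comparing every Due_Date against every End_Date, it counts each distinct End_Date once with tabulate() and then looks those counts up with match().

```r
# Hypothetical toy data standing in for lateness_tbl (illustration only)
lateness_tbl <- data.frame(
  Due_Date = as.Date(c("2018-10-01", "2018-10-02", "2018-10-03",
                       "2018-10-01", "2018-10-09")),
  End_Date = as.Date(c("2018-10-01", "2018-10-01", "2018-10-02",
                       "2018-10-05", "2018-10-03"))
)

# Count how often each distinct End_Date occurs
end_vals   <- unique(lateness_tbl$End_Date)
end_counts <- tabulate(match(lateness_tbl$End_Date, end_vals),
                       nbins = length(end_vals))

# Look up each Due_Date among the distinct End_Date values;
# match() returns NA for dates that never occur in End_Date, so use 0 there
idx <- match(lateness_tbl$Due_Date, end_vals)
x   <- end_counts[idx]
x[is.na(x)] <- 0L

print(x)
```

On this toy data x should be 2 1 1 2 0, the same as the loop's result. Because match() and tabulate() each make a single pass over a column, the work should grow roughly linearly with the number of rows instead of quadratically.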

R is a free, open-source programming language and software environment for statistical computing, bioinformatics, visualization and general computing. **Provide minimal, reproducible, representative example(s) with your questions. Use dput() for data and specify all non-base packages with library calls.** Do not embed pictures for data or code, use indented code blocks. For statistics questions, use stackexchange.com — Andre Elrico, Oct 15 '18 at 10:59
Try `apply(as.matrix(lateness_tbl$Due_Date,ncol=1),1,function(x){sum(x==lateness_tbl$End_Date)})`. — user2974951, Oct 15 '18 at 11:06

score 0 · Answer 1 · answered Oct 15 '18 at 14:52

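There is a simple solution. Define a new vector holding the row-wise differences between Due_Date and End_Date, then count the entries of that vector that equal zero. Note that this compares the two columns row by row, so it returns a single total of rows where the two dates coincide.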

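# Row-wise difference between the two date columns
differences <- lateness_tbl$Due_Date - lateness_tbl$End_Date
# Count the rows where the difference is zero (NA differences are ignored)
sum(differences == 0, na.rm = TRUE)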

If Due_Date and End_Date are dates (rather than integers), you can use the difftime() function and apply the same strategy described above.
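A minimal sketch of that strategy with Date columns, using made-up toy data. difftime() with units = "days" gives the per-row gap, and counting the zeros gives the number of rows whose Due_Date and End_Date fall on the same day.

```r
# Hypothetical toy data with Date columns (illustration only)
lateness_tbl <- data.frame(
  Due_Date = as.Date(c("2018-10-01", "2018-10-02", "2018-10-03", "2018-10-04")),
  End_Date = as.Date(c("2018-10-01", "2018-10-05", "2018-10-03", "2018-10-02"))
)

# Row-wise gap between the two dates, in days
differences <- difftime(lateness_tbl$Due_Date, lateness_tbl$End_Date,
                        units = "days")

# Number of rows where the two dates coincide (NA gaps are ignored)
same_day <- sum(differences == 0, na.rm = TRUE)
print(same_day)
```

Here rows 1 and 3 have a zero gap, so same_day should be 2.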
