I have a simple but large data frame (lateness_tbl) consisting of three columns (Days, Due_Date, End_Date). I need to see how many times each Due_Date value is matched in End_Date. I'm currently doing something like this:

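x <- integer(nrow(lateness_tbl))  # preallocate rather than growing x inside the loop
for (i in seq_along(lateness_tbl$Due_Date)) {
    # number of End_Date entries equal to the i-th Due_Date
    x[i] <- sum(lateness_tbl$Due_Date[i] == lateness_tbl$End_Date)
}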

The only problem is I have more than 2 million records to compare and am looking for help from the community to speed this up. Any tips, tricks, or corrections would be awesome. Thanks

C Sheff
  • R is a free, open-source programming language and software environment for statistical computing, bioinformatics, visualization and general computing. **Provide minimal, reproducible, representative example(s) with your questions. Use dput() for data and specify all non-base packages with library calls.** Do not embed pictures for data or code, use indented code blocks. For statistics questions, use stackexchange.com – Andre Elrico Oct 15 '18 at 10:59
  • Try `apply(as.matrix(lateness_tbl$Due_Date,ncol=1),1,function(x){sum(x==lateness_tbl$End_Date)})`. – user2974951 Oct 15 '18 at 11:06

1 Answer

There is a simple solution. You can define a new vector that stores the differences between Due_Date and End_Date, and then count the entries of this vector that are equal to zero.

differences <- lateness_tbl$Due_Date - lateness_tbl$End_Date
length(which(differences == 0))

If Due_Date and End_Date are dates (and not integers), you can use the difftime function and apply the same strategy described above.
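For illustration, here is a minimal sketch on made-up toy data (the values below are assumptions, not the asker's data). It first applies the difference strategy with difftime, which counts rows where the two dates coincide in the same row. It then shows one possible vectorized way to get, for each Due_Date, the number of End_Date entries with the same value, which is what the loop in the question computes. The helper names end_key, due_key, and end_counts are hypothetical.

```r
# Toy data standing in for lateness_tbl (values invented for illustration)
lateness_tbl <- data.frame(
  Due_Date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-01-05")),
  End_Date = as.Date(c("2018-01-02", "2018-01-02", "2018-01-03", "2018-01-01"))
)

# Row-wise strategy: count rows whose Due_Date and End_Date are the same day
differences <- difftime(lateness_tbl$End_Date, lateness_tbl$Due_Date, units = "days")
same_row <- sum(differences == 0)

# Per-value strategy: tabulate End_Date once, then look up each Due_Date
end_key <- as.character(lateness_tbl$End_Date)
due_key <- as.character(lateness_tbl$Due_Date)
end_counts <- table(end_key)
x <- as.integer(end_counts[match(due_key, names(end_counts))])
x[is.na(x)] <- 0L  # Due_Date values that never occur in End_Date
```

Because End_Date is tabulated only once, the lookup avoids comparing every Due_Date against every End_Date, which should matter with millions of rows. Converting the dates to character keys is a design choice made here so that the lookup by name is unambiguous.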

Iago Carvalho