Count duplicates based on condition in R

Question

I want to count column d where duplicate values are available with conditions like 1) a = 3w1 and a = 7w1 2) a = 3w1 and a = sp2 3) a = 3W1 and a = 3W1 and so forth. So, that each id in column d can be calculated in each interaction of column a.

I can extract or count the duplicates based on the overlapping dates, which might be more meaningful, but I am getting errors with dates. d is overlapped in 3W1, 7W1, 5W1, 14W. I sed library(dplyr).

a     b         c         d
3W1  5/11/2020 5/31/2020  1
3W1  5/11/2020 5/31/2020  1
7W1  5/11/2020 6/28/2020  1
5W1  6/1/2020  7/5/2020   1
14W  5/11/2020 8/16/2020  1
3W1  5/11/2020 5/31/2020  2
SP2  6/15/2020 8/16/2020  3
3W1  5/11/2020 5/31/2020  4
3W1  5/11/2020 5/31/2020  4

While you [tried](https://i.stack.imgur.com/fYbzb.png), please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557 (and https://xkcd.com/2116/). Please just include the code, console output, or data (e.g., `dput(head(x))` or `data.frame(...)`) directly. — r2evans, Oct 15 '20 at 00:10
And I do not see `3w1` in that image. It would help to have not just the data in a format we can copy from (see above), but also your expected results given that sample data. If you've tried code that didn't quote work, please include it, as well as packages you're intending to use to accomplish this. Some good references for how to ask complete, self-contained, reproducible questions: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, Oct 15 '20 at 00:12

Paul · Accepted Answer · 2020-10-15T07:12:17.027

Please find below an attempt that counts the number of unique d values for each combination of a values. It is not elegant but feel free to improve it.

library(dplyr)
# Create a table with all possible combo of df$a values
conds <- expand.grid(cond1 = unique(df$a), cond2 = unique(df$a), stringsAsFactors = FALSE)
conds

# Use this to make multiple subsets of df and each time count the number of unique d values
test <- setNames( object = as.data.frame(apply(conds, 1, function(x) df %>% filter(a %in% c(x[1], x[2])) %>% summarise(length(unique(d))))),
                  apply(conds, 1, function(x) paste(x[1], x[2], sep = " & ")) )

# Reshape this to get a pretty printed result
res <- reshape(test,
        varying = colnames(test),
        times = colnames(test),
        timevar = "conditions",
        v.names = "count_of_unique_d",
        direction = "long",
        new.row.names = seq_along(colnames(test)))
res <- res[, c("conditions", "count_of_unique_d")]
res

What is happening?

apply(conds, 1, function(x) df %>% filter(a %in% c(x[1], x[2])) %>% summarise(length(unique(d)))) subset df according to each row of conds which are you conditions on a. Results is stored in a list turned into a dataframe using as.data.frame(). setNames() give a name to each column so you know which conditions are applied.

Output:

> head(res, 5)
  conditions count_of_unique_d
1  3W1 & 3W1                 3
2  7W1 & 3W1                 3
3  5W1 & 3W1                 3
4  14W & 3W1                 3
5  SP2 & 3W1                 4

Data:

df <- structure(list(a = c("3W1", "3W1", "7W1", "5W1", "14W", "3W1", 
                           "SP2", "3W1", "3W1"), b = c("5/11/2020", "5/11/2020", "5/11/2020", 
                                                       "6/1/2020", "5/11/2020", "5/11/2020", "6/15/2020", "5/11/2020", 
                                                       "5/11/2020"), c = c("5/31/2020", "5/31/2020", "6/28/2020", "7/5/2020", 
                                                                           "8/16/2020", "5/31/2020", "8/16/2020", "5/31/2020", "5/31/2020"
                                                       ), d = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                          -9L))

Count duplicates based on condition in R

1 Answers1