I want to know if any elements in character string occur in other character string.
My data contains of millions of rows and is structures as the the following mock data:
dt <- data.table(product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"), stringsAsFactors = F)
dt[, product.2 := shift(product, type = "lead")]
dt[, product.3 := shift(product, n = 2, type = "lead")]
> dt
product stock product.2 product.3
1: A A B C
2: B A,B C A,C,E
3: C A,B,C A,C,E A,B
4: A,C,E A,B,C,E A,B A,B,C
5: A,B A,B,C,E A,B,C D
6: A,B,C A,B,C,E D A
7: D A,B,C,D,E A B
8: A A B A
9: B A,B A A
10: A A,B A A,B,C
11: A A A,B,C D
12: A,B,C A,B,C D D
13: D A,B,C,D D <NA>
14: D A,B,C,D <NA> <NA>
From this data I would like to know the following:
- If any element (separated by
comma
) inproduct.3
occurs inproduct.2
- If any element (separated by
comma
) inproduct.3
does not occur instock
. - Do all elements (separated by
comma
) inproduct.3
occur instock
?
The expected outcome is something like this:
> dt
product stock product.2 product.3 outcome1 outcome2 outcome3
1: A A B C FALSE FALSE FALSE
2: B A,B C A,C,E TRUE TRUE FALSE
3: C A,B,C A,C,E A,B TRUE TRUE TRUE
4: A,C,E A,B,C,E A,B A,B,C TRUE TRUE TRUE
5: A,B A,B,C,E A,B,C D FALSE FALSE FALSE
6: A,B,C A,B,C,E D A FALSE TRUE TRUE
7: D A,B,C,D,E A B FALSE TRUE TRUE
8: A A B A FALSE TRUE TRUE
9: B A,B A A TRUE TRUE TRUE
10: A A,B A A,B,C TRUE TRUE FALSE
11: A A A,B,C D FALSE FALSE FALSE
12: A,B,C A,B,C D D TRUE FALSE FALSE
13: D A,B,C,D D <NA> FALSE FALSE FALSE
14: D A,B,C,D <NA> <NA> FALSE FALSE FALSE
This question is part of this question on Stackoverflow.
EDIT 08/20/2019: Included a third expected outcome.