Is it possible to customize setdiff using regular expressions to see what is in one vector and not another? For example:
x <- c("1\t119\t120\t1\t119\t120\tABC\tDEF\t0", "2\t558\t559\t2\t558\t559\tGHI\tJKL\t0", "3\t139\t141\t3\t139\t141\tMNO\tPQR\t0", "3\t139\t143\t3\t139\t143\tSTU\tVWX\t0")
[1] "1\t119\t120\t1\t119\t120\tABC\tDEF\t0"
[2] "2\t558\t559\t2\t558\t559\tGHI\tJKL\t0"
[3] "3\t139\t141\t3\t139\t141\tMNO\tPQR\t0"
[4] "3\t139\t143\t3\t139\t143\tSTU\tVWX\t0"
y <- c("1\t119\t120\t1\t109\t120\tABC\tDEF\t0", "2\t558\t559\t2\t548\t559\tGHI\tJKL\t0", "3\t139\t141\t3\t129\t141\tMNO\tPQR\t0", "3\t139\t143\t3\t129\t143\tSTU\tVWX\t0", "4\t157\t158\t4\t147\t158\tXWX\tYTY\t0", "5\t158\t159\t5\t148\t159\tPHP\tWZW\t0")
[1] "1\t119\t120\t1\t109\t120\tABC\tDEF\t0"
[2] "2\t558\t559\t2\t548\t559\tGHI\tJKL\t0"
[3] "3\t139\t141\t3\t129\t141\tMNO\tPQR\t0"
[4] "3\t139\t143\t3\t129\t143\tSTU\tVWX\t0"
[5] "4\t157\t158\t4\t147\t158\tXWX\tYTY\t0"
[6] "5\t158\t159\t5\t148\t159\tPHP\tWZW\t0"
I want to be able to show that:
[5] "4\t157\t158\t4\t147\t158\tXWX\tYTY\t0"
[6] "5\t158\t159\t5\t148\t159\tPHP\tWZW\t0"
are new because 4\t157\t158 and 5\t158\t159 are unique to y. This doesn't work:
> setdiff(y,x)
[1] "1\t119\t120\t1\t109\t120\tABC\tDEF\t0" "2\t558\t559\t2\t548\t559\tGHI\tJKL\t0"
[3] "3\t139\t141\t3\t129\t141\tMNO\tPQR\t0" "3\t139\t143\t3\t129\t143\tSTU\tVWX\t0"
[5] "4\t157\t158\t4\t147\t158\tXWX\tYTY\t0" "5\t158\t159\t5\t148\t159\tPHP\tWZW\t0"
That is because column 5 differs between x and y, so none of the strings match exactly. I want to setdiff based only on the first three columns.
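To make the goal concrete, here is a minimal sketch of one way this might be done, assuming the fields are always tab-separated and the first three fields identify a row. The helper name key3 is hypothetical, and the approach swaps setdiff for %in% on the extracted keys so that the full strings from y are returned rather than just the keys:

```r
x <- c("1\t119\t120\t1\t119\t120\tABC\tDEF\t0",
       "2\t558\t559\t2\t558\t559\tGHI\tJKL\t0",
       "3\t139\t141\t3\t139\t141\tMNO\tPQR\t0",
       "3\t139\t143\t3\t139\t143\tSTU\tVWX\t0")
y <- c("1\t119\t120\t1\t109\t120\tABC\tDEF\t0",
       "2\t558\t559\t2\t548\t559\tGHI\tJKL\t0",
       "3\t139\t141\t3\t129\t141\tMNO\tPQR\t0",
       "3\t139\t143\t3\t129\t143\tSTU\tVWX\t0",
       "4\t157\t158\t4\t147\t158\tXWX\tYTY\t0",
       "5\t158\t159\t5\t148\t159\tPHP\tWZW\t0")

# Hypothetical helper: keep only the first three tab-separated fields.
# The pattern captures two "field + tab" groups plus a third field,
# then discards the rest of the string.
key3 <- function(v) sub("^(([^\t]*\t){2}[^\t]*).*$", "\\1", v)

# Keep the elements of y whose three-field key does not occur in x.
new_in_y <- y[!(key3(y) %in% key3(x))]
new_in_y  # elements 5 and 6 of y
```

If only the keys themselves are needed, setdiff(key3(y), key3(x)) should give 4\t157\t158 and 5\t158\t159 under the same assumptions.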
A simple example of setdiff can be found here: How to tell what is in one vector and not another?