Looking up strings in different columns in R

Question

I have a dataset with 4 columns and hundreds of rows. Here's a small sample:

A      B      C      D
V1     V2     V100   V4
V15    V5     V6     V100
V8     V3     V9     V10
V3     V11    V12    V13

I would like to get a list of the variables that are in column A but not in the others (like V1 in the example above), then another list of the variables that are in both columns C and D but not in the others (like V100 in the example), and so on. Is there a simple command that can do this without resorting to complicated for loops?

Important note: the names I actually have are much more complicated (AND THEY CONTAIN BRACKETS, SLASHES, BACKSLASHES AND UNDERSCORES); this is just a simplified representation of what I have.

Thanks,

your example doesn't seem to match with your description - e.g. V3 is in all columns, and do you want V1,V15,V8 from column A? please be a little more clear — eddi, Apr 23 '13 at 15:36
@eddi, sorry I made a mistake with the variables, I edited the question, thanks for your notice. — Error404, Apr 23 '13 at 15:38

score 5 · Accepted Answer · edited May 23 '17 at 10:25

Constructing a reproducible example:

set.seed(1)
d <- data.frame(replicate(4,paste0("V",sample(1:10,4,replace=TRUE))))
names(d) <- LETTERS[1:4]
# Output below is from R < 3.6.0; later versions draw different samples for the same seed
#    A   B  C  D
#1  V3  V3 V7 V7
#2  V4  V9 V1 V4
#3  V6 V10 V3 V8
#4 V10  V7 V2 V5

I believe you're looking for setdiff, which returns the unique elements of its first argument that do not appear in its second.

with(d,setdiff(A,D))
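
Regarding the note about complicated names: setdiff compares whole strings for equality rather than treating them as patterns, so brackets, slashes, backslashes and underscores should not need any escaping. A minimal sketch, where the vectors a and b and their contents are made up for illustration:

```r
# Hypothetical names containing brackets, slashes, backslashes and underscores
a <- c("x_[1]/y", "p\\q_(2)", "plain")
b <- c("p\\q_(2)", "other/[3]")

# setdiff() matches complete strings exactly, so no regex escaping is involved
only_a <- setdiff(a, b)
print(only_a)
```

Here only_a keeps the two strings from a that have no exact match in b.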

If you want to do multiple comparisons, Reduce may help:

with(d,Reduce(setdiff,list(A,B,C,D)))
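
To make the Reduce call less opaque, here is a small sketch that uses the sample values from the question as plain vectors (the names A to D, folded and nested are only for illustration). It shows that Reduce(setdiff, ...) is the same as nesting setdiff from left to right, so the result is whatever is in A but in none of B, C or D:

```r
# Sample values from the question, one vector per column
A <- c("V1", "V15", "V8", "V3")
B <- c("V2", "V5", "V3", "V11")
C <- c("V100", "V6", "V9", "V12")
D <- c("V4", "V100", "V10", "V13")

# Reduce applies setdiff cumulatively: ((A minus B) minus C) minus D
folded <- Reduce(setdiff, list(A, B, C, D))

# The same computation written out by hand
nested <- setdiff(setdiff(setdiff(A, B), C), D)

print(folded)
print(identical(folded, nested))
```

V3 is dropped because it also appears in B; the other three values of A survive.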

If you want to compare one column to many (or all) others:

Reduce(setdiff,c(d[,"A",drop=FALSE],d[,setdiff(names(d),"A"),drop=FALSE]))
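
The question also asks for values that are in both C and D but in no other column. One possible way to cover that case is to combine intersect with setdiff. The helper below is only a sketch: only_in is a made-up name, not a base R function, and q is the sample data from the question.

```r
# Sample data from the question
q <- data.frame(
  A = c("V1", "V15", "V8", "V3"),
  B = c("V2", "V5", "V3", "V11"),
  C = c("V100", "V6", "V9", "V12"),
  D = c("V4", "V100", "V10", "V13"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: values found in every column named in `cols`
# and in none of the remaining columns of `df`
only_in <- function(df, cols) {
  others <- setdiff(names(df), cols)
  # values shared by all of the requested columns
  shared <- Reduce(intersect, lapply(df[cols], as.character))
  # pool the values of every other column and remove them
  rest <- unlist(lapply(df[others], as.character), use.names = FALSE)
  setdiff(shared, rest)
}

in_a_only  <- only_in(q, "A")
in_c_and_d <- only_in(q, c("C", "D"))
print(in_a_only)
print(in_c_and_d)
```

The as.character calls are there so the helper behaves the same whether the columns are stored as strings or as factors.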
