I have a dataset with 4 columns and hundreds of rows. Here's a small sample:

A      B     C      D
V1     V2    V100   V4
V15    V5    V6     V100
V8     V3    V9     V10
V3     V11   V12    V13

I would like to get a list of the values that are in column A but not in the others (like V1 in the example above), then another list of the values that are in both columns C and D but not in the others (like V100 in the example), and so on. Is there a simple command that can do this without resorting to complicated for loops?

Important note: the real names are much more complicated (they contain brackets, slashes, backslashes and underscores); this is just a simplified representation of what I have.

Thanks,

Error404
    your example doesn't seem to match with your description - e.g. V3 is in all columns, and do you want V1,V15,V8 from column A? please be a little more clear – eddi Apr 23 '13 at 15:36
    @eddi, sorry I made a mistake with the variables, I edited the question, thanks for your notice. – Error404 Apr 23 '13 at 15:38

1 Answer

Constructing a reproducible example:

set.seed(1)
d <- data.frame(replicate(4,paste0("V",sample(1:10,4,replace=TRUE))))
names(d) <- LETTERS[1:4]
#    A   B  C  D
#1  V3  V3 V7 V7
#2  V4  V9 V1 V4
#3  V6 V10 V3 V8
#4 V10  V7 V2 V5

I believe you're looking for setdiff.

with(d,setdiff(A,D))
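
As a minimal sketch of how setdiff behaves: it compares strings exactly (no pattern matching), so brackets, slashes, backslashes and underscores need no escaping. It is also order-sensitive and drops duplicates. The names below are hypothetical and only stand in for the "complicated" names mentioned in the question.

```r
# Hypothetical names with brackets, slashes, backslashes and underscores
a <- c("x[1]/y", "p\\q_r", "plain", "x[1]/y")
b <- c("plain", "other(2)")

# Values of a that do not occur in b (duplicates are dropped)
in_a_not_b <- setdiff(a, b)
in_a_not_b   # "x[1]/y" "p\\q_r"

# The order of the arguments matters: values of b that do not occur in a
in_b_not_a <- setdiff(b, a)
in_b_not_a   # "other(2)"
```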

If you want to do multiple comparisons, Reduce may help:

with(d,Reduce(setdiff,list(A,B,C,D)))
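
To see what this returns on something predictable, here is the same call applied to the sample from the question, typed in by hand. Reduce applies setdiff from left to right, so the result is A minus B, then minus C, then minus D. The data frame name q and the result name only_a are just labels chosen for this sketch.

```r
# The sample from the question, entered manually
q <- data.frame(
  A = c("V1", "V15", "V8", "V3"),
  B = c("V2", "V5", "V3", "V11"),
  C = c("V100", "V6", "V9", "V12"),
  D = c("V4", "V100", "V10", "V13"),
  stringsAsFactors = FALSE
)

# ((A minus B) minus C) minus D: values found only in column A
only_a <- with(q, Reduce(setdiff, list(A, B, C, D)))
only_a   # "V1" "V15" "V8"  (V3 is dropped because it also appears in B)
```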

If you want to compare one column to many (or all) others:

Reduce(setdiff,c(d[,"A",drop=FALSE],d[,setdiff(names(d),"A"),drop=FALSE]))
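
The question also asks for values that are in several columns at once (C and D) but in none of the others. One possible way to cover that case, sketched here under the assumption that "in C and D" means present in both, is to intersect the chosen columns first and then remove anything found in the remaining ones. The helper only_in() is hypothetical, not a base R function, and the data frame q is the question's sample entered by hand.

```r
# The sample from the question, entered manually
q <- data.frame(
  A = c("V1", "V15", "V8", "V3"),
  B = c("V2", "V5", "V3", "V11"),
  C = c("V100", "V6", "V9", "V12"),
  D = c("V4", "V100", "V10", "V13"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: values present in every column listed in `cols`
# and absent from all other columns of `d`
only_in <- function(d, cols) {
  others <- setdiff(names(d), cols)
  # as.character() guards against factor columns
  shared <- Reduce(intersect, lapply(d[cols], as.character))
  rest <- unlist(lapply(d[others], as.character), use.names = FALSE)
  setdiff(shared, rest)
}

in_c_and_d <- only_in(q, c("C", "D"))
in_c_and_d   # "V100"

in_a_only <- only_in(q, "A")
in_a_only    # "V1" "V15" "V8"
```

Calling only_in() with a single column gives the same result as the Reduce(setdiff, ...) form above, so one helper covers both cases.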
Blue Magister