I have a dataset that contains names. It looks like this:

name1,name2
name4
name55,name25,name88

I have another dataset with a column of names, one name per row. I want to find the indices of the rows whose name appears anywhere in the first dataset.

So, given:

nameColumn
name4
name25

Indices 1 and 2 should be found. I am trying this:

which(mainDataset$nameColumn == namesDataset, arr.ind=TRUE)

But this is not right. Is there some kind of "in" operator I can use here?

Help is very welcome!

Matt Bannert
user3046636
  • please read this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. You did not even tell us what kind of data structures your datasets are stored in. – Karl Forner Mar 19 '14 at 09:22
  • welcome to SO. Use `dput(your_data_set)` to give us an example of your dataset here. That being said @KarlForner is right. It would really help to provide a useful solution here. – Matt Bannert Mar 19 '14 at 09:50
  • why do you want the indices at all? Something like `mainDataset[mainDataset$nameColumn %in% namesDataset,]` should work, given that mainDataset is a data.frame and namesDataset is some vector of the same type (i.e. character). – Matt Bannert Mar 19 '14 at 09:54

1 Answer

If your two data sets look like this:

namesDataset <- read.csv(text = "name1,name2
name4
name55,name25,name88", header = FALSE)

mainDataset <- read.csv(text = "nameColumn
name4
name25")

...then you can find the indices of the names in the vector 'nameColumn' of 'mainDataset' that also occur in 'namesDataset' like this:

which(mainDataset$nameColumn %in% unlist(namesDataset))
# [1] 1 2  
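
For what it's worth, here is a slightly longer sketch of the same idea. `%in%` tests set membership, whereas `==` compares element by element, which is why the attempt in the question does not do what was intended. The extra row `name99` is made up purely to show a non-match, `stringsAsFactors = FALSE` is my assumption that you want plain character columns, and the variable names `lookup`, `hits` and `matched` are just illustrative.

```r
# Names to search in: rows have a varying number of fields, so read.csv
# pads the short rows with empty strings.
namesDataset <- read.csv(text = "name1,name2
name4
name55,name25,name88", header = FALSE, stringsAsFactors = FALSE)

# Main data; "name99" is a hypothetical row added here to show a non-match.
mainDataset <- read.csv(text = "nameColumn
name4
name99
name25", stringsAsFactors = FALSE)

# Flatten the whole data frame into one character vector and drop the
# padding (empty strings, or NA if the padding was read that way).
lookup <- unlist(namesDataset, use.names = FALSE)
lookup <- lookup[!is.na(lookup) & nzchar(lookup)]

# Logical vector: TRUE where the name occurs anywhere in namesDataset.
hits <- mainDataset$nameColumn %in% lookup

# Row indices of the matches.
which(hits)
# [1] 1 3

# Or subset the rows directly, as suggested in the comments.
matched <- mainDataset[hits, , drop = FALSE]
```

If you only need the matching rows and not their positions, the logical vector `hits` is enough and `which()` can be skipped.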
Henrik