How to find the index of matches between two sets of strings?

Question

I have two data frames. The first looks like this:

df1<- structure(list(V1 = structure(c(6L, 1L, 8L, 20L, 12L, 5L, 21L, 
28L, 14L, 3L, 15L, 18L, 17L, 25L, 26L, 16L, 10L, 7L, 2L, 13L, 
23L, 27L, 24L, 4L, 19L, 11L, 9L, 22L), .Label = c("O43175", "P02538", 
"P04264", "P04350", "P07437", "P08237", "P08238", "P11142", "P11498", 
"P13645", "P25705-1", "P31327-1", "P31689", "P35527", "P35555", 
"P35908", "P68104", "P68366", "P68371", "P78527", "Q01813", "Q13509", 
"Q13885", "Q15233", "Q15418-2", "Q70CQ2", "Q71U36", "Q9BQE3"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-28L))

I want to find the index of the strings from df2 that exist in df1:

df2<- structure(list(V1 = structure(c(5L, 4L, 3L, 2L, 1L), .Label = c("P02538", 
"P08238", "P13645", "P35908", "Q70CQ2"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-5L))

I do the following:

match(df1$V1,df2$V1)

In this case it works, but when the data is large and I check the results one by one, I see entries matched to something else. Is there another way to check this, or a way to force an exact match?

I don't know what is going wrong.

How can I find the index of the non-NA values in data like this?

df<- structure(c(NA, NA, NA, NA, NA, NA, NA, 12L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 27L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(100L, 1L))

I tried which(is.na(df)), but it does not give me the index I want. I only need to know which rows hold a value, and to save those row numbers in a data.frame.

Are there any leading/trailing spaces in your original data, or is it only a partial match? Otherwise, `match` should work, i.e. `match(trimws(df1$V1), trimws(df2$V1))`, or try `?pmatch` — akrun, Aug 03 '17 at 12:31
Try converting them to `character` rather than `factor` - `match(as.character(df1$V1), as.character(df2$V1))` — Andrew Gustar, Aug 03 '17 at 12:34
@akrun That does not work. It gives me the match, but when I do `data <- res[myindex, ]` it gives me something else. I added an example of the result after `match`. Do you know how to get the index of the rows that have a value? — nik, Aug 03 '17 at 16:23
@Andrew Gustar That does not work. It gives me the match, but when I do `data <- res[myindex, ]` it gives me something else. I added an example of the result after `match`. Do you know how to get the index of the rows that have a value? — nik, Aug 03 '17 at 16:23
@Miha That does not work. It gives me the match, but when I do `data <- res[myindex, ]` it gives me something else. I added an example of the result after `match`. Do you know how to get the index of the rows that have a value? — nik, Aug 03 '17 at 16:24
@nik When I use your data and code above I get `match(df1$V1,df2$V1) [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 2 3 4 5 NA NA NA NA NA NA NA NA NA` which seems right. So your example is not reproducible. And what is `res[ ]`? You can get the index of matching rows in `df1` with `which(!is.na(match(df1$V1,df2$V1)))` which gives `[1] 15 16 17 18 19` — Andrew Gustar, Aug 03 '17 at 16:50
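
The `which(!is.na(match(...)))` idea from the comment above can be sketched as follows, including the step of saving the row numbers in a data frame. This is only an illustration: the vectors `x` and `y` and the column names `row` and `value` are made up here and are not the asker's data.

```r
# Hypothetical small vectors standing in for df1$V1 and df2$V1
x <- c("P08237", "Q70CQ2", "P35908", "O43175", "P02538")
y <- c("P02538", "Q70CQ2")

# For each element of x, its position in y (NA when x[i] is not in y)
m <- match(x, y)

# Row numbers of x that do have a match in y
hits <- which(!is.na(m))

# Save the row numbers together with the matched values
res <- data.frame(row = hits, value = x[hits], stringsAsFactors = FALSE)
print(res)
```

Note that `which(is.na(m))` would return the rows without a match, so the negation matters.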

score 0 · Accepted Answer · edited Aug 03 '17 at 14:05

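Try adding a position column to df1 and then merging with df2:

# record each row's position in df1
df3 <- cbind(df1, pos = seq_len(nrow(df1)))
# keep only the rows whose V1 also appears in df2 (result is sorted by V1)
merge(df3, df2)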

Output

      V1 pos
1 P02538  19
2 P08238  18
3 P13645  17
4 P35908  16
5 Q70CQ2  15
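
As a possible alternative to the merge, the positions can also be looked up with `match` directly, which keeps the order of the second set rather than sorting by key. The sketch below assumes the mismatch might come from factor levels or stray whitespace, as suggested in the comments; the vectors `a` and `b` are hypothetical stand-ins, not the data from the question.

```r
# Hypothetical stand-ins for the two columns (factors, as in the question);
# one value carries a trailing space on purpose
a <- factor(c("P08237", "Q70CQ2 ", "P35908", "P13645", "P08238"))
b <- factor(c("P08238", "P13645", "Q70CQ2"))

# Convert to character and strip surrounding whitespace before comparing
a_chr <- trimws(as.character(a))
b_chr <- trimws(as.character(b))

# Position in a of each element of b, in b's order (exact matches only)
pos <- match(b_chr, a_chr)

out <- data.frame(V1 = b_chr, pos = pos, stringsAsFactors = FALSE)
print(out)
```

Here `pos` is meant to be used as a row index into the first set, e.g. `a_chr[pos]`; indexing a different object with it would return unrelated rows.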

answered Aug 03 '17 at 13:57 by Raveesh · edited Aug 03 '17 at 14:05 by loki
That does not work. It gives me the match, but when I do `data <- res[myindex, ]` it gives me something else. I added an example of the result after `match`. Do you know how to get the index of the rows that have a value? — nik, Aug 03 '17 at 16:24
