The data I have look more or less like this:

data1 <- data.frame(col=c("Peter i.n.","Victor Today Morgan","Obelix","One More"))
data2 <- data.frame(num=c(123,434,545,11,22),col=c("Victor Today","Obelix Mobelix is.",
                    "Peter Asterix i.n.","Also","Here"))

I would like to match names across the two dataframes and get the column num into data1.

Desired outcome:

                   col  num
 1          Peter i.n.  545
 2 Victor Today Morgan  123
 3              Obelix  434 

I have tried this, but it doesn't work as expected:

filter <- sapply(as.character(data1$col), function(x) any(grepl(x,as.character(data2$col))))
data1$num <- data2[filter,]
Maximilian
  • In data2, the number on Peter's row is 545, but in your desired outcome Peter has a number of 123. Is this correct? – James Trimble May 30 '14 at 11:12
  • What are your criteria for matching the 'col' columns? Some sort of fuzzy match, or containing the same words, or what? – Michael Lawrence May 30 '14 at 11:13
  • Just match the first name `(Peter,Victor,Obelix)` and get the corresponding `num` value for each from `data2` into `data1`. – Maximilian May 30 '14 at 11:20
  • @James. I'm sorry! Now corrected. The name `Peter Asterix...` from data2 corresponds to number `545` (third row), so it should match `Peter i.n.` from `data1`. – Maximilian May 30 '14 at 11:23
  • @Michael. I think matching the first word would suffice. – Maximilian May 30 '14 at 11:33
  • You could potentially do it your way, but you'll get the col names of `data2` instead of `data1`. Something like `data2[as.logical(sapply(gsub(" .*", "", as.character(data2$col)), function(x) any(grepl(x,as.character(data1$col))))),]` – David Arenburg May 30 '14 at 12:20
  • This solution is good too, since I don't mind whether the column from `data1` or `data2` is used; I just need the corresponding column `num`. Thanks! – Maximilian May 30 '14 at 12:47
  • I'll post it as an alternative answer then – David Arenburg May 30 '14 at 13:18

2 Answers

firstName <- function(x) sub(" .*", "", x)
data1$num <- data2$num[match(firstName(data1$col), firstName(data2$col))]
data1[!is.na(data1$num),]
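
A minimal, self-contained sketch of how this runs on the sample data from the question. The `stringsAsFactors = FALSE` argument and the intermediate names `idx` and `result` are assumptions added here for clarity, not part of the original snippet.

```r
# Sample data from the question; stringsAsFactors = FALSE is an added assumption
data1 <- data.frame(col = c("Peter i.n.", "Victor Today Morgan", "Obelix", "One More"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(num = c(123, 434, 545, 11, 22),
                    col = c("Victor Today", "Obelix Mobelix is.",
                            "Peter Asterix i.n.", "Also", "Here"),
                    stringsAsFactors = FALSE)

# Keep only the text before the first space, i.e. the first word
firstName <- function(x) sub(" .*", "", x)

# For each first word in data1, match() gives the position of its first
# exact occurrence among the first words of data2, or NA if there is none
idx <- match(firstName(data1$col), firstName(data2$col))
data1$num <- data2$num[idx]

# Drop the rows of data1 that found no partner in data2
result <- data1[!is.na(data1$num), ]
print(result)
```

Because `match()` compares whole strings, only an exact first-word match counts, and if a first word occurred more than once in `data2`, only its first occurrence would be used.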
Michael Lawrence
  • I also played around with `match`, like this: `data1$num <- data2$num[match(data1$col, data2$col)]`, but obviously the `sub` part was missing here! Thank you. – Maximilian May 30 '14 at 11:52
If you don't mind which col names you see (those of data1 or data2), you could adapt your own solution like this:

data2[as.logical(sapply(gsub(" .*", "", as.character(data2$col)), function(x) any(grepl(x, as.character(data1$col))))), ]

##   num                col
## 1 123       Victor Today
## 2 434 Obelix Mobelix is.
## 3 545 Peter Asterix i.n.

This matches the first word of each data2$col entry against data1$col and retrieves the matching rows from data2, in data2's row order.
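
The one-liner above can be unpacked into steps, sketched here on the sample data from the question. This is a variant rather than the exact code above: `fixed = TRUE` is an assumption added so that each first word is treated as a literal string instead of a regular expression, and the names `first2`, `keep` and `result` are made up for illustration.

```r
# Sample data from the question; stringsAsFactors = FALSE is an added assumption
data1 <- data.frame(col = c("Peter i.n.", "Victor Today Morgan", "Obelix", "One More"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(num = c(123, 434, 545, 11, 22),
                    col = c("Victor Today", "Obelix Mobelix is.",
                            "Peter Asterix i.n.", "Also", "Here"),
                    stringsAsFactors = FALSE)

# Step 1: first word of every entry in data2$col
first2 <- sub(" .*", "", data2$col)

# Step 2: for each first word, does it occur anywhere in any data1$col entry?
# fixed = TRUE makes grepl() search for the literal word, not a regex pattern
keep <- vapply(first2,
               function(w) any(grepl(w, data1$col, fixed = TRUE)),
               logical(1), USE.NAMES = FALSE)

# Step 3: keep the rows of data2 whose first word was found
result <- data2[keep, ]
print(result)
```

Note that `grepl()` looks for the word anywhere inside the `data1$col` strings, so a short first word could also hit in the middle of a longer name; the exact comparison via `match()` in the other answer avoids that.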

David Arenburg