
I have two data frames:

df1<- data.frame(place=c("KARACA ADANA","ASIL BOLU","GAZIANTEP","YUKARI/MERSIN"))
df2<- data.frame(city=c("ADANA","BOLU","ANTEP","MERSIN"), neighbor=c("KARACA","ASIL","GAZI","YUKARI"))

I need to match df1$place against df2$neighbor. If a value in df1$place contains the word in df2$neighbor, the corresponding value of df2$city should be copied into a new column, df1$newcol.

The expected result is:

df1 <- data.frame(place=c("KARACA ADANA","ASIL BOLU","GAZIANTEP","YUKARI/MERSIN"), newcol=c("ADANA","BOLU","ANTEP","MERSIN"))
Ian Campbell
genco

2 Answers


Here's an approach with sapply from base R.

To match only whole words, use a regular expression in which \\b marks a word boundary. With this pattern, GAZI does not match GAZIANTEP, so that row stays NA.

# Rows of df1$place that contain each neighbor as a whole word
ind <- unlist(sapply(df2$neighbor, function(x) grep(paste0("\\b",x,"\\b"),df1$place)))
# Row of df2 for each match, repeated once per matched row of df1
ind2 <- rep(seq_along(df2$neighbor),
            times = sapply(df2$neighbor, function(x) length(grep(paste0("\\b",x,"\\b"),df1$place))))
df1$newcol <- NA
df1$newcol[ind] <- as.character(df2$city[ind2])
df1
#          place newcol
#1  KARACA ADANA  ADANA
#2     ASIL BOLU   BOLU
#3     GAZIANTEP   <NA>
#4 YUKARI/MERSIN MERSIN
#5 YUKARI/MERSIN MERSIN
#6     GAZIANTEP   <NA>
#7     ASIL BOLU   BOLU
#8  KARACA ADANA  ADANA
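
The NA for GAZIANTEP in the output above follows from the word-boundary anchors. A minimal sketch of the difference between whole-word and plain substring matching, using strings from the question (the variable names are only illustrative):

```r
# \\b needs a word/non-word transition; in GAZIANTEP, GAZI is followed by a letter
whole_gazi <- grepl("\\bGAZI\\b", "GAZIANTEP")

# "/" is a non-word character, so YUKARI is bounded on both sides
whole_yukari <- grepl("\\bYUKARI\\b", "YUKARI/MERSIN")

# fixed = TRUE does a literal substring search with no regex interpretation
sub_gazi <- grepl("GAZI", "GAZIANTEP", fixed = TRUE)

print(c(whole_gazi = whole_gazi, whole_yukari = whole_yukari, sub_gazi = sub_gazi))
```

If GAZIANTEP should map to ANTEP, as in the question's expected result, dropping the \\b anchors (or using fixed = TRUE) is one way to get there, at the cost of also matching inside longer words.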

Sample Data

df1<- data.frame(place=c(c("KARACA ADANA","ASIL BOLU","GAZIANTEP","YUKARI/MERSIN"),
                         rev(c("KARACA ADANA","ASIL BOLU","GAZIANTEP","YUKARI/MERSIN"))))
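
A possible variation, sketched under assumptions rather than taken from the code above: look up each row of df1 on its own, so the result does not depend on row order or on the two data frames having the same length. It assumes plain substring matching (fixed = TRUE) instead of whole words and keeps only the first matching neighbor; the ISTANBUL row is a made-up example of a place with no match.

```r
# Sample data in a different order than df2, plus one hypothetical unmatched row
df1 <- data.frame(place = c("YUKARI/MERSIN", "GAZIANTEP", "KARACA ADANA",
                            "ASIL BOLU", "ISTANBUL"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(city = c("ADANA", "BOLU", "ANTEP", "MERSIN"),
                  neighbor = c("KARACA", "ASIL", "GAZI", "YUKARI"),
                  stringsAsFactors = FALSE)

# For every place, the positions of all neighbors it contains as a substring
hits <- lapply(df1$place, function(p) {
  which(vapply(df2$neighbor, grepl, logical(1), x = p, fixed = TRUE))
})

# Keep the first hit per row, or NA when no neighbor is found
first_hit <- vapply(hits, function(h) if (length(h)) h[1] else NA_integer_,
                    integer(1))

df1$newcol <- df2$city[first_hit]
df1
```

Because the lookup is done per row of df1, the lengths of df1 and df2 never have to line up, and an unmatched place simply gets NA.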
Ian Campbell
  • This really worked !! Thank you so much – genco Jun 04 '20 at 13:47
  • I just realized that this code works only when the matching columns are in the same order. What if they aren't? What should I do in that case? – genco Jun 04 '20 at 14:16
  • I am so sorry, but I have one more problem. df1 and df2 don't have the same length, so the edited code doesn't work either – genco Jun 04 '20 at 14:34
  • Yes, it warns in df1$newcol[ind] <- as.character(df2$city[logical]): "number of items to replace is not a multiple of replacement length" – genco Jun 04 '20 at 14:41
  • This time works :) You saved my life, thank you sooo much – genco Jun 04 '20 at 14:59
  • Can I ask you one more thing related to this question? – genco Jun 08 '20 at 13:03
  • If my data is as below, how can I search for two words in a sentence and, if both words are in the sentence, take the df2$city? – genco Jun 08 '20 at 13:14
  • df1<- data.frame(place=c("GEMİ İNŞAATI VE GEMİ MAKİNELERİ MÜHENDİSLİĞİ PR" , "GEMİ VE DENİZ TEKNOLOJİSİ MÜHENDİSLİĞİ PR","GEMİ MAKİNELERİ İŞLETME MÜHENDİSLİĞİ PR", "ENDÜSTRİ SİSTEMLERİ MÜHENDİSLİĞİ PR", "BİYOSİSTEM MÜHENDİSLİĞİ PR")) df2<- data.frame(city=c("ENDÜSTRİYEL TASARIM","GEMİ MÜHENDİSLİĞİ","ENDÜSTRİ MÜHENDİSLİĞİ","BİYOSİSTEM MÜHENDİSLİĞİ"), neighbor=c("ENDÜSTRİ TASARIM","GEMİ MÜHENDİS","ENDÜSTRİ MÜHENDİS","BİYO MÜHENDİS")) – genco Jun 08 '20 at 13:15
  • This is quite complicated. I think you should ask a new question. – Ian Campbell Jun 08 '20 at 13:19

Try it this way:

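library(tidyverse)
df1 %>% 
  rowwise() %>% 
  # match place against df2$neighbor, as the question asks;
  # [1] keeps the first hit and yields NA when nothing matches
  mutate(out = df2$city[str_which(place, df2$neighbor)[1]]) %>% 
  ungroup()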
Yuriy Saraykin