I want to extract the text/value that is the same in col1 and col2 and store it in a new column, "desired_col", as shown in my data frame. I tried a few things, but they did not work.

mydata_1<-data.frame(col1=c("SL1234","SL786876"),col2=c("SL1334","SL78076"),desired_col=c(c("SL1","SL78")))
Yogesh Kumar
  • Possible duplicate of https://stackoverflow.com/questions/28261825/longest-common-substring-in-r-finding-non-contiguous-matches-between-the-two-str – akrun Jun 03 '18 at 15:19

1 Answer

One option is to use `mapply` to compare the two columns row by row and keep the leading characters they share:

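mydata_1$matched <- mapply(function(x, y) {
  # Compare only the first n characters, where n is the shorter length
  n <- min(nchar(x), nchar(y))
  xc <- strsplit(substring(x, 1, n), split = "")[[1]]
  yc <- strsplit(substring(y, 1, n), split = "")[[1]]

  # Position of the first mismatch; if there is none, the whole overlap matches
  first_diff <- which(xc != yc)[1]
  matching_len <- if (is.na(first_diff)) n else first_diff - 1
  substring(x, 1, matching_len)
  # as.character() lets the call work on factor columns as well
}, as.character(mydata_1$col1), as.character(mydata_1$col2))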


mydata_1
#       col1    col2 desired_col matched
# 1   SL1234  SL1334         SL1     SL1
# 2 SL786876 SL78076        SL78    SL78

Data:

mydata_1<-data.frame(col1=c("SL1234","SL786876"),
                     col2=c("SL1334","SL78076"),
                     desired_col=c(c("SL1","SL78")), 
                     stringsAsFactors = FALSE)
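
The same idea can be wrapped in a small reusable helper. The following is only a sketch and not part of the original answer: the name `common_prefix` and the data frame `mydata_2` are hypothetical, and it assumes that "same in col1 and col2" means the longest common leading substring. It coerces its inputs with `as.character()`, returns `NA` when either input is missing, and returns the whole overlap when no character differs.

```r
# Hypothetical helper: longest common leading substring of two strings
common_prefix <- function(x, y) {
  x <- as.character(x)
  y <- as.character(y)
  if (is.na(x) || is.na(y)) return(NA_character_)

  # Only the first n characters can possibly match
  n <- min(nchar(x), nchar(y))
  if (n == 0) return("")
  xc <- strsplit(substring(x, 1, n), split = "")[[1]]
  yc <- strsplit(substring(y, 1, n), split = "")[[1]]

  # First mismatching position, or NA if the overlap is identical
  first_diff <- which(xc != yc)[1]
  if (is.na(first_diff)) substring(x, 1, n) else substring(x, 1, first_diff - 1)
}

# Factor columns on purpose, to show that the coercion is handled
mydata_2 <- data.frame(col1 = c("SL1234", "SL786876"),
                       col2 = c("SL1334", "SL78076"),
                       stringsAsFactors = TRUE)

mydata_2$matched <- mapply(common_prefix,
                           as.character(mydata_2$col1),
                           as.character(mydata_2$col2),
                           USE.NAMES = FALSE)
mydata_2$matched
# [1] "SL1"  "SL78"
```

Passing `USE.NAMES = FALSE` keeps `mapply` from naming the result after the input strings, so the new column is a plain character vector.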
MKR
  • When I run this code, I get the following error: `Error in strsplit(x, split = "") : non-character argument`. Traceback: 4. `strsplit(x, split = "")` 3. `which(strsplit(x, split = "")[[1]] != strsplit(y, split = "")[[1]])` 2. `(function (x, y) { matching_len <- which(strsplit(x, split = "")[[1]] != strsplit(y, split = "")[[1]])[1] - 1 ...` 1. `mapply(function(x, y) { matching_len <- which(strsplit(x, split = "")[[1]] != strsplit(y, split = "")[[1]])[1] - 1 substring(x, 1, matching_len) ...` – Yogesh Kumar Jun 03 '18 at 15:33
  • @YogeshKumar Your data.frame stores the strings as `factor` columns. You can either add `stringsAsFactors = FALSE` to the definition of the data frame (as I have shown in my example data) or convert `x` and `y` within the function itself. – MKR Jun 03 '18 at 15:40
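
The factor issue raised in the comments can also be fixed on a data frame that already exists. This is a minimal sketch, assuming the data frame was built with factor columns (the default for `data.frame()` before R 4.0.0). The name `mydata_f` is hypothetical.

```r
# Hypothetical data frame whose string columns were created as factors
mydata_f <- data.frame(col1 = c("SL1234", "SL786876"),
                       col2 = c("SL1334", "SL78076"),
                       stringsAsFactors = TRUE)

# Convert every factor column to character, leaving other columns untouched
is_fac <- vapply(mydata_f, is.factor, logical(1))
mydata_f[is_fac] <- lapply(mydata_f[is_fac], as.character)

str(mydata_f)
```

After this conversion, `strsplit()` receives character input and the "non-character argument" error no longer occurs.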