R: Add column to dataframe based on partial matching characters

Question

I have an example dataframe with an ID and value column:

ID_short    Value
Boar            4
Pig             5
Duck            6
Dog             7
Cat             8
Horse           9

I have another dataframe which has a column with the same IDs but extended with more characters:

ID_Extended
Duck_p15
Dog32
PigGG
Horse_p12
Cat_Ok
Boar_Ko_1999_test

I want to add this ID_Extended column to the first dataframe so that each extended ID lands in the same row as the short ID it starts with. The IDs are of class character.

Example of desired output:

ID_short  Value  ID_Extended
Boar          4  Boar_Ko_1999_test
Pig           5  PigGG
Duck          6  Duck_p15
Dog           7  Dog32
Cat           8  Cat_Ok
Horse         9  Horse_p12

score 3 · Accepted Answer · answered Dec 02 '19 at 13:01

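One option is to compare each short ID with the first nchar(x) characters of every extended ID and let match() find the first hit: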

# For each short ID, find the first extended ID whose leading characters equal it
df1$ID_Extended <-
  df2$ID_Extended[sapply(df1$ID_short,
                         function(x) match(x, substr(df2$ID_Extended, 1, nchar(x))))]

df1
  ID_short Value       ID_Extended
1     Boar     4 Boar_Ko_1999_test
2      Pig     5             PigGG
3     Duck     6          Duck_p15
4      Dog     7             Dog32
5      Cat     8            Cat_Ok
6    Horse     9         Horse_p12
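To see what the sapply() call does for a single short ID, here is the intermediate step for "Pig" spelled out. The extended IDs from df2 are copied into a plain vector (the name ext is just for illustration) so the snippet runs on its own.

```r
# Extended IDs from df2, copied as a plain character vector
ext <- c("Duck_p15", "Dog32", "PigGG", "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test")

x <- "Pig"
# Truncate every extended ID to the length of the short ID
prefixes <- substr(ext, 1, nchar(x))
# prefixes is c("Duc", "Dog", "Pig", "Hor", "Cat", "Boa")

# match() gives the position of the first truncated ID equal to x
pos <- match(x, prefixes)
# pos is 3, so ext[pos] is "PigGG"
ext[pos]
```

sapply() repeats this for every short ID, and the resulting positions index df2$ID_Extended in df1's row order.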

Data:

df1 <- data.frame(
  ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"), 
  Value = 4:9,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID_Extended = c("Duck_p15", "Dog32", "PigGG","Horse_p12", "Cat_Ok", "Boar_Ko_1999_test"),
  stringsAsFactors = FALSE
)
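Because match() returns only the first hit, a short ID that is also the prefix of an unrelated extended ID (say a hypothetical "Pigeon_3" next to "PigGG") would be paired silently with whichever comes first. The sketch below is an alternative, not part of the answer above: it uses base R's startsWith() to collect every candidate and warns when a short ID has zero or several matches. startsWith() only accepts character input, which is one reason the data above sets stringsAsFactors = FALSE.

```r
df1 <- data.frame(
  ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"),
  Value = 4:9,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID_Extended = c("Duck_p15", "Dog32", "PigGG", "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test"),
  stringsAsFactors = FALSE
)

# For each short ID, the positions of all extended IDs that start with it
hits <- lapply(df1$ID_short, function(s) which(startsWith(df2$ID_Extended, s)))

# Candidates per short ID: 0 means no match, more than 1 means ambiguous
n_hits <- lengths(hits)
if (any(n_hits != 1)) {
  warning("Short IDs without a unique match: ",
          paste(df1$ID_short[n_hits != 1], collapse = ", "))
}

# Keep the first candidate, or NA when there is none
df1$ID_Extended <- vapply(hits, function(i) {
  if (length(i) == 0) NA_character_ else df2$ID_Extended[i[1]]
}, character(1))
df1
```

With the example data every short ID has exactly one candidate, so no warning is raised and the result matches the accepted answer.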

score 2 · Answer 2 · answered Dec 02 '19 at 13:03

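We can use match() after extracting the leading word of 'ID_Extended' in 'df2' with sub(). The pattern keeps an initial capital letter followed by lowercase letters, so it assumes every short ID has that shape:

# Reduce each extended ID to its leading capitalised word, then look up the short IDs in it
df1$ID_Extended <- df2$ID_Extended[match(df1$ID_short, 
            sub("^([A-Z][a-z]+).*", "\\1", df2$ID_Extended))]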
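Here is what the sub() call produces on its own, plus two hypothetical IDs that break the capital-then-lowercase assumption. When the pattern does not match, sub() returns the string unchanged, so such IDs would never equal a short ID and match() would give NA for them.

```r
ext <- c("Duck_p15", "Dog32", "PigGG", "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test")

# Keep only the leading capital letter plus the run of lowercase letters after it
keys <- sub("^([A-Z][a-z]+).*", "\\1", ext)
# keys is c("Duck", "Dog", "Pig", "Horse", "Cat", "Boar")

# Hypothetical IDs in other cases: the pattern fails, so they come back untouched
unchanged <- sub("^([A-Z][a-z]+).*", "\\1", c("DUCK_p15", "duck_p15"))
# unchanged is c("DUCK_p15", "duck_p15")
```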

Data:

df1 <- structure(list(ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"),
                      Value = 4:9),
                 class = "data.frame", row.names = c(NA, -6L))

df2 <- structure(list(ID_Extended = c("Duck_p15", "Dog32", "PigGG",
                                      "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test")),
                 class = "data.frame", row.names = c(NA, -6L))
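If the IDs do not follow one case pattern, a variant of this answer (a sketch, not something either answer proposed) can build the regex from the short IDs themselves. It assumes the short IDs contain no regex metacharacters. With perl = TRUE the alternatives are tried left to right, so sorting them longest first makes a longer short ID win over a shorter one that is also a prefix (for example "Dog" over a hypothetical "Do").

```r
df1 <- data.frame(
  ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"),
  Value = 4:9,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID_Extended = c("Duck_p15", "Dog32", "PigGG", "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test"),
  stringsAsFactors = FALSE
)

# Anchored alternation of the short IDs, longest first
short_sorted <- df1$ID_short[order(-nchar(df1$ID_short))]
pattern <- paste0("^(", paste(short_sorted, collapse = "|"), ")")

# Start of the leading match in each extended ID (-1 where nothing matches)
m <- regexpr(pattern, df2$ID_Extended, perl = TRUE)

# regmatches() returns only the matched substrings, so fill them into the
# positions that matched and leave NA elsewhere
key <- rep(NA_character_, length(df2$ID_Extended))
key[m != -1] <- regmatches(df2$ID_Extended, m)

df1$ID_Extended <- df2$ID_Extended[match(df1$ID_short, key)]
df1
```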
