0

I have two data frames.

DATA

Name                Type            Code
gabapentine         Pharmaceutical  60142-96-3
Glyphosate          Pesticide       1071-83-6
Guanylurea          Pharmaceutical  141-83-3
hydrochlorthiazide  Pharmaceutical  58-93-5

Values

Name                Value           Code
gabapentine         0,2             60142-96-3
Glyphosate          1,8             1071-83-6
Urea                1,2             141-83-3
hydrochlorthiazide  0,5             58-93-5

I want to add the column type from Data to Values, by matching the columns Name and Code.

I know how to match with just one column, like this:

Values$type = Data$type[match(Values$Name, Data$Name)]

But now I want to take into account also the Code, since some names don't match.

Is there a way to do it in just one line, like

Values$type = Data$type[match((Values$Name, Data$Name) | (Values$Code, Data$Code))]

That didn't work for me, so I would like to know the right way to do it.

I tried using merge like in other questions

merge(Values, Data,all.x = TRUE)

but in Guanylurea from dataframe Data I get type NA when it should match Urea from dataframe Values. The result for that row would be Type equal to Pharmaceutical, but the Names don't match exactly. So how can I add a partial match into functions match or merge? Or is there an alternative to these two?

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
user195366
  • 465
  • 2
  • 13

1 Answers1

0

There is a wide range of answers to this. I left four (dfrx), all using dplyr.

library(dplyr)
df1 <- data.frame(Name = c("gabapentine", "Glyphosate",
                           "Guanylurea", "hydrochlorthiazide"),
                  Type = c("Pharmaceutical", "Pesticide", 
                           "Pharmaceutical", "Pharmaceutical"),
                  Code = c("60142-96-3", "1071-83-6",
                           "141-83-3", "58-93-5"))
df2 <- data.frame(Name = c("gabapentine", "Glyphosate",
                           "Guanylurea", "hydrochlorthiazide"),
                  Value = c(0.2, 1.8, 1.2, 0.5),
                  Code = c("60142-96-3", "1071-83-6",
                           "141-83-3", "58-93-5"))

dfr1 <- df2 %>% 
  dplyr::mutate(Type = df1$Type)
dfr2 <- df2 %>% 
  dplyr::bind_cols(Type = df1$Type)
dfr3 <- df2 %>% 
  dplyr::right_join(df1, by = "Name") %>% 
  dplyr::select(-Code.y) %>% 
  dplyr::rename("Code" = Code.x)
dfr4 <- df2 %>% 
  dplyr::right_join(df1, by = "Code") %>% 
  dplyr::select(-Name.y) %>% 
  dplyr::rename("Name" = Name.x)
> dfr4
                Name Value       Code           Type
1        gabapentine   0.2 60142-96-3 Pharmaceutical
2         Glyphosate   1.8  1071-83-6      Pesticide
3         Guanylurea   1.2   141-83-3 Pharmaceutical
4 hydrochlorthiazide   0.5    58-93-5 Pharmaceutical
bbiasi
  • 1,549
  • 2
  • 15
  • 31