
I created some mock data.

df1 <- data.frame(Flower = c("Rose", "Sunflower", "Tulip"),
                  Size = c(10, 15, 20))

     Flower Size
1      Rose   10
2 Sunflower   15
3     Tulip   20


df2 <- data.frame(Area = c("CA", "TX", "NY", "NV", "MD", "GA"),
                  Flower = c("Red Rose", "Yellow Sunflower", "Purple Tulip", "Rose", "Rose", "Yellow Tulip"))

  Area           Flower
1   CA         Red Rose
2   TX Yellow Sunflower
3   NY     Purple Tulip
4   NV             Rose
5   MD             Rose
6   GA     Yellow Tulip

What I would like to do is identify each flower in df2 whose name contains one of the flower names from df1 (for example, "Rose" in "Red Rose"), and attach the corresponding Size from df1.

This is what the ideal result would look like.

  Area           Flower  Size
1   CA         Red Rose   10
2   TX Yellow Sunflower   15
3   NY     Purple Tulip   20
4   NV             Rose   10
5   MD             Rose   10
6   GA     Yellow Tulip   20

Initially, I thought of using a loop together with str_detect(), but I couldn't work out how to set that up across two data frames. Any help is welcome. I'm also not a very experienced coder, so I would greatly appreciate it if you could walk me through your code.
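The loop idea can work. Below is a minimal sketch, assuming the stringr package is installed: for each base name in df1, it uses str_detect() to find the rows of df2 whose Flower contains that name, then copies the matching Size into those rows. The variable `hits` is just an illustrative name.

```r
library(stringr)

df1 <- data.frame(Flower = c("Rose", "Sunflower", "Tulip"),
                  Size = c(10, 15, 20))
df2 <- data.frame(Area = c("CA", "TX", "NY", "NV", "MD", "GA"),
                  Flower = c("Red Rose", "Yellow Sunflower", "Purple Tulip",
                             "Rose", "Rose", "Yellow Tulip"))

# Start with an empty Size column; rows that match nothing stay NA
df2$Size <- NA_real_

# Loop over the base flower names in df1
for (i in seq_len(nrow(df1))) {
  # fixed() makes str_detect() treat the name as a literal string, not a regex
  hits <- str_detect(df2$Flower, fixed(df1$Flower[i]))
  # Copy this flower's size into every matching row of df2
  df2$Size[hits] <- df1$Size[i]
}

df2
```

If a df2 label contained two of the df1 names, the later one in df1 would overwrite the earlier one, and labels with no match keep NA.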

Andre Wildberg
  • `df2 <- rename(df2, Flower.Specific = Flower) %>% mutate(Flower = str_extract(Flower.Specific, "Rose|Sunflower|Tulip")); df3 <- left_join(df2, df1, by = "Flower")` – TobiSonne Aug 09 '23 at 15:14
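Written out as a complete script, the comment's rename/extract/join approach might look like the sketch below. It assumes dplyr and stringr are installed, and it builds the pattern from df1$Flower with str_c() rather than typing "Rose|Sunflower|Tulip" by hand; `pattern` and `df3` are illustrative names.

```r
library(dplyr)
library(stringr)

df1 <- data.frame(Flower = c("Rose", "Sunflower", "Tulip"),
                  Size = c(10, 15, 20))
df2 <- data.frame(Area = c("CA", "TX", "NY", "NV", "MD", "GA"),
                  Flower = c("Red Rose", "Yellow Sunflower", "Purple Tulip",
                             "Rose", "Rose", "Yellow Tulip"))

# Build the alternation pattern "Rose|Sunflower|Tulip" from df1
pattern <- str_c(df1$Flower, collapse = "|")

df3 <- df2 %>%
  # Keep the original label under a new name
  rename(Flower.Specific = Flower) %>%
  # Extract the base flower name that appears in each label
  mutate(Flower = str_extract(Flower.Specific, pattern)) %>%
  # Keep every row of df2 and attach the matching Size from df1
  left_join(df1, by = "Flower")

df3
```

Because the df1 names are pasted into a regular expression, names containing regex metacharacters (such as `.` or `(`) would need escaping first.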

1 Answer


Using the fuzzyjoin package, you can join df1 to df2, treating each value in df1$Flower as a regular expression that is matched against df2$Flower.

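library(fuzzyjoin)
# Left join: keep every row of df2; each df1$Flower value is used as a regex
# matched against df2$Flower (the native pipe |> needs R >= 4.1)
df2 |> regex_join(df1, by = "Flower", mode = "left")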
#   Area         Flower.x  Flower.y Size
# 1   CA         Red Rose      Rose   10
# 2   TX Yellow Sunflower Sunflower   15
# 3   NY     Purple Tulip     Tulip   20
# 4   NV             Rose      Rose   10
# 5   MD             Rose      Rose   10
# 6   GA     Yellow Tulip     Tulip   20
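To get exactly the column layout shown in the question, one option (a sketch assuming dplyr is installed alongside fuzzyjoin) is to keep df2's label from Flower.x and drop Flower.y after the join:

```r
library(fuzzyjoin)
library(dplyr)

df1 <- data.frame(Flower = c("Rose", "Sunflower", "Tulip"),
                  Size = c(10, 15, 20))
df2 <- data.frame(Area = c("CA", "TX", "NY", "NV", "MD", "GA"),
                  Flower = c("Red Rose", "Yellow Sunflower", "Purple Tulip",
                             "Rose", "Rose", "Yellow Tulip"))

result <- df2 |>
  # Each df1$Flower value is used as a regex matched against df2$Flower
  regex_join(df1, by = "Flower", mode = "left") |>
  # Keep the df2 label as Flower and drop the matched base name (Flower.y)
  select(Area, Flower = Flower.x, Size)

result
```

select() can rename while selecting, so `Flower = Flower.x` restores the original column name in the same step.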
Wimpel