Subset a dataframe based on another dataframe: how to deal with overlap in character strings?

DF1

| EvalID | Transect | size |
|--------|----------|------|
| RO-94 | A | 2 |
| RO-93 | A | 2 |
| RO-92 | AB | 14 |
| RO-91 | B | 25 |
| RO-90 | BC | 1 |
| RO-89 | C | 3 |
| RO-88 | CD | 1 |
| RO-87 | CD | 50 |
| RO-86 | D | 70 |

DF2

| EvalID | Transect | depth |
|--------|----------|-------|
| RO-93 | A | 0.1 |
| RO-92 | A | 1.1 |
| RO-90 | BC | 0.5 |
| RO-89 | C | 2.1 |
| RO-87 | CD | 0.01 |

I am trying to subset df1 to the rows whose EvalID and Transect values both match a row in df2, using the following code:

tempdf <- subset(df1, Transect %in% df2$Transect & EvalID %in% df2$EvalID)

(Originally found in the question "subset a column in data frame based on another data frame/list".)

However, when I run this code on my data, the output contains entries with 'AB' transects even though df2 only has the transect 'A' for those EvalIDs. I assume that since 'A' is part of the string 'AB', those rows are being included, but I would like to keep only exact matches if possible. Is there a way to do that?

  • It's not a string thing, it's the simultaneity of the matches. This is better framed as a join operation. With `dplyr` you can do `semi_join(df1, df2)`, or in base `merge(df1, df2[c("EvalID", "Transect")])`. – Gregor Thomas May 23 '23 at 18:54
  • semi_join seems to have worked thank you! – EmVa May 23 '23 at 18:59
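
To illustrate the point made in the comment, here is a minimal base R sketch built from the tables in the question. The sample tables on their own do not seem to reproduce the problem, so one extra df2 row (`RO-85` on transect `AB`) is added here as a purely hypothetical assumption; it stands in for whatever row in the real data carries `AB` under a different EvalID. The names `loose` and `exact` are also just illustrative.

```r
# Data copied from the question's tables.
df1 <- data.frame(
  EvalID   = c("RO-94", "RO-93", "RO-92", "RO-91", "RO-90",
               "RO-89", "RO-88", "RO-87", "RO-86"),
  Transect = c("A", "A", "AB", "B", "BC", "C", "CD", "CD", "D"),
  size     = c(2, 2, 14, 25, 1, 3, 1, 50, 70),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  EvalID   = c("RO-93", "RO-92", "RO-90", "RO-89", "RO-87"),
  Transect = c("A", "A", "BC", "C", "CD"),
  depth    = c(0.1, 1.1, 0.5, 2.1, 0.01),
  stringsAsFactors = FALSE
)

# ASSUMPTION (not in the question): a hypothetical df2 row that has
# transect "AB" under a different EvalID, added only to trigger the issue.
df2 <- rbind(df2, data.frame(EvalID = "RO-85", Transect = "AB", depth = 0.3,
                             stringsAsFactors = FALSE))

# Two independent %in% tests: each column is checked against df2 on its own,
# so RO-92/AB passes ("RO-92" is in df2$EvalID, "AB" is in df2$Transect)
# even though df2 has no RO-92/AB row.
loose <- subset(df1, Transect %in% df2$Transect & EvalID %in% df2$EvalID)

# Matching on the pair of columns instead: merge() joins on the shared
# column names, so only rows whose (EvalID, Transect) pair occurs in df2 remain.
# unique() guards against duplicated key pairs in df2 multiplying rows.
exact <- merge(df1, unique(df2[c("EvalID", "Transect")]))

print(loose)
print(exact)
```

With this data, `loose` still contains the `RO-92`/`AB` row while `exact` does not. Note that `merge()` sorts its result by the key columns by default, so row order can differ from df1. If `dplyr` is installed, `semi_join(df1, df2, by = c("EvalID", "Transect"))` should give the same rows while keeping df1's order.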