I have a dataframe that I'm querying with the %in% operator against the contents of another dataframe, which works well.
What I'd like to do now is also return rows where the contents of my dataframe are similar to, rather than exactly equal to, the values in the second dataframe.
Is there a way to combine the %in% and %like% operators?

I've pasted my code using the %in% operator below, which is working as expected:

sessionData <- as_data_frame(sessionData[sessionData$pagePath %in% pageUrls$page_url,])

When I try to use both %in% and %like%, only matches for the first row of the lookup dataframe are returned. Is there a better way to query this?

Edit: As requested, I've pasted some reproducible example data below, as well as further information on the expected output:

df <- data.frame("url" = c('url1','url1-variation1','url1-variation2','url2','url2-variation1','url2-variation2','url3','url3-variation1','url3-variation2'), stringsAsFactors = FALSE)
df_lookup <- data.frame("url" = c('url1','url2','url3'), stringsAsFactors = FALSE)

df_out <- as_data_frame(df[df$url %in% df_lookup$url,])

As you can see, the %in% operator only returns exact matches. What I'm attempting to do is also return the variations, using the %like% operator or something similar.

Ed Cunningham
  • Could you add some data and the desired output? Also, your code uses only one of `%in%` and `%like%`, not both: could you post the code that is not working, or make your question clearer? – s__ Nov 07 '18 at 12:37
  • You will get better answers if you follow this: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – moodymudskipper Nov 07 '18 at 12:37
  • Apologies, further information now added – Ed Cunningham Nov 07 '18 at 13:19

1 Answer

You can use %like% from the data.table package, collapsing the lookup values into a single regular expression with paste0():

library(data.table)
df_out <- df[df$url %like% paste0("(", df_lookup$url, ")", collapse = "|"), , drop = FALSE]
df_out
#               url
# 1            url1
# 2 url1-variation1
# 3 url1-variation2
# 4            url2
# 5 url2-variation1
# 6 url2-variation2
# 7            url3
# 8 url3-variation1
# 9 url3-variation2
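
To see why this also returns the variations, it can help to look at the pattern that paste0() builds. The following is a minimal sketch in base R, assuming the df and df_lookup from the question; since %like% is essentially a wrapper around grepl(), the base function is called directly here, and the names pattern and matches are only illustrative.

```r
# Example data from the question
df <- data.frame(url = c("url1", "url1-variation1", "url1-variation2",
                         "url2", "url2-variation1", "url2-variation2",
                         "url3", "url3-variation1", "url3-variation2"),
                 stringsAsFactors = FALSE)
df_lookup <- data.frame(url = c("url1", "url2", "url3"),
                        stringsAsFactors = FALSE)

# Collapse every lookup value into one regular expression (an alternation)
pattern <- paste0("(", df_lookup$url, ")", collapse = "|")
pattern
# [1] "(url1)|(url2)|(url3)"

# TRUE for each row of df whose url contains any of the lookup values
matches <- grepl(pattern, df$url)
df_out <- df[matches, , drop = FALSE]
```

Because the pattern is an unanchored regular expression, a lookup value matches wherever it occurs in the string, not only at the start.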

Or you could define your own operator:

`%like_any%` <- function(lhs, rhs) {
  # TRUE where lhs matches at least one of the patterns in rhs
  grepl(paste0("(", rhs, ")", collapse = "|"), lhs)
}

df_out <- df[df$url %like_any% df_lookup$url, , drop = FALSE]
df_out
#               url
# 1            url1
# 2 url1-variation1
# 3 url1-variation2
# 4            url2
# 5 url2-variation1
# 6 url2-variation2
# 7            url3
# 8 url3-variation1
# 9 url3-variation2
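
One caveat with both approaches: the lookup values are treated as regular expressions and can match anywhere in the string, so real page paths containing characters such as . or ? may match more than intended. If the variations always begin with the lookup value, a literal prefix test is a possible alternative. The sketch below uses base R's startsWith(); the operator name %starts_with_any% and the sample paths are made up for this illustration and are not part of the question's data.

```r
`%starts_with_any%` <- function(lhs, rhs) {
  # Start with all FALSE, then OR in one literal prefix test per lookup value
  hits <- rep(FALSE, length(lhs))
  for (prefix in rhs) {
    hits <- hits | startsWith(lhs, prefix)
  }
  hits
}

# Hypothetical paths: "/axb" would match the regex "/a.b", but not the literal prefix
paths <- c("/a.b", "/a.b-variation1", "/axb", "/other/a.b")
paths %starts_with_any% c("/a.b")
# [1]  TRUE  TRUE FALSE FALSE
```

With the question's data, df[df$url %starts_with_any% df_lookup$url, , drop = FALSE] should select the same nine rows as above, since every variation starts with its lookup value.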
moodymudskipper