-3

I need a function in R that mimics the functionality of LIKE in MySQL.

(I need to validate outcomes of SQL queries and R scripts against each other. If I had a function that exists to mimic the functionality of LIKE, great, that reduces my workload.)

I am adding some of the behaviors of LIKE from the link above. As you can see, there are ways in which LIKE differs from the standard grep regex.

LIKE (description from the link)

  • Pattern matching using SQL simple regular expression comparison. Returns 1 (TRUE) or 0 (FALSE).
  • Per the SQL standard, LIKE performs matching on a per-character basis, so it can produce results different from the = comparison operator. In particular, trailing spaces are significant.
  • With LIKE you can use the following two wildcard characters in the pattern:
      %  matches any number of characters, even zero characters
      _  matches exactly one character
  • In MySQL, LIKE is permitted on numeric expressions. (This is an extension to the standard SQL LIKE.)

    mysql> SELECT 10 LIKE '1%'; -> 1

Soumendra
  • I speak R, but unfortunately not MySQL. Since you don't explain what exactly `LIKE` does, there are fewer people able to help you. – Roland Feb 10 '14 at 08:22
  • LIKE doesn't do approximate string matching, so agrep is not really appropriate. I describe the function in greater details above (edited question). – Soumendra Feb 25 '14 at 20:06
  • you are looking for things that do not exist. If you want exactly to go to McDonalds you don't look for alternatives such as Burger King: you just go to McDonalds. So use sql, sqldf or write your own code if you are more happy like that. – RockScience Feb 26 '14 at 09:41
  • I need to validate outcomes of SQL queries and R scripts against each other. If I had a function that exists to mimic the functionality of LIKE, great, that reduces my workload. – Soumendra Feb 26 '14 at 18:52

3 Answers

4

Try the sqldf package. It lets you run SQL queries directly against data frames.

For example:

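library(sqldf)  # library() stops with an error if the package is missing
data(CO2)       # example data set shipped with R

# keep only the rows whose Plant value starts with "Qn"
new.data <- sqldf("select * from CO2 where Plant like 'Qn%'")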
Nishanth
  • And search SO for SQLite syntax: http://stackoverflow.com/questions/7323162/sqlite-like-and – IRTFM Feb 10 '14 at 21:09
  • I am aware of sqldf. However, for valid reasons I can't go into details here, I don't want to write SQL-like queries. If I wanted to do that, I can run the exact SQL queries themselves. – Soumendra Feb 25 '14 at 19:59
  • then what do you want? sorry but it is a bit unclear – RockScience Feb 26 '14 at 09:43
  • I need to validate outcomes of SQL queries and R scripts against each other. If I had a function that exists to mimic the functionality of LIKE, great, that reduces my workload. – Soumendra Feb 26 '14 at 18:52
1

Try grepl (see ?grepl) or the sqldf package.

df=data.frame(A=c("mytext_is_here","anothertext_is_here","mytext_is_also_here"),B=1:3)
df

firstSolution = subset(df, grepl("^mytext", A))

library("sqldf")
secondSolution = sqldf("select * from df where A like 'mytext%'")

Source: page 8 of http://cran.r-project.org/web/packages/sqldf/sqldf.pdf
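
The correspondence between the two solutions above (a leading ^ for a pattern that ends in %) can be generalized. Below is a minimal sketch of a LIKE-style matcher built on grepl. The names like_to_regex and like are hypothetical and do not come from base R or any package. The sketch assumes that backslash is the escape character, that matching is case-insensitive by default (which is what MySQL typically does for nonbinary strings under a case-insensitive collation), and that NA should propagate the way NULL does in SQL. It has not been validated against a MySQL server.

```r
# Hypothetical helper: translate a LIKE pattern into an anchored PCRE regex.
like_to_regex <- function(pattern) {
  # characters that must be escaped to be matched literally in a regex
  meta <- c(".", "\\", "|", "(", ")", "[", "]", "{", "}", "^", "$", "*", "+", "?")
  quote_char <- function(ch) if (ch %in% meta) paste0("\\", ch) else ch

  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  pieces <- character(0)
  i <- 1
  while (i <= length(chars)) {
    ch <- chars[i]
    if (ch == "\\" && i < length(chars)) {
      # assumption: backslash escapes the next character (e.g. \% or \_)
      i <- i + 1
      pieces <- c(pieces, quote_char(chars[i]))
    } else if (ch == "%") {
      pieces <- c(pieces, ".*")   # any number of characters, even zero
    } else if (ch == "_") {
      pieces <- c(pieces, ".")    # exactly one character
    } else {
      pieces <- c(pieces, quote_char(ch))
    }
    i <- i + 1
  }
  # (?s) lets "." match newlines; ^ and \z anchor the whole string,
  # so trailing spaces stay significant
  paste0("(?s)^", paste(pieces, collapse = ""), "\\z")
}

# Hypothetical LIKE-style matcher, vectorized over both x and pattern.
like <- function(x, pattern, ignore.case = TRUE) {
  x <- as.character(x)  # numeric input is compared as text, e.g. 10 -> "10"
  mapply(function(xi, p) {
    if (is.na(xi) || is.na(p)) return(NA)
    grepl(like_to_regex(p), xi, perl = TRUE, ignore.case = ignore.case)
  }, x, pattern, USE.NAMES = FALSE)
}
```

With this sketch, subset(df, like(A, "mytext%")) should select the same rows as the grepl version above, and like(10, "1%") mirrors the numeric example from the question. Because mapply recycles its arguments, a vector of patterns can be matched element-wise against a vector of strings.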

RockScience
  • sqldf not an option, I don't want to run sql or sql-like queries. grepl is close but doesn't mimic the behaviour exactly. It's also not vectorized (the "pattern" is always atomic). I suppose I'll have to write this function myself. – Soumendra Feb 25 '14 at 20:01
1

I think you could use the grepl function in R to do the same. grepl does partial string matching and returns a logical vector, which you can then use to subset data, on its own or combined with other conditions.

You can also put '!' in front of grepl to keep only the elements that do not match the expression.

For example:

sample <- c("data", "ddata", "ddata1")
filtered_data <- grepl("dd", sample)
# filtered_data is the logical vector FALSE TRUE TRUE

# it can be used as follows to find all the elements that contain the string "dd"
sample[grepl("dd", sample)]

Please note that grepl is case sensitive by default; pass ignore.case = TRUE to change that.
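
A short illustration of the case-sensitivity point, using a hypothetical vector named values. Since grepl looks for the pattern anywhere in the string, grepl("dd", x) plays roughly the role of LIKE '%dd%'. Whether LIKE itself ignores case depends on the collation in use, so treat the comparison as approximate.

```r
values <- c("data", "DData", "ddata1")

# default: case sensitive, so "DData" does not match
case_sensitive <- grepl("dd", values)

# ignore.case = TRUE also matches "DData"
case_insensitive <- grepl("dd", values, ignore.case = TRUE)

# '!' inverts the match and keeps the elements without "dd"
without_dd <- values[!grepl("dd", values, ignore.case = TRUE)]
```

Here case_sensitive is FALSE FALSE TRUE, case_insensitive is FALSE TRUE TRUE, and without_dd contains only "data".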

Aayush Agrawal