24

Suppose I have the following data:

foo <- data.frame(Company = c("company1", "foo", "test", "food"), Metric = rnorm(4, 10))

> foo
   Company    Metric
1 company1 10.539970
2      foo  9.487823
3     test  9.663994
4     food  9.499327

Why does the following code return 0 results (instead of the second and fourth rows)?

library(dplyr)
library(data.table)

foo %>% dplyr::filter(Company %like% "%foo%")

I'm trying to use the SQL-equivalent wildcard filter on a particular input string to dplyr::filter, using the %like% operator from the data.table package.

What am I doing wrong?

bschneidr
  • 6,014
  • 1
  • 37
  • 52
Ray
  • 3,137
  • 8
  • 32
  • 59
  • 8
    How do you get `%like%` working? Is that a custom defined function or a function from another package? – Konrad Rudolph Sep 28 '15 at 18:19
  • @KonradRudolph Plz see my answer below. I changed it to `foo %>% dplyr::filter(Company %like% "foo")` and it works. – Ray Sep 28 '15 at 20:55
  • 3
    `like` is from `data.table` package – Dhawal Kapil Sep 28 '15 at 20:57
  • @DhawalKapil are you sure `%like%` is from the `data.table` package? I don't have `data.table` install on my system and `%like%` still works on a database back-end `tbl(con, "table_name") %>% filter(Company %like% "foo")`. – Paul Rougieux Sep 25 '17 at 15:49
  • yes see page no 57 here https://cran.r-project.org/web/packages/data.table/data.table.pdf – Dhawal Kapil Sep 26 '17 at 10:07
  • @DhawalKapil those are 2 very different implementations of `like`. In the data table document you cited, `like` is implemented as a "Convenience function for calling grep.", while the [dbplyr reference manual](https://cran.r-project.org/web/packages/dbplyr/dbplyr.pdf) says "All other functions will be preserved as is. R’s infix functions (e.g.%like%) will be converted to their SQL equivalents (e.g.LIKE)." – Paul Rougieux Mar 27 '20 at 14:14

6 Answers6

31

You can use:

filter(foo, grepl("foo", Company, fixed = TRUE))

Output:

  Company    Metric
1     foo  9.906805
2    food 10.464493

As Dhawal Kapil pointed out I think %like% is from data.table:

library(data.table)
DT <- data.table(foo)
DT[Company %like% 'foo']

Output:

   Company    Metric
1:     foo  9.906805
2:    food 10.464493
mpalanco
  • 12,960
  • 2
  • 59
  • 67
23

You can use with library(stringr)

library(dplyr)
library(stringr)
foo <- data.frame(Company = c("company1", "foo", "test", "food"), Metric = rnorm(4, 10))

foo %>% filter(str_detect(Company,"foo"))

as well as any other Regular expression

foo %>% filter(str_detect(Company,"^f")) 
  • 2
    Hadleys says that this never worked as expected and is now removed version dbplyr 1.4.0 https://github.com/tidyverse/dbplyr/issues/295 – user63230 Mar 25 '20 at 12:02
17

Figured it out; figure I'll leave this up in case it helps somebody else in the future:

library(data.table)   # For like function (%like%)
foo <- foo %>% dplyr::filter(Company %like% "foo")

Without the "s around foo in the original question, returns the correct data.frame. Still not sure if you can use the % as an anchor in SQL (e.g. %foo or foo%), but the above works for what I needed it to do.

Adeemy
  • 3
  • 3
Ray
  • 3,137
  • 8
  • 32
  • 59
  • 25
    No one has a clue where you got `%like%` from, btw – Rich Scriven Sep 28 '15 at 21:03
  • 5
    `grepl()` doesn't get translated to SQL, it can only be used on a data frame. `%like%` is useful when dplyr is used on a database back-end. Filtering in the database is useful on a large table, which would be too large to load entirely into R. You can see the SQL statement generated by `dplyr` by calling the `explain()` function. `foo %>% filter(Company %like% "foo") %>% explain()`. – Paul Rougieux Sep 25 '17 at 15:43
  • @PaulRougieux would you know anyway to filter based on letter beginning, e.g. beginning with "f" something like: `dplyr::filter(Company %like% "^f")`. dbplyr no longer supports `str_detect` https://github.com/tidyverse/dbplyr/issues/295 – user63230 Mar 25 '20 at 12:06
  • 2
    @user63230 this might depend on which database back-end you have behind dplyr. `Company %like% "f%"` could be the answer. – Paul Rougieux Mar 27 '20 at 14:17
3

With new updates to stringr 1.5.0 str_like() will make filters like this a breeze.

foo |> filter(str_like(Company, "foo%"))

this will filter the rows as needed.

1

Alternative using dplyr + SQL:

With sql() escaping from dplyr you can put native SQL (depending on your Data Base flavor) directly in the pipe:

library(dplyr)
foo %>% filter(sql("Company LIKE ('%foo%')")
0

First convert that variable into same case for all records

foo$Company <- toupper(foo$Company)

then apply filter

foo <- foo[Company %like% 'foo'] 

This will give the subset where Company could be:

WESTFOO FIRST NATION-LGS,
SPRINGER EUFOO OIL CO,
US FOO,
CITIFOO NA
OTStats
  • 1,820
  • 1
  • 13
  • 22
Pallavi Kalambe
  • 793
  • 8
  • 7