3

SQLDF newbie here.

I have a data frame which has about 15,000 rows and 1 column. The data looks like:

cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...

I wanted to use the package sqldf to loop through the column and pick all values which contain "car" anywhere in their value. However, the following code generates an error.

> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"

There is no unexpected symbol, so I'm not sure whats wrong.

so first, I want to know all the values which contain 'car'. then I want to know only those values which contain just 'car' by itself.

Can anyone help.

EDIT:

allright, there was an unexpected symbol, but it only gives me just car and not every row which contains 'car'.

> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
  Keyword
1     car
ATMathew
  • 12,566
  • 26
  • 69
  • 76

4 Answers4

7

Using = will only return exact matches.

You should probably use the like operator combined with the wildcards % or _. The % wildcard will match multiple characters, while _ matches a single character.

Something like the following will find all instances of car, e.g. "cars", "motorcar", etc:

sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")

And the following will match "car" or "cars":

sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
Andrie
  • 176,377
  • 47
  • 447
  • 496
3

This has nothing to do with sqldf; your SQL statement is the problem. You need:

dat <- data.frame(Keyword=c("cars","autocar","carsinfo",
  "whatisthat","donnadrive","car","telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
#    Keyword
# 1     cars
# 2  autocar
# 3 carsinfo
# 4      car
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
2

You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE / FALSE) stating whether or not there was a match or not. You can get very sophisticated to match specific items, but a basic query will work in this case:

#Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))

   Keyword
1     cars
2  autocar
3 carsinfo
6      car
Chase
  • 67,710
  • 18
  • 144
  • 161
0

Very similar to the solution provided by @Chase. Because we do not use subset we do not need a logical vector and can use both grep or grepl:

df <- data.frame(keyword = c("cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE] # or
df[grepl("car", df$keyword), , drop = FALSE]

   keyword
1     cars
2  autocar
3 carsinfo
6      car

I took the idea from Selecting rows where a column has a string like 'hsa..' (partial string match)

Community
  • 1
  • 1
mpalanco
  • 12,960
  • 2
  • 59
  • 67