3

I have two tables: TableA: id, StringA TableB: id, StringB

After fetching all the rows from TableA I want to match words as close as possible to that in TableB. I'm trying to do something like this:

c.execute('SELECT id, StringB FROM TableB WHERE StringB LIKE "'+stringa+'%" COLLATE NOCASE')
foundrows=c.fetchall()
if (len(foundrows)>0):
    print 'Hmm. Which one...'
    for foundrow in foundrows:    
        print 'looking for:'+stringa+'  found:'+ foundrow[1]

Which is giving me things like:

Hmm. Which one...
looking for:goo  found:good
looking for:goo  found:good 1
looking for:goo  found:good 2
looking for:goo  found:good 1 dk
looking for:goo  found:good 2 dk
looking for:goo  found:Good Friday
looking for:goo  found:goose
looking for:goo  found:Good Friday 1
looking for:goo  found:good sport
looking for:goo  found:good job
looking for:goo  found:good morning
looking for:goo  found:good night
looking for:goo  found:goodbye 1
looking for:goo  found:goodbye e
looking for:goo  found:goodbye

What I really want to do is something like 'SELECT id, StringB FROM TableB WHERE StringB CONTAINS WORD "'+stringa+'"

Whats the best way of achieving "Contains Word"?

willwade
  • 1,998
  • 2
  • 16
  • 22
  • 1
    Why not do `LIKE '% string %'` If you put a space on either end, it should limit it to the actual word. – Leeish May 08 '13 at 21:36
  • What is a "`word`"? Something separated by spaces? If so, then see http://stackoverflow.com/questions/8113782/split-string-on-whitespace-in-python – PM 77-1 May 08 '13 at 21:46
  • 1
    @Leeish - Yes, but it will skip first and last words in a string, unless it has a leading and trailing space. – PM 77-1 May 08 '13 at 21:47
  • possible duplicate of [mysql SELECT LIKE must match whole words only to the variable](http://stackoverflow.com/questions/1568068/mysql-select-like-must-match-whole-words-only-to-the-variable) – Praveen May 09 '13 at 09:52

4 Answers4

9

Build a where clause like this:

where (' ' || StringB || ' ') LIKE '% stringa %'

The spaces around StringB ensure that you catch words at the start and end of the string.

Blorgbeard
  • 101,031
  • 48
  • 228
  • 272
5

You could define a REGEXP function:

import sqlite3
import re
def regexp(expr, item):
    reg = re.compile(expr)
    return reg.search(item) is not None

conn = sqlite3.connect(':memory:')
conn.create_function("REGEXP", 2, regexp)
cursor = conn.cursor()
cursor.execute('CREATE TABLE foo (bar TEXT)')
cursor.executemany(
    'INSERT INTO foo (bar) VALUES (?)', [
        ('this is not good', ),
        ('world of goo', ),
        ('goo: the final frontier', ),
        ('goo',)])
cursor.execute('SELECT bar FROM foo WHERE bar REGEXP ?', [r'\bgoo\b'])
# cursor.execute('SELECT bar FROM foo WHERE (" " || bar || " ") LIKE ?', ["% goo %"])
data = cursor.fetchall()
print(data)

yields

[(u'world of goo',), (u'goo: the final frontier',), (u'goo',)]

Note also that the regex \bgoo\b will match word boundaries that are not spaces. For example if you had a table value of goo:, then \bgoo\b will match it, but " " || bar || " " LIKE "% goo %" will not.


It may be nice to have the power of regex while doing sqlite queries, but it is probably not very fast, since it needs to call a Python function on each item from the table.

Assuming the words you are searching for are not surrounded by punctuation or other word boundaries other than spaces, Blorgbeard's method is probably faster.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • thats pretty neat actually. This is a script that I'm thinking of running once in a while and regex'ing it will give me a bit more power for tweaking. thanks for your time – willwade May 09 '13 at 20:09
1

You can use the glob function (or operator) for case sensitive comparisons. From D. Richard Hipp on the SQLite Users mailing list:

LIKE is case insensitive and uses wildcards '%' and '_'

GLOB is case sensitive and uses wildcards '*' and '?'.

GLOB also allows you to say '[abcd]' to mean any character in the set of "abcd".

LIKE can have an option ESCAPE character for escaping wildcards. GLOB cannot. But with GLOB if you want to match a wildcard character you can use '[*]' or '[?]'.

Other than that they are the same. In fact, GLOB and LIKE are implemented using same subroutine, called with different parameters that determine the wildcards and case sensitivity.

If you really want to match whole words you will need an OR of several tests. E.g.,

WHERE glob(StringB + ' *',        StringA)
   OR glob('* ' + StringB,        StringA)
   OR glob('* ' + StringB + ' *', StringA)

to catch words at the start, end, and middle of StringA, respectively. Or building on Blorgbeard's response

WHERE glob(' '+StringB+' ','* '+StringA+' *')
Doug Currie
  • 40,708
  • 1
  • 95
  • 119
0

You can do this like;

WHERE StringB LIKE "%'+stringa+'%"
Ozan Deniz
  • 1,087
  • 7
  • 22