6

In particular I would like to call the Postgres levenshtein function. I would like to write the blaze query to return words similar to the word 'similar', ie the equivalent of:

select word from wordtable where levenshtein(word, 'similar') < 3;

In Blaze this should look something like

db.wordtable.word[levenshtein(db.wordtable.word, 'similar') < 3]

but levenshtein is not defined in any module I am importing on the python side. Where/how do I get a levenshtein definition for use in Blaze expressions on the Python side?


I have found the sqlalchemy.sql.func package which provides Python handles for SQL functions for use with SqlAlchemy, but these do not work in Blaze expressions. Is there an equivalent Blaze package, or how can I use sqlalchemy.sql.func.levenshtein inside a Blaze expression?

Daniel Mahler
  • 7,653
  • 5
  • 51
  • 90
  • 1
    Note that I'm not familiar with Blaze. However, I expect explicit support for the `levenshtein` function is unlikely as it is not even part of the PostgreSQL core language, but rather an extension from the `fuzzystrmatch` module. So probably what you really need to know is how to extend Blaze expressions to support custom SQL functions. – beldaz Nov 30 '16 at 23:52
  • 1
    http://stackoverflow.com/a/33483264/290182 seems to suggest support for UDFs (and so presumably also extension modules) is not currently available, though eminently possible. – beldaz Nov 30 '16 at 23:55

3 Answers3

2

PyPi is your friend. Searching there finds the python-Levenshtein package. If you're on windows and want a precompiled version, use Christoph Gohlke's wheel (his builds are safe, reliable, and correct; numpy builds use the MKL libraries, so they're fast too!).

cco
  • 5,873
  • 1
  • 16
  • 21
  • I am not after a function that actually implements levenshtein distance in Python. I need a Python function that will get translated by Blaze into levenshtein call inside the generated SQL query. – Daniel Mahler Nov 30 '16 at 06:47
  • Sorry, I got the impression you were looking for an implementation of levenshtein distance in Python. It looks like you will need to use lower-level SQLAlchemy functionality to get this done the way you want. – cco Nov 30 '16 at 07:08
1

If you wish to see these words and perform some easy actions on them you may try to load filtered data to Blaze from Postgres query.

from blaze import data
import sqlalchemy as sa

engine = sa.create_engine('postgresql://...')
result = engine.execute('''select word from wordtable where levenshtein(word, 'similar') < 3;''')
rows = result.fetchall()
wordtable = data(rows)

# Now you may work with wordtable as a blaze table
wordtable[wordtable.word.like('a*')]  # all words starting with 'a'
Eugene Lisitsky
  • 12,113
  • 5
  • 38
  • 59
  • But why would you whant to do that? For me, the key feature of blaze is not to load the full database due to memory issues... I want to make more or less complex queries without the need to write sql functions, but with writing in common python syntax. That's (imo) the user case, not to manipulate already loaded data. – Nico Albers Jan 11 '17 at 09:03
  • I guess sometimes cranking Python interface to SQL doesn't gives you such flexibility and requires more workaround then just give this task to SQL. You haven't to load all the strings to DB, you may load them letter by letter to distribute memory requirements over time. – Eugene Lisitsky Jan 11 '17 at 09:12
1

Blaze is only a Frontend; and restructures and then gives the query to the SQL Backend, which collects the data from your DB.

From the doku:

What operations work on SQL databases? Most tabular operations, but not all. SQLAlchemy translation is a high priority. Failures include array operations like slicing and dot products don’t make sense in SQL. Additionally some operations like datetime access are not yet well supported through SQLAlchemy. Finally some databases, like SQLite, have limited support for common mathematical functions like sin.

Blaze is intended to forget about the backend and have similar / same syntax for all backends. Therefore it only supports common operations. In my opinion, there is no possibility to specify sqlalchemy or postgresql functions for blaze to use, therefore I think this is not possible by now.

... BUT blaze has version 0.10, it's kind of beta with active contribution in the last year. I am confident, this will be implemented in some time. You always can track the changes in the release notes.

Addition: For further explanation on UDFs (User defined functions) in SQL and working with them in blaze, compare the link provided by beldaz (very similar to this question).

Community
  • 1
  • 1
Nico Albers
  • 1,556
  • 1
  • 15
  • 32
  • I do not want to pass my functions to the backeend. The Postgres backed has a `levenshtein` function. My problem is how to construct a blaze query tocall the backend function, or more generally if some backend implements a new function how to refer to it in blaze without getting undefined errors – Daniel Mahler Jan 10 '17 at 20:52
  • Yeah, of course. Maybe I wrote not clearly. I surely meant that you want to pass sql functions through blaze to the sql driver. But I think this is not possible by now... – Nico Albers Jan 10 '17 at 20:54
  • 1
    Thanks, I misunderstood. That does answer my question. – Daniel Mahler Jan 10 '17 at 20:57
  • Could you make the distinction more explicit in your main answer in case somebody else comes here with the same question. – Daniel Mahler Jan 10 '17 at 21:00
  • Of course, I'm working on it by now. Thank you for accepting my answer! – Nico Albers Jan 10 '17 at 21:02