0

I am trying to make a function in Postgres to make my queries faster compared to Django ORM. But the problem I am facing is results are coming when there is no WHERE clause in the query.

This is the function and its call which yields 0 rows:

CREATE OR REPLACE FUNCTION public.standard_search(search_term text, similarity_to integer)
    RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer)
    LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
    similarity integer;
BEGIN
    RETURN QUERY
        SELECT mark.id, mark.app_num, mark.app_for::text, levenshtein($1, focusword.word) AS similarity
        FROM mark
        INNER JOIN focusword ON (mark.id = focusword.mark_id)
        WHERE similarity <= $2
        ORDER BY similarity, mark.app_for, mark.app_num;
END
$BODY$;

select * from public.standard_search('millennium', 4)

This is the function and its call which is giving me results but is slow as the filtering is done in the function call:

CREATE OR REPLACE FUNCTION public.standard_search(search_term text, similarity_to integer)
    RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer)
    LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
    similarity integer;
BEGIN
    RETURN QUERY
        SELECT mark.id, mark.app_num, mark.app_for::text, levenshtein($1, focusword.word) AS similarity
        FROM mark
        INNER JOIN focusword ON (mark.id = focusword.trademark_id)
        ORDER BY similarity, trad.app_for, mark.app_num;
END
$BODY$;

select * from public.standard_search('millennium', 4) where similarity <= 4

Can anyone shed some light on what is actually going wrong here? After this I can work on the performance improvements.

I was unable to perform this via VIEWS as it required at least one parameter, i.e., search_term to be passed into the levenshtein() function.

I was also facing a problem of passing a tuple as a parameter in the function which is again to be used in the where clause like: WHERE mark.class in (1,2,3,4,5)

I was doing this previously via RawSQL feature of Django ORM, but trying to do it here because of the performance improvement gains.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
inthevortex
  • 334
  • 5
  • 20

1 Answers1

1

The identifier similarity is used in a confusing variety of ways. Not all make sense ...

CREATE OR REPLACE FUNCTION public.standard_search(search_term text, similarity_to integer)
  RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer)  -- ①
  LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
   similarity integer; -- ②
BEGIN
   RETURN QUERY
   SELECT mark.id, mark.app_num, mark.app_for::text, levenshtein($1, focusword.word) AS similarity  -- ③
   FROM mark
   INNER JOIN focusword ON (mark.id = focusword.mark_id)
   WHERE similarity <= $2  -- ④
   ORDER BY similarity, mark.app_for, mark.app_num; -- ⑤
END
$BODY$;

① ... as column in the return type defined by RETURNS TABLE - effectively an OUT parameter.

② ... as variable - overruling visibility of the OUT parameter. But why?

③ ... as column alias in the SELECT list.

④ ... in the WHERE clause, which makes no sense. It does not refer to ③ (like you seem to assume). Output names are not visible in the WHERE clause. There you can only refer to input column names - or variables and parameters. Since the variable (hiding the parameter, but that makes no difference here) is NULL, no rows are returned. Ever.

⑤ ... in ORDER BY, which resolves to the column alias defined in ③

This is madness. Even I had a hard time figuring out which is visible where, and I have some experience with this. Avoid naming conflicts like this. Follow some naming convention and use distinct names for parameters, variables, column names and aliases!

Related:

Possible solution

This would make more sense:

CREATE OR REPLACE FUNCTION public.standard_search(_search_term text, _similarity_to integer)
  RETURNS TABLE(obj_id integer, app_num integer, app_for text, similarity integer) AS
$func$
   SELECT m.id, m.app_num, m.app_for::text, levenshtein(_search_term, f.word) AS sim
   FROM   mark      m
   JOIN   focusword f ON m.id = f.mark_id
   WHERE  levenshtein($1, f.word) <= _similarity_to
   ORDER  BY sim, m.app_for, m.app_num;
$func$  LANGUAGE sql;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks a lot for the insight. I have never worked with functions before and was trying to make one out of the documentations and tutorials I found. I will surely try this solution out. – inthevortex Apr 23 '20 at 15:42