-1

Say I have this list comprehension in Python

[f(x, y) for (x, y) in itertools.product(X, Y) if g(x, y)]

where X and Y are lists and f and g are functions acting on members of lists. How does it translate to a SQL query? This is my solution:

SELECT
    do_something(X.column, Y.another_column, X.our_column), 
    do_something_else(Y.that_column, X.that_column, Y.my_column)
FROM
    this_scheme.your_table as X,
    that_scheme.our_table as Y
WHERE
    condition(X.column, Y.my_column) AND
    another_condition(Y.another_column, X.our_column) 

Is it correct? Can it be improved? Is it efficient?

Elmex80s
  • 3,428
  • 1
  • 15
  • 23
  • Efficiency depends on implementation details like indexes, which can typically only be used with certain types of conditions (comparisons between columns, not function calls). – Barmar Oct 04 '17 at 18:43
  • 1
    provide a realistic data sample and the expected result – ScaisEdge Oct 04 '17 at 18:43
  • `(x, y)` implies that there's just one column in each original table, `X.x` and `Y.y`. How does that relate to your SQL query, which lists several columns in each table? What is the "bad" column? – Barmar Oct 04 '17 at 18:45
  • How that `|` link f(x,y) and g(x,y) ? we assume is a conditional to filter the result? – Juan Carlos Oropeza Oct 04 '17 at 19:07

1 Answers1

1

Your code is correct, although in modern SQL we prefer to use explicit JOIN clauses rather than cross products, so it would be better to write it as:

SELECT
    do_something(X.column, Y.another_column, X.our_column), 
    do_something_else(Y.that_column, X.that_column, Y.my_column)
FROM this_scheme.your_table as X,
INNER JOIN that_scheme.our_table as Y
ON condition(X.column, Y.my_column) AND
    another_condition(Y.another_column, X.our_column) 

See INNER JOIN ON vs WHERE clause for discussion about the merits of INNER JOIN over cross products.

If the conditions are actual function calls, this will most likely be very inefficient, since it will not be able to make use of indexes. The DBMS will have to generate the full cross product, then call the functions on all the columns to determine if it should be included in the result set, so it will be O(m * n), where m and n are the number of rows in the tables. But if it's something simple like X.colA = Y.colB, indexes on these columns can be used to optimize the query.

Indexes can typically be used to optimize equality and inequality comparisons (e.g. col1 < col2), and string prefixes (col1 LIKE 'foo%'). You can also create specialized indexes for full-text searching and geographic coordinates.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • 1
    Your answer in the question is good provided `f` is defined so that `f(X, Y) = (do_something(X, Y), do_something_else(X, Y))` and `g` is define dso that `g(X, Y) = condition(X, Y) AND another_condition(X, Y)`. Barmar's answer is equivalent except that it doesn't impose any restrictions on the form of `f` and `g` and so is a more direct translation of your mathematical notation and your Python list comprehension. – Patrick87 Oct 04 '17 at 19:36
  • @barmar Thank you, is there a reason your code is to be preferred over mine? In my honest opinion (I just started with sql) my code is more readable – Elmex80s Oct 04 '17 at 19:38
  • 1
    It's just more modern syntax. `JOIN` and `ON` allows you to be clear about which conditions are used to relate the tables, versus the conditions that apply more generally. It's especially useful when you're joining multiple tables, you can see the relationships clearly. – Barmar Oct 04 '17 at 19:40