0

I'm building a Python function that creates SQL statements as a string that will later be passed to the database to extract data using a pyodbc connection.

I have trouble finding the right way to handle different combinations of filter arguments in both SELECT and WHERE statement without using multiple if/else statements and without breaking the syntax of the SQL.

Here is a minimal reproducible example:

sqldata.py

import pandas as pd

class SqlData(object):
    def __init__(self, customers=None, last_version=None, attributes=None, age=None):
        self.customers = customers
        self.last_version = last_version
        self.attributes = attributes
        self.default_columns = ["customer_id", "date", "value"]
        self.age = age

    def buildSQL(self):
        default_columns = ", ".join(self.default_columns)

        if self.attributes == None:
            attributes = " "
        else:
            attributes = " ," + ", ".join(self.attributes) + " "

        if self.customers == None:
            customers = " "
        else:
            customers = "customer_id in ('" + "','".join(self.customers) + "')"

        if self.last_version == None:
            last_version = " "
        else:
            last_version = " AND last_version = 'Y'"

        if self.age == None:
            age = " "
        else:
            age = " AND age > " + self.age

        self.sql = "SELECT " + default_columns + attributes + \
                   "FROM myTable " + \
                   "WHERE " + customers + last_version  + age
        return self.sql

__init__.py

from sqldata import SqlData

__main__.py

from sqldata import SqlData

data = SqlData(customers = ["Arthur", "Ben"],
                   last_version = True, 
                   attributes = ["name", "age"],
                   age = "14")

print(data.buildSQL())

If all arguments are given it works just fine and I get the following:

SELECT customer_id, date, value ,name, age 
FROM myTable 
WHERE customer_id in ('Arthur','Ben') AND last_version = 'Y' AND age > 14

But if customers is not given, it makes no more sense:

SELECT customer_id, date, value ,name, age 
FROM myTable 
WHERE   AND last_version = 'Y' AND age > 14

Ideally, when customers is not given, I would like to have the following:

SELECT customer_id, date, value ,name, age 
FROM myTable 
WHERE last_version = 'Y' AND age > 14

All in all, I would like to know the following:

  1. Is it possible to "bypass" the use of those multiple if/else statements ? There must be a better way.
  2. How to handle the case where the customers variable is not given, hence breaking the syntax of the SQL statement?
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
Kermit
  • 3,112
  • 2
  • 10
  • 34
  • You are already testing if `customers` are given. Why don't you use the same test to determine whether you should output `AND` or not? – mkrieger1 Jul 21 '21 at 10:36
  • you should consider even more possibilities. like what if `customers`, `last_version`, and `age` are all `None`? then you shouldn't have a WHERE clause at all. I don't really have an answer for you, but my suggestion is you should separate your code to smaller functions, that deal with the different clauses of the query separately, then create a function that combines them in the correct manner – Almog-at-Nailo Jul 21 '21 at 10:45
  • Also, for the if/else statements, take a look at [this answer](https://stackoverflow.com/a/4978745/13354437) – Almog-at-Nailo Jul 21 '21 at 10:48

1 Answers1

2

I have used a dictionary to represent each column. I then use that dictionary to build the list of columns and conditions:

class SqlData(object):
    def __init__(self, customers=None, last_version=None, attributes=None, age=None):
        self.columns = {}
        self.columns["customer_id"] = customers
        self.columns["last_version"] = 'Y' if last_version else None
        self.columns["age"] = int(age) if age else None
        self.default_columns = ["customer_id", "date", "value"]
        self.attributes = attributes

    def buildSQL(self):
        # Build a list of column names
        if self.attributes:
            default_columns = ",".join(self.default_columns + self.attributes)
        else:
            default_columns = ",".join(self.default_columns)
        # Build a list of columns
        opts = []
        for k,v in self.columns.items():
            if self.columns[k]:
                if isinstance(self.columns[k], int):
                    opts.append(f"{k}>{v}")
                elif isinstance(self.columns[k], list):
                    # Put words in single quotes
                    lst = ','.join([f"'{i}'" for i in v])
                    opts.append(f"{k} in ({lst})")
                else:
                    opts.append(f"{k}='{v}'")
        if 0 == len(opts):
            return f"SELECT {default_columns} FROM myTable"
        else:
            return f"SELECT {default_columns} FROM myTable WHERE {' AND '.join(opts)}"

test = SqlData(customers = ["Arthur", "Ben"],
                   last_version = True, 
                   attributes = ["name", "age"],
                   age = 14)
print(test.buildSQL())
test = SqlData(age = 14)
print(test.buildSQL())
test = SqlData()
print(test.buildSQL())

Output:

SELECT customer_id,date,value,name,age FROM myTable WHERE customer_id in ('Arthur','Ben') AND last_version='Y' AND age>14
SELECT customer_id,date,value FROM myTable WHERE age>14
SELECT customer_id,date,value FROM myTable

Note: depending on where the input comes from, doing things this way opens you up to sql injection (see: Little Bobby Tables). Preferably you would use prepared statements but I don't know if Python (or pyodbc) supports that (Does Python support MySQL prepared statements?).

001
  • 13,291
  • 5
  • 35
  • 66