
Goal: Access a database of hundreds of tables, check each table's columns for a specified column name, and store the names of the matching tables in a list.

Overview: I set up access to a database and stored the names of all the tables in a variable.

import pandas
import pyodbc

main_path = 'my_database_name'

with pyodbc.connect("DRIVER={SQL Server};"
                    "SERVER=MyServerName;"
                    f"DATABASE={main_path};"
                    "Trusted_Connection=yes;") as main_conn:

    # The body must be indented inside the with block. Note that pyodbc's
    # connection context manager commits on exit rather than closing, so
    # main_conn remains usable below.
    db_table_names = [row.table_name for row in main_conn.cursor().tables()]

I then tried to iterate through the list of table names, retrieve each table, and check whether the column name I'm looking for is among its columns.

col_name = 'whatever_col_i_need'
table_matches = [table_name for table_name in db_table_names
                 if col_name in pandas.read_sql(f"SELECT * FROM {table_name}",
                                                con=main_conn).columns]

Problem: I think this sucks up a lot of resources. It may be 'holding on' to each table as it iterates, or it is just generally slow. I thought about using a with statement so it would connect to the table, look for the column, then disconnect from the table, but I'm not sure that's the best way, and I don't think I'm using the right format.

found_tables = []

for table_name in db_table_names:
    # The intent: load the table, check its columns, then release it.
    # As written this doesn't actually work, since a DataFrame is not a
    # context manager, so the with statement raises an AttributeError.
    with pandas.read_sql(f"SELECT * FROM {table_name}", con=main_conn) as temp_tbl:
        if 'LOCATION' in temp_tbl.columns:
            found_tables.append(table_name)

UPDATE: I figured out a method that is super fast and meets my needs perfectly.

Since I don't really care what information is in the table and just want to confirm that the column name exists, there's no need to store the table data, even temporarily. By wrapping a query that selects only that column in a try/except, if the column doesn't exist in the table then the query fails, execution hits the except, and the loop moves on to the next table.

found_tables = []

for table_name in db_table_names:
    try:
        # Print statement just to follow the progress
        print(f'Checking: {table_name} for {col_name}...')
        pandas.read_sql_query(f"SELECT {col_name} FROM {table_name}", con=main_conn)
        found_tables.append(table_name)
    except Exception as e:
        # The query fails when the column isn't there; log it and move on
        print(e)
  • If you have hundreds of similar tables, this is a poor database design. Dynamic information should be in table data, not table and column names. – Barmar Apr 09 '23 at 16:36
  • @markalex The problem is not getting the column names, the problem is how to iterate through tables in a database without taking forever or sucking up resources. – spareTimeCoder Apr 09 '23 at 16:40
  • The reason your code is so slow is because you're using `SELECT *`. That's a very expensive query to execute for each table. If you just used a simple `INFORMATION_SCHEMA` query to get the column names you wouldn't have to fetch enormous amounts of data like this. – Barmar Apr 09 '23 at 16:57
  • You can do it in a single query: `SELECT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME IN ('table1', 'table2', ...) AND COLUMN_NAME = 'LOCATION'` – Barmar Apr 09 '23 at 16:59
  • @Barmar I'll try that last suggestion of yours and see if it speeds things up. Thanks! – spareTimeCoder Apr 09 '23 at 17:23
  • @Barmar That might work in SQL, but I don't think it will in Python – spareTimeCoder Apr 09 '23 at 17:36
  • Definitely not a duplicate. – NL23codes Apr 09 '23 at 18:28
  • 1
    @spareTimeCoder Why won't it work in Python? It sends the query to SQL. – Barmar Apr 09 '23 at 20:11
  • @Barmar The information_schema part doesn't work. I figured out a solution anyway. Thanks though. – spareTimeCoder Apr 10 '23 at 11:12
  • I'm not sure why it doesn't work, but I've reopened so you can post your solution as an answer. – Barmar Apr 10 '23 at 14:56
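
For reference, a minimal sketch (untested) of the single-query `INFORMATION_SCHEMA` approach Barmar describes in the comments above, reusing main_conn and col_name from the question; pandas may warn that it prefers SQLAlchemy connections, but the `?` placeholder is the standard pyodbc parameter style:

# One query returns every table containing the column,
# without fetching any table data at all
query = ("SELECT TABLE_NAME "
         "FROM INFORMATION_SCHEMA.COLUMNS "
         "WHERE COLUMN_NAME = ?")

matches = pandas.read_sql_query(query, con=main_conn, params=[col_name])
found_tables = matches['TABLE_NAME'].tolist()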

1 Answer


The key really is not saving the query results to a variable, especially when you don't know how large the response could be. Secondly, the try/except was an easy way of handling whether or not the table had the column, since it could be assumed that the query would fail if the column wasn't there, so simply moving on did the trick. It may seem inelegant, but it iterated through hundreds of million-plus-row tables in seconds and returned the names of the found tables.

import pandas
import pyodbc

main_path = 'my_database_name'

with pyodbc.connect("DRIVER={SQL Server};"
                    "SERVER=MyServerName;"
                    f"DATABASE={main_path};"
                    "Trusted_Connection=yes;") as main_conn:

    # Indented inside the with block; pyodbc commits (but doesn't close)
    # the connection on exit, so main_conn stays usable below
    db_table_names = [row.table_name for row in main_conn.cursor().tables()]
found_tables = []
col_name = 'SomeCol'

for table_name in db_table_names:
    try:
        # Print statement to just follow the progress
        print(f'Checking: {table_name} for {col_name}...')
        pandas.read_sql_query(f"SELECT {col_name} FROM {table_name}", con=main_conn)
        found_tables.append(table_name)
    except Exception as e:
        # Print statement can be omitted if there's just too much printing out
        print(e)
        pass

print(f"Found Column name in: {found_tables}")