0

Below is a sample of DB table

date          id    name
01.02.11      4     aaaa
21.05.19      5     aaaa
31.12.12      5     aaaa
01.05.15      6     aaaa

In order to query data in the right way (avoiding duplicates), while querying I have to set a 'reporting date' which is the first month day.

The below code gives me the requested results but only for one month.

sql = 'select * from db where date = '01.03.20''


def oracle(user, pwd, dsn, sql, columns):

    # Connection to databases
    con = cx_Oracle.connect(user=user, password=pwd, dsn=dsn, encoding="UTF-8")
    con.outputtypehandler = OutputHandler

    # Cursor allows Python code to execute PostgreSQL command in a database session
    cur = con.cursor()

    # Check Connection
    print('Connected')

    # Create DF
    df = pd.DataFrame(cur.execute(sql).fetchall(), columns= columns, dtype='object')[:]

    print('Shape:', df.shape)

    return df

Question: How can I query Data using CX_Oracle with different reporting date without doing it manually?

There are multiple way to solve this issue directly using SQL. However, the expected solution should use 'a for loop'.

I was thinking about changing the reporting date with

for i in [str(i).zfill(2) for i in range(1,13)]: 
    for j in [str(j).zfill(2) for j in range(0,21)]
           sql = f'select * from db where date = '01.{i}.{j}''
  • For eg: date = 01.01.19

The idea is to query data for this date --> store it within DF

Go to Next month 01.02.19 --> Store it in DF

And so on until reached range 21 or reached last current month (latest date)

If someone has any idea to query data using a loop with cx_Oracle and Pandas for different date thanks for helping!

A2N15
  • 595
  • 4
  • 20
  • Do you want a separate Df for each query? I.e 21 Dfs or one combined df to store all the results from the queries? – TH14 Apr 25 '20 at 02:00

1 Answers1

0

How about something like this

from datetime import date, datetime, timedelta
import calendar

# Choose Start Month
start_month = date(2019, 1, 1)

# Get Current Month
current_month = date(datetime.today().year, datetime.today().month, 1)

# Create list to collect all successfully run queries
executed_sql_queries = []
# Create list for failed queries
failed_queries = []

# Create list to collect dfs
dfs = []

while start_month <= current_month:
    query_date = start_month.strftime('%d.%m.%y')

    sql = f"""select * from db where date = '{query_date}' """

    try:
        df = oracle(user, pwd, dsn, sql=sql, columns)

    except sql_error as e:
        print(e)
        failed_queries.append(sql)
        pass # move onto the next query or you can try re-running the query

    else:
        executed_sql_queries.append(sql)
        dfs.append(df)

    finally:
        # Add one Month to the date for each run
        days_in_month = calendar.monthrange(start_month.year, start_month.month)[1]
        start_month = start_month + timedelta(days=days_in_month)


all_dfs = pd.concat(dfs)

executed_sql_queries:

["select * from db where date = '01.01.19' ",
 "select * from db where date = '01.02.19' ",
 "select * from db where date = '01.03.19' ",
 "select * from db where date = '01.04.19' ",
 "select * from db where date = '01.05.19' ",
 "select * from db where date = '01.06.19' ",
 "select * from db where date = '01.07.19' ",
 "select * from db where date = '01.08.19' ",
 "select * from db where date = '01.09.19' ",
 "select * from db where date = '01.10.19' ",
 "select * from db where date = '01.11.19' ",
 "select * from db where date = '01.12.19' ",
 "select * from db where date = '01.01.20' ",
 "select * from db where date = '01.02.20' ",
 "select * from db where date = '01.03.20' ",
 "select * from db where date = '01.04.20' "]
TH14
  • 622
  • 10
  • 24
  • Using `'{query_date}'` interpolation is a security risk and likely a performance & scalability problem. Use bind variables instead, e.g. see https://stackoverflow.com/questions/14884686/creating-a-pandas-dataframe-from-a-database-query-that-uses-bind-variables – Christopher Jones Apr 26 '20 at 23:35
  • I've never had an issue with this before, i run asynchronous queries every day have never spotted any errors in my query history. – TH14 Apr 28 '20 at 19:39
  • Think more deeply about it and change to use bind variables. – Christopher Jones Apr 28 '20 at 22:42
  • Ok thanks for the suggestion, so then the accepted answer in [/stackoverflow.com/questions/57399784](https://stackoverflow.com/questions/57399784/how-do-i-pass-multiple-parameters-via-pd-read-sql-one-singular-and-another-in-l) is incorrect, judging by the way the tuple input is inserted into the where in clause? – TH14 Apr 29 '20 at 05:40