
I am querying a historical database and the result table should have around 144k rows, but the result I get is limited to 100k rows. I have already seen this problem using an ODBC connection to the same DB from a VB.NET application.

I tested the same query in the DB's SQL client and it returns the correct number of rows. The only thing that looks suspicious is that this client has a limit on the number of rows shown in its interface, and the default value is 100k.

import pypyodbc as pyodbc
import pandas as pd
import gc
import time

def GetTrend(tags, server, t_ini, t_end, period):

    # Connection string for the AspenTech SQLplus ODBC driver; the doubled
    # braces are literal braces escaped for str.format.
    IP21_connection = 'DRIVER={{AspenTech SQLplus}};HOST={};PORT=10014'
    IP21_Query = """select ts,avg from aggregates where name = '{}'
                and ts between '{}' and '{}'
                and period = 10*{}"""

    conn = pyodbc.connect(IP21_connection.format(server))
    cur = conn.cursor()
    df2 = pd.DataFrame()
    i = 0
    for tag in tags:
        # One query per tag; each result is timestamp-indexed and joined
        # into df2 as a new column.
        cur.execute(IP21_Query.format(tag, t_ini, t_end, period))
        df = pd.DataFrame(cur.fetchall(), columns=['ts', tag])
        df['ts'] = pd.to_datetime(df['ts'])
        df1 = df.set_index('ts')

        df2 = pd.concat([df2, df1], axis=1)
        i += 1
        print('{} - {} of {} tags'
              ' collected'.format(time.asctime(time.localtime()), i,
                                  len(tags)), flush=True)

        gc.collect()

    return df2

For the period I am querying, I would expect 144k rows, but I am getting just 100k.

  • If you're seeing the same behavior in two different applications, then it sounds like it might be a setting on the DB. Have you confirmed whether there is an ODBC query row limit imposed by the DBAs for the database? – G. Anderson Jun 04 '19 at 17:18
  • [Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451) Simply build a list/dict of data frames and `pd.concat(df_list)` **once** outside the loop. – Parfait Jun 04 '19 at 17:30
  • You are running iterative queries based on various `WHERE` conditions. How do you know 144K is the expected size? Can your calculations be off? – Parfait Jun 04 '19 at 17:33
  • I know that the result should be 144k because it is a historian database of process variables (temperature, flows, etc.) and I have an initial time, an end time, and an interval. Anyway, the problem is, as G. Anderson said, in the database configuration. I found a way to increase this limit, but on reflection it is better to keep the limit and perform several selects of at most 100k rows each (see the sketch after these comments). This would avoid overloading the DB. – Dariva Jun 06 '19 at 16:00
  • @Parfait thank you for the tip about append and concat with pandas. It was something I really didn't know about. – Dariva Jun 07 '19 at 14:59
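
For reference, a rough sketch of how the two suggestions above could be combined: split the queried time range into smaller windows so each individual select stays under the 100k limit, gather the per-tag frames in a list, and call pd.concat once at the end. The helper name get_trend_chunked, the chunk size, and the timestamp format passed to SQLplus are assumptions here and would need adjusting to the actual historian setup.

import pypyodbc as pyodbc
import pandas as pd

def get_trend_chunked(tags, server, t_ini, t_end, period, chunk='7D'):
    conn_str = 'DRIVER={{AspenTech SQLplus}};HOST={};PORT=10014'
    query = """select ts,avg from aggregates where name = '{}'
            and ts between '{}' and '{}'
            and period = 10*{}"""

    conn = pyodbc.connect(conn_str.format(server))
    cur = conn.cursor()

    # Split [t_ini, t_end] into smaller windows so each select stays below
    # the 100k row limit. chunk='7D' is an arbitrary default; pick a window
    # size that matches the sampling period of the tags.
    edges = list(pd.date_range(t_ini, t_end, freq=chunk))
    if edges[-1] < pd.Timestamp(t_end):
        edges.append(pd.Timestamp(t_end))

    frames = []
    for tag in tags:
        parts = []
        for start, end in zip(edges[:-1], edges[1:]):
            cur.execute(query.format(tag, start, end, period))
            parts.append(pd.DataFrame(cur.fetchall(), columns=['ts', tag]))
        # Build each tag's frame from its list of chunks; chunk boundaries
        # overlap because BETWEEN is inclusive, so drop the duplicates.
        df = pd.concat(parts, ignore_index=True).drop_duplicates(subset='ts')
        df['ts'] = pd.to_datetime(df['ts'])
        frames.append(df.set_index('ts'))

    # Combine all tags with a single concat outside the loop instead of
    # growing the frame inside it (which leads to quadratic copying).
    return pd.concat(frames, axis=1)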

1 Answer


I just ran into the same issue myself and found the solution in the manual, which you may have found yourself a long time ago! If not, change your driver settings to include 'MAXROWS=x', e.g.:

IP21_connection = 'DRIVER={{AspenTech SQLplus}};HOST={};PORT=10014;MAXROWS=1000000'
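
If raising the limit is the preferred route, a minimal sketch of how the setting could be slotted into the question's GetTrend function (the MAXROWS value here is arbitrary and just needs to exceed the expected result size):

    IP21_connection = ('DRIVER={{AspenTech SQLplus}};HOST={};PORT=10014;'
                       'MAXROWS={}')
    # e.g. conn = pyodbc.connect(IP21_connection.format(server, 200000))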
Lee G