3

I have an interesting issue that I'm struggling to solve.

I am retrieving data from a DB via a web API. I have direct DB access for testing, but my application will need to read the data via the API. In the API I can essentially provide the where clause for the SQL statement that retrieves my data from the table I specify for the web API. I need to retrieve the data in chunks of ~10,000 in order not to overload the server (the web service is not optimised, and gets exponentially slower the more results I retrieve). With most tables, there is a sequential numeric ID field that I can limit my queries with. So the full query looks like this:

SELECT * FROM TABLE WHERE ID > 0 and ID <= 10000

I am only able to provide the where clause component of that query via the API.

ID > 0 and ID <= 10000

For one specific table, the usual ID field is not available. There is a different ID field, which also contains sequential numeric values, but the field is of type String. Since the where clause I provide to the web API is processed as some sort of prepared statement, it will not process any functions. So I cannot cast the ID to an int:

CAST(ID2 as int) > 0 and CAST(ID2 as int) <= 10000

If I use > or < comparison for string values, it follows alphabetical order, and you get annoying behavior, such as 2,000 being greater than 100,000. This breaks the logic I had been using before.

Does anyone have any ideas how I could limit my retrieves to 10,000 entries with this string ID field? I'm sure there is some logic that will do it, but I haven't been able to wrap my head around it.

Implicit casting in the where clause also returns an error:

(ID2 + 0) > 0 and (ID2 + 0) <= 10000

The values in the ID2 field range from ~140,000 to ~3,500,000.

I'd be happy to hear any ideas or suggestions! Please let me know if anything is unclear.

Tom Bascom
  • 13,405
  • 2
  • 27
  • 33
Nik Weiss
  • 422
  • 7
  • 18
  • Does this help? https://knowledgebase.progress.com/articles/Article/P13258 – sean Aug 26 '20 at 15:29
  • Thanks for the tip, but I am essentially trying to do what it recommends in that article: using the where clause to limit my data. The problem is finding a way to use the where clause to reliably only return 10,000 results. – Nik Weiss Aug 26 '20 at 15:35
  • The table you are querying has a serious design problem, because it accepts string inputs when it really should be accepting numbers. Without the ability to cast or at least left pad with zeroes, you are basically out of options here AFAIK. – Tim Biegeleisen Aug 26 '20 at 15:40
  • Yes, there are lots of design problems here, but I have to work with what they provide. I was hoping to think of some logic that uses <> and alphabetical sorting to limit the results, in some creative way. Posting the question here is my last resort. – Nik Weiss Aug 26 '20 at 15:44
  • I am not sure if you are pulling this data back into an application you are coding, but if you are and using a driver like JDBC, you can limit the rows returned in code: java.sql.Statement stmt = conn.createStatement(); stmt.setMaxRows(1000); – sean Aug 26 '20 at 15:48
  • Using JDBC would fix so many of my problems with this interface, but sadly I can only use this webservice, where I can only provide the table name and the where clause. I have no control of any other part of the SQL statement, or the JDBC driver, or anything else. – Nik Weiss Aug 26 '20 at 15:49
  • Why exactly you cannot use the expression with casts (or function, in general)? I understand your sentence _"API is processed as some sort of prepared statement"_ as if the API required you only to provide any SQL snippet with two parameter placeholders (lower and upper bound) such as `id > ? and id <= ?`. Could you please describe what can you send as `where` condition in deeper detail? – Tomáš Záluský Aug 26 '20 at 15:53
  • I don't even have access to documentation that lists the specifics (it probably doesn't even exist), but from my testing it refuses to process any function in the where clause (cast, sub select, etc.). I am essentially limited to LIKE, ><=, parentheses, etc. The exact error I get when I try it via the webservice is: `Error in where clause for "tablename" table: "** Unable to understand after -- ""tablename WHERE"". (247)"` and `Error in where clause for "tablename" table: "PREPARE syntax is: {FOR "` . It seems like that is a prepared statement. – Nik Weiss Aug 26 '20 at 15:57
  • 1
    That error looks very much like it is coming from a QUERY-PREPARE statement via the 4GL engine. Not the SQL engine. The documentation for the WHERE clause is here: https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvref%2Frecord-phrase.html%23 Instead of SQL functions you should be able to use various 4GL functions, such as INTEGER( ID2 ), inside the WHERE clause but there may be performance impacts. – Tom Bascom Sep 11 '20 at 19:08
  • thanks for that insight Tom! I will definitely give it a try on Monday, since I work a lot with this interface – Nik Weiss Sep 12 '20 at 20:02

2 Answers2

2

It might impact performance but the only way I can see of doing this is with a subquery:

SELECT * FROM TABLE WHERE ID IN (
    SELECT TOP 10000 ID
    FROM TABLE
    WHERE [someotherfield] >= [last_someotherfield] -- Might lead to duplicates
    ORDER BY [someotherfield]
)
bendataclear
  • 3,802
  • 3
  • 32
  • 51
  • Thanks for the suggestion, but the API rejects subqueries just like it rejects the explicit casting. – Nik Weiss Aug 26 '20 at 15:51
  • @NikWeiss I have tested using MSSQL and MySQL and I am able to use implicit conversion: `SELECT * FROM TABLE WHERE 0+ID > 0 and 0+ID <= 10000`, could you try this? – bendataclear Aug 26 '20 at 16:01
  • Thanks for the tip, but it gives me a type error when I try to do that. It seems to not want to do implicit conversions at all :( – Nik Weiss Aug 26 '20 at 17:12
2

I think that I have figured this out. By using alphabetical sorting, we can still limit the query to ~11,000 results. We just have to respect that it starts sorting by the first digit/letter, and ignores the place value of the digits in the ID. So a number that is larger by magnitudes of 10, but starts with a lower digit, is treated as smaller:

By alphabetical sorting 9 > 100000

We can therefore use a where clause like this:

ID2 > '100000' and ID2 <= '101000'

That query will include all IDs from 100,000 to 101,000 and from 1,000,000 to 1,010,000. It would theoretically also include IDs from 10,000,000 to 10,100,000 and 100,000,000 to 101,000,000 etc. but I know that my values range from ~100,000 to ~4,000,000 (6 to 7 figures), so that range is irrelevant for this DB. I can then step my way through, all the way to:

ID2 > '999000' and ID2 <= '999999'

This will cover all values from 100,000 to 9,999,999 and let me get all data in 899 steps.

Thanks for all the other ideas provided!

Nik Weiss
  • 422
  • 7
  • 18