We are migrating data from a Firebird database to Azure SQL, using Python FDB on the source side and pyodbc on the destination side. There are many tables; we could generate a Polybase workflow for each one, but that is more work, albeit with many benefits.

However, I would like to see if we can write the data to Azure SQL in 20 MB segments through pyodbc.

Is there a way to measure the size of the result set that comes back from FDB, to make sure it is below 20 MB?

Other than writing each result set to a file (guessing at the number of records that would come to 20 MB) and measuring it, could I somehow measure the allocated memory instead and refetch until I get the right size?
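To make the idea concrete, here is a rough sketch of what I have in mind: calibrate a bytes-per-row estimate from a sample batch with `sys.getsizeof`, derive a batch size targeting roughly 20 MB, and stream the rows across with `fetchmany()`. The connection strings, table name, and columns below are placeholders, and the `getsizeof`-based estimate is crude, so the 20 MB target would only be approximate:

```python
import sys
import fdb     # Firebird driver
import pyodbc  # Azure SQL driver

TARGET_BYTES = 20 * 1024 * 1024  # 20 MB per segment

def estimate_row_bytes(rows):
    """Rough per-row size: sum getsizeof over each column value.
    Ignores container overhead and shared objects, so it is an
    estimate, not an exact measurement."""
    total = sum(sys.getsizeof(col) for row in rows for col in row)
    return max(total // len(rows), 1)

# Placeholder connection details
src = fdb.connect(dsn='firebird_host:/path/db.fdb',
                  user='sysdba', password='...')
dst = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                     'SERVER=...;DATABASE=...;UID=...;PWD=...')

src_cur = src.cursor()
dst_cur = dst.cursor()
dst_cur.fast_executemany = True  # speeds up bulk inserts in pyodbc

src_cur.execute("SELECT col1, col2 FROM some_table")  # placeholder query

# Calibrate on a small sample, then keep fetching in ~20 MB batches.
batch = src_cur.fetchmany(1000)
if batch:
    rows_per_batch = max(TARGET_BYTES // estimate_row_bytes(batch), 1)
    while batch:
        dst_cur.executemany(
            "INSERT INTO some_table (col1, col2) VALUES (?, ?)", batch)
        dst.commit()
        batch = src_cur.fetchmany(rows_per_batch)
```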

  • Perhaps `result = crsr.fetchall()` followed by `result_size = sys.getsizeof(result)` …? – Gord Thompson Oct 05 '20 at 19:37
  • Thanks @GordThompson, will try that, found a good thread here about it: https://stackoverflow.com/questions/33978/find-out-how-much-memory-is-being-used-by-an-object-in-python – Dan Oct 05 '20 at 19:53
  • Hmm yeah, ok, it doesn't look promising. After retrieving 1000 rows from a table with 40 bytes per row (thereby expecting about 40_000 bytes), `getsizeof` grossly under-reports the size (9024 bytes) and the pickle method significantly over-reports the size (101_590 bytes). – Gord Thompson Oct 05 '20 at 22:56 (see the recursive sizeof sketch after these comments)
  • Hi @Dan, did you find an answer for this case? – Leon Yue Oct 06 '20 at 02:16
  • Hi @LeonYue We decided to generate a Polybase workflow for each table we will migrate, since that will be faster and reusable later. – Dan Oct 07 '20 at 03:15
  • @Dan Congratulations, you found a solution. May I post it as an answer? – Leon Yue Oct 29 '20 at 06:56
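
A note on the `getsizeof` result discussed above: `sys.getsizeof(result)` counts only the outer list object and its pointers, not the row tuples or the values inside them, which is why it grossly under-reports. A recursive traversal, along the lines of the recipe in the thread Dan linked, gets much closer. A minimal sketch:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and everything
    it references, counting each distinct object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

# e.g. result = crsr.fetchall()
#      result_size = deep_getsizeof(result)
```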

1 Answer

Congratulations, you found a solution in the end:

  • "We decided to generate a Polybase workflow for each table we will migrate since that will be faster and reusable for later."