
I’m trying to read data generated by another application and stored in a Microsoft Office Access .MDB file. The number of records in some tables can vary from a few thousand up to over 10 million, depending on the size of the model (in the other application). Opening a whole table with one query can cause an Out Of Memory exception on large files, so I split the table by some criterion and read each part with a separate query. But the problem is the medium-sized files, which could be read significantly faster with a single query and no exceptions.
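
To give an idea, the split-and-read part looks roughly like this (a simplified sketch: the table name Results and the range column ID are hypothetical placeholders for the actual splitting criterion):

// requires Data.Win.ADODB in the uses clause
procedure ReadInParts(Conn: TADOConnection; MaxID, PartSize: Integer);
var
  Qry: TADOQuery;
  Lo: Integer;
begin
  Qry := TADOQuery.Create(nil);
  try
    Qry.Connection := Conn;
    Qry.SQL.Text := 'SELECT * FROM Results WHERE ID >= :Lo AND ID < :Hi';
    Lo := 0;
    while Lo < MaxID do
    begin
      Qry.Parameters.ParamByName('Lo').Value := Lo;
      Qry.Parameters.ParamByName('Hi').Value := Lo + PartSize;
      Qry.Open;
      while not Qry.Eof do
      begin
        // copy the fields into my own structure, then move on
        Qry.Next;
      end;
      Qry.Close;
      Inc(Lo, PartSize);
    end;
  finally
    Qry.Free;
  end;
end;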

So, am I on the right track? Can I solve the Out Of Memory problem another way? Is it reasonable to choose between the two strategies (1 query or N queries) based on the number of records?

By the way, I’m using Delphi XE5 and Delphi’s standard ADO components. I need all the data in the table, and no joins to other tables are needed. I create the ADO components in code, and they are not connected to any visual controls.

Edit:

Well, it seems that my question is not clear enough. Here are some more details, which are actually answers to questions or suggestions posed in comments:

This .mdb file doesn’t hold a real database; it’s just structured data. So there’s no writing of new data, no transactions, no user interaction, no server, nothing. A third-party application uses Access files to export its calculation results. The total size of these files is usually a few hundred MB, but they can grow up to 2 GB. I need to load this data into a Delphi data structure before starting my own calculations, since there’s no room for waiting on I/O during those calculations.

I can’t compile this project for x64. It depends heavily on some old DLLs that share the same memory manager with the main executable, and their authors will never release x64 versions. The company hasn’t decided to replace them yet, and that won’t change in the near future.

And, you know, support guys just prefer to tell us “fix this” rather than asking two thousand customers to “buy more memory”. So I have to be really stingy about memory usage.

Now my question is: does TADODataSet provide any better memory management for fetching this amount of data? Is there any property that prevents the DataSet from fetching all the data at once?

When I call ADOTable1.Open, it starts allocating memory and waits until it has fetched the entire table, just as expected. But reading all those records in a for loop takes a while, and there’s no need to have all that data in memory at once; on the other hand, there’s no need to keep a record in memory after reading it, since I never seek back through the rows. That’s why I split the table with several queries. Now I want to know whether TADODataSet can handle this, or whether what I’m doing is the only solution.

saastn
  • Whenever someone needs an entire table full of data, that's usually a design issue: you should seriously question that 'need' first instead of spending your time on technical issues resulting from these large table sizes. – Jan Doggen Nov 20 '15 at 16:01
  • The data is analysis results generated by a third-party application, and the .mdb file is just the storage it offers for transferring them. I need the entire data set for further mathematical calculations, and I need it in RAM instead of on the HDD. What design do you suggest @JanDoggen? – saastn Nov 20 '15 at 16:26
  • It will depend on your math calculations. But then it becomes more a question of statistical or numerical analysis than a programming issue. Maybe you really can't break down your dataset; we can't see that. My remarks come more from business applications, where this often happens. – Jan Doggen Nov 20 '15 at 16:31
  • I think this is too broad. This sounds like a big data problem, and how you solve a big data problem is intimately related to exactly *what you are doing* with that big data. By the time we have all of that information, this essentially turns into a consultancy question and that's not really what Stack Overflow is about... – J... Nov 20 '15 at 17:13
  • How big is the .MDB file? Have you tried compiling a 64-bit application instead of the 32-bit default in Delphi? – SilverWarior Nov 20 '15 at 19:23
  • Obvious answer: Buy more memory. http://yourdatafitsinram.com/ – Rob Kennedy Nov 20 '15 at 19:48
  • You have to work out how much memory you need, and whether you have enough, and enough address space. – David Heffernan Nov 20 '15 at 21:43
  • you're right @J..., I tried to clarify my question. – saastn Nov 21 '15 at 21:52
  • @SilverWarior, I added more details, please read the Edit section. – saastn Nov 21 '15 at 21:59
  • That edit wasn't stellar – Drew Nov 21 '15 at 22:11
  • @saastn You say that you need to load that data into a Delphi data structure. Is that Delphi data structure different from the one contained in the .MDB file? If it is, then you shouldn't even try to load the whole .MDB file at once, because at some point during the conversion you would probably be using twice the amount of memory. Instead you should load and convert one .MDB record at a time. You should also consider creating a custom file to store your data structure, so you can convert the data once and then simply load it from that custom file. ... – SilverWarior Nov 22 '15 at 12:31
  • ... After that you should analyze your calculation process to determine which data records need to be present in memory during each individual calculation, and develop a mechanism that takes care of loading and unloading specific data records into and out of memory. Sure, this would hurt peak performance due to HDD I/O bandwidth limits. But if you design your data management model well enough, you can load records in batches instead of one at a time, and while you are doing calculations on one batch you might be loading another in a separate thread. ... – SilverWarior Nov 22 '15 at 12:37
  • ... If the limitation is the total memory available, and upgrading it is out of the question, then you need to find a way to reduce memory consumption during the process, most likely sacrificing some performance along the way. – SilverWarior Nov 22 '15 at 12:40
  • @SilverWarior Thank you very much, but I think we’re on the wrong track. It’s not like I’m designing a new program. It’s been 10 years since its first release, and it’s in its expansion phase now. All those parts you mentioned, like converting from .MDB to the Delphi data structure, creating a custom binary file format, and the calculation process, have already been implemented. This question is all about a work item that says “OOM while importing new model”, with a 1.5 GB .mdb file attached to it! ... – saastn Nov 22 '15 at 23:15
  • ... As I said, I’ve already solved it by splitting the table. I also know that it’s going to be a memory-usage/performance trade-off, and I don’t expect a happy ending. I just don’t know whether it’s my responsibility to handle this, or whether I should just configure ADODataSet in a specific manner. I’m in doubt because I believe what I’m facing is not an uncommon situation, and the DB engine can handle it far better than I can... – saastn Nov 22 '15 at 23:16
  • ... By the way, the data structures in Delphi and in the .MDB files are pretty similar, but the Delphi-side structure consumes less memory. These .MDB files are not normalized and use entity names instead of IDs!! For example, in the table I have a problem with, there are 8 floats and 3 strings; I keep the floats, but I need the strings to identify each record. – saastn Nov 22 '15 at 23:16

1 Answer


I did some trial and error and improved the performance of reading the data, in both memory usage and elapsed time. My test case is a table with more than 5,000,000 records. Each record has 3 string fields and 8 doubles. There is no index and no primary key. I used the GetProcessMemoryInfo API to get the memory usage figures.
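
For reference, a minimal sketch of this kind of measurement helper (which exact counter the figures below track is an assumption; this one reads the working set via Winapi.PsAPI):

// requires Winapi.Windows and Winapi.PsAPI in the uses clause
function WorkingSetKB: Cardinal;
var
  Counters: TProcessMemoryCounters;
begin
  Counters.cb := SizeOf(Counters);
  if GetProcessMemoryInfo(GetCurrentProcess, @Counters, SizeOf(Counters)) then
    Result := Counters.WorkingSetSize div 1024   // bytes -> kB
  else
    Result := 0;
end;

Sampling it before and after each step gives deltas like those listed below; a negative delta simply means the step freed more memory than it allocated.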

Initial State

Table.Open: 33.0 s | 1,254,584 kB  
Scrolling : +INF s | I don't know. But allocated memory doesn't increase in Task Manager. 
Sum       : -      | -
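
This baseline is nothing more than a default open followed by a scan, roughly like this (a sketch; Table1 is a TADOTable pointed at the test table):

Table1.Open;                 // with default settings this fetches the whole table client-side
while not Table1.Eof do
begin
  // read the current record's fields
  Table1.Next;
end;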

DataSet.DisableControls;

Table.Open: 33.0 s | 1,254,584 kB  
Scrolling : 13.7 s | 0 kB
Sum       : 46.7 s | 1,254,584 kB  

DataSet.CursorLocation := clUseServer;

Table.Open: 0.0 s  | -136 kB  
Scrolling : 19.4 s | 56 kB
Sum       : 19.4 s | -80 kB  

DataSet.LockType := ltReadOnly;

Table.Open: 0.0 s  | -144 kB  
Scrolling : 18.4 s | 0 kB
Sum       : 18.5 s | -144 kB  

DataSet.CacheSize := 100;

Table.Open: 0.0 s  | 432 kB  
Scrolling : 11.4 s | 0 kB
Sum       : 11.5 s | 432 kB  

I also checked Connection.CursorLocation, Connection.IsolationLevel, Connection.Mode, DataSet.CursorType, and DataSet.BlockReadSize, but none of them made an appreciable difference.

I also tried TADOTable, TADOQuery, and TADODataSet, and unlike what Jerry said here in the comments, both TADOTable and TADOQuery performed better than TADODataSet.

The value assigned to CacheSize should be decided case by case; larger values don't necessarily lead to better results.
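
Putting the settings that helped together, the reading code ends up looking roughly like this (a sketch; Results stands in for the real table name and ADOConnection1 for the existing connection):

// requires Data.Win.ADODB in the uses clause
var
  Qry: TADOQuery;
begin
  Qry := TADOQuery.Create(nil);
  try
    Qry.Connection := ADOConnection1;
    Qry.DisableControls;                  // skip datasource/UI notifications
    Qry.CursorLocation := clUseServer;    // don't pull the whole table client-side
    Qry.LockType := ltReadOnly;           // a read-only cursor is enough here
    Qry.CacheSize := 100;                 // fetch rows in blocks of 100
    Qry.SQL.Text := 'SELECT * FROM Results';
    Qry.Open;                             // returns almost immediately with a server cursor
    while not Qry.Eof do
    begin
      // copy the fields into the in-memory structure, then move on
      Qry.Next;
    end;
  finally
    Qry.Free;
  end;
end;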

saastn