If you want to improve performance, you will have to use fixed-length fields; parsing or loading variable-length fields cannot be made significantly faster. Reading the file line by line means scanning each line for the end-of-line token, and that scanning wastes time.
Before using any of the following suggestions, profile your code to establish a baseline performance measurement. Profile again after each suggestion, so you can calculate the performance delta of each optimization. My prediction is that the delta will shrink with each successive optimization.
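For a quick baseline, a wall-clock timer around the loading routine is enough. A minimal sketch, where `parseFile()` is a hypothetical stand-in for whatever routine you are measuring:

```cpp
#include <chrono>
#include <iostream>

// Hypothetical stand-in for the loading routine being measured.
void parseFile() { /* ... load and parse the data file ... */ }

int main() {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    parseFile();
    const auto stop = clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::cout << "parseFile: " << ms.count() << " ms\n";
}
```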
I suggest first converting the file to fixed-length records while still using text, padding fields with spaces as necessary. Since every record then has a known size, you can block-read the file into memory and treat the memory as an array of records, as in the sketch below. This should provide a significant improvement.
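A minimal sketch of the block-read approach; the 8-byte key, 24-byte value, and the file name `data.txt` are assumptions you would adjust to match your format:

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Assumed layout: every record is exactly RECORD_SIZE bytes of text,
// fields padded with spaces. Adjust the sizes to match your file.
constexpr std::size_t KEY_SIZE = 8;
constexpr std::size_t VALUE_SIZE = 24;
constexpr std::size_t RECORD_SIZE = KEY_SIZE + VALUE_SIZE;

int main() {
    std::ifstream file("data.txt", std::ios::binary);
    if (!file) return 1;

    // Block-read the whole file into memory in one call.
    file.seekg(0, std::ios::end);
    const std::size_t size = static_cast<std::size_t>(file.tellg());
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(size);
    file.read(buffer.data(), static_cast<std::streamsize>(size));

    // Because records are fixed length, the buffer can be treated as an
    // array: record i starts at offset i * RECORD_SIZE.
    const std::size_t count = size / RECORD_SIZE;
    for (std::size_t i = 0; i < count; ++i) {
        const char* record = buffer.data() + i * RECORD_SIZE;
        std::string key(record, KEY_SIZE);
        std::string value(record + KEY_SIZE, VALUE_SIZE);
        // ... process key and value ...
    }
    std::cout << "read " << count << " records\n";
}
```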
At this point, your bottlenecks are still file I/O speed, which you can't significantly improve (file I/O is controlled by the OS), and scanning and converting text. The remaining optimizations are to convert text keys to numbers and, finally, to convert the file to binary. At all costs, prefer to keep the data file in human-readable form.
Before making the data file any less readable, try splitting your application into threads: one thread handles the GUI, another the input, and another the processing. The idea is to have the processor always executing some of your code rather than waiting. On modern platforms, file I/O can be performed while the CPU is executing your code.
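A minimal sketch of that split using standard C++ threads and a shared queue; the file name, the 64 KiB block size, and the two-thread reader/processor design are assumptions:

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<std::vector<char>> blocks;  // blocks handed from reader to processor
std::mutex mtx;
std::condition_variable ready;
bool done = false;

// Reader thread: keeps the OS busy with file I/O while the
// processor thread chews on blocks already in memory.
void reader() {
    std::ifstream file("data.txt", std::ios::binary);
    std::vector<char> block(64 * 1024);           // assumed 64 KiB blocks
    while (file.read(block.data(), block.size()) || file.gcount() > 0) {
        block.resize(static_cast<std::size_t>(file.gcount()));
        {
            std::lock_guard<std::mutex> lock(mtx);
            blocks.push(block);
        }
        ready.notify_one();
        block.resize(64 * 1024);
    }
    { std::lock_guard<std::mutex> lock(mtx); done = true; }
    ready.notify_one();
}

// Processor thread: waits for blocks and parses them.
void processor() {
    for (;;) {
        std::unique_lock<std::mutex> lock(mtx);
        ready.wait(lock, [] { return !blocks.empty() || done; });
        if (blocks.empty() && done) return;
        std::vector<char> block = std::move(blocks.front());
        blocks.pop();
        lock.unlock();
        // ... parse the block ...
    }
}

int main() {
    std::thread t1(reader), t2(processor);
    t1.join();
    t2.join();
}
```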
If you don't care about portability, see whether your platform has DMA capability (a Direct Memory Access controller transfers data without using the processor, or while minimizing its use). One thing to watch out for: many platforms share the address and data buses between the processor and the DMA controller, so one component is blocked or suspended while the other uses the buses. DMA may or may not help; it depends on how the platform is wired up.
Convert the key field to numbers, a.k.a. tokens. Since tokens are numeric, they can be used as indices into jump tables (or switch statements), or simply as indices into arrays.
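A sketch of token dispatch; the token names and handler functions are hypothetical placeholders for your real keys:

```cpp
#include <iostream>

// Hypothetical key tokens; small consecutive integers by design.
enum Token { TOKEN_NAME = 0, TOKEN_AGE = 1, TOKEN_SCORE = 2, TOKEN_COUNT };

void handleName(const char* v)  { std::cout << "name: "  << v << '\n'; }
void handleAge(const char* v)   { std::cout << "age: "   << v << '\n'; }
void handleScore(const char* v) { std::cout << "score: " << v << '\n'; }

// Because tokens are small consecutive integers, they index directly
// into a jump table, with no per-record string comparison.
void (*const handlers[TOKEN_COUNT])(const char*) = {
    handleName, handleAge, handleScore
};

int main() {
    handlers[TOKEN_AGE]("42");  // dispatch by token in constant time
}
```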
As a last resort, convert the file to binary. The binary version should have two fields: the key as a token, and the value. Haul the data into memory in large chunks.
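A minimal sketch of the binary version, assuming a hypothetical two-field record layout and a file named `data.bin`; the 4096-record chunk size is arbitrary:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical binary record: a numeric key token and a value.
// Packing and fixed-width types keep the on-disk layout predictable.
#pragma pack(push, 1)
struct Record {
    std::uint16_t key;    // token from the jump-table step
    std::int32_t  value;
};
#pragma pack(pop)

int main() {
    std::FILE* file = std::fopen("data.bin", "rb");
    if (!file) return 1;

    // Haul the data into memory by the chunk, not record by record.
    std::vector<Record> chunk(4096);          // assumed 4096-record chunks
    std::size_t count;
    while ((count = std::fread(chunk.data(), sizeof(Record),
                               chunk.size(), file)) > 0) {
        for (std::size_t i = 0; i < count; ++i) {
            // ... dispatch on chunk[i].key, use chunk[i].value ...
        }
    }
    std::fclose(file);
}
```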
Summary
- Haul large blocks of data into memory.
- Profile before making changes to establish a baseline performance measurement.
- Optimize one step at a time, profiling after each optimization.
- Prefer to keep the data file in human-readable form.
- Minimize changes to the data file.
- Convert the file to use fixed-length fields.
- Try using threads or multitasking so the application is not waiting.
- Convert text keys to numeric tokens (reduces human readability).
- Convert data to binary as a last resort (very difficult for humans to read and debug).