
If I want to use persistence in my program logic, and I can use either file input/output or a database, how does that affect the running complexity of the algorithm? Both files and databases require huge I/O transfers between the CPU and secondary storage, so it certainly seems to affect the running complexity. Is my understanding correct?

And if this is the case, which one is the better choice: file I/O or a database?

Winn

3 Answers


Using a file or a database (one not held purely in memory (RAM)) is certainly slower than operating purely in memory, but only by a constant factor. Let's say one operation is 100x faster from memory: no matter how many times we do it, it will always be 100x faster, thus it's just a constant factor of 100.

Asymptotic complexity (big-O, big-Omega, big-Theta, etc.) of course ignores constant factors (O(n) = O(10000n)). (I'm sure one of the answers here will give some intuition into this, if need be.)

So it doesn't affect the running time complexity.
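
As a concrete illustration of the constant-factor argument, here is a minimal Python sketch (the file name and sizes are invented for the example): summing n numbers is O(n) whether they come from an in-memory list or from a file on disk; the file version just pays a larger constant cost per element.

```python
import os
import time

def sum_in_memory(values):
    # O(n): one pass over an in-memory list.
    return sum(values)

def sum_from_file(path):
    # Still O(n): one pass over the file, just with a larger
    # constant cost per element (disk I/O plus parsing).
    total = 0
    with open(path) as f:
        for line in f:
            total += int(line)
    return total

n = 100_000
values = list(range(n))
with open("numbers.txt", "w") as f:
    f.write("\n".join(map(str, values)))

for label, fn, arg in (("memory", sum_in_memory, values),
                       ("file  ", sum_from_file, "numbers.txt")):
    start = time.perf_counter()
    fn(arg)
    print(label, time.perf_counter() - start)

os.remove("numbers.txt")
```

Doubling n roughly doubles both times; the ratio between them stays roughly the same, which is exactly what "only a constant factor" means.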

Whether a file or database will be faster is dependent on multiple factors, among them:

  • Network speed if a non-local database
  • Hard drive speed
  • What type of operations you want to do

For simple write or one-time read operations, a file should, in theory, be faster (but only a little, hopefully), as databases usually persist to files as well, and have some added complexity. For repeated read operations, a database may be a lot faster, as results can be cached in memory and don't have to be re-read from the file. For complex operations, databases usually perform better. All in all, databases tend to be preferred, but this is really something to benchmark to get accurate results.
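
Here is a rough sketch of what such a benchmark might look like in Python, using the standard-library sqlite3 module as a stand-in for "a database" (file names, table name, and workload are invented; real numbers depend entirely on your hardware and access patterns):

```python
import sqlite3
import time

records = [f"record-{i}" for i in range(10_000)]

# Persist the same records to a flat file and to an SQLite database.
with open("data.txt", "w") as f:
    f.write("\n".join(records))

conn = sqlite3.connect("data.db")
conn.execute("CREATE TABLE IF NOT EXISTS t (val TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in records])
conn.commit()

# Time a repeated-read workload against each.
start = time.perf_counter()
for _ in range(100):
    with open("data.txt") as f:
        data = f.read().splitlines()
print("file  :", time.perf_counter() - start)

start = time.perf_counter()
for _ in range(100):
    data = [row[0] for row in conn.execute("SELECT val FROM t")]
print("sqlite:", time.perf_counter() - start)

conn.close()
```

The absolute numbers are meaningless outside your own environment; the point is only to measure both options on the workload you actually care about.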

Bernhard Barker

Whether a database or plain file I/O is better depends on what exactly you do.

Plain files, for example, perform extremely well for sequential reading and writing. Appending a record to a plain file takes constant time, and less time than inserting into a relational database, because of the lower overhead.
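
For instance, a constant-time append in Python might look like this (the file name and record format are invented for the example):

```python
def append_record(path, record):
    # Opening in append mode positions the write at end-of-file,
    # so the cost does not grow with the number of existing records.
    with open(path, "a") as f:
        f.write(record + "\n")

append_record("events.log", "user logged in")
append_record("events.log", "user logged out")
```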

If you require random access to the file content, plain file I/O may still give good performance, but a database starts to make more sense. A database can be indexed, and if your application needs to search for records based on different properties, a database is definitely the right tool for the job.
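
One common way to get good random-access performance from a plain file is fixed-width records, where record i lives at byte offset i * RECORD_SIZE and can be fetched with a single seek. A minimal Python sketch (file name, record size, and contents are invented):

```python
RECORD_SIZE = 32  # fixed width makes the offset of record i trivial to compute

def read_record(path, index):
    # Jump straight to the record's byte offset instead of
    # scanning the file from the beginning.
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b" ")

# Write 1000 space-padded records, then fetch one by position.
with open("records.dat", "wb") as f:
    for i in range(1000):
        f.write(f"record-{i}".encode().ljust(RECORD_SIZE, b" "))

print(read_record("records.dat", 42))  # b'record-42'
```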

Joni

You have not given nearly enough information here to properly answer your question.

But let me make a few observations that might help to guide you.

First, file I/O vs. database I/O does not affect algorithmic complexity. However, it can have a huge impact on implementation complexity and the resulting run times. This is what you are looking to minimize.

If you do not need to search for persisted records, the file I/O option should be the most efficient. This assumes pure sequential processing.

As soon as search comes into the picture, all bets are off as to which method will be most efficient. Databases can be very fast when tuned properly. File I/O can result in significant overhead when you choose a poor file structure or search mechanism (e.g. a sequential search through a large file will be much slower than an indexed select).
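
To see the difference between a sequential scan and an indexed select, here is a small SQLite sketch (table and column names are invented) that uses EXPLAIN QUERY PLAN to show the query strategy before and after an index is added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(100_000)],
)

query = "SELECT id FROM users WHERE email = ?"
arg = ("user99999@example.com",)

# Without an index: SQLite scans every row, much like reading
# a flat file front to back.
print(conn.execute("EXPLAIN QUERY PLAN " + query, arg).fetchall())

# With an index: the same query becomes a direct index lookup.
conn.execute("CREATE INDEX idx_users_email ON users(email)")
print(conn.execute("EXPLAIN QUERY PLAN " + query, arg).fetchall())
```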

As a general statement, you should always be able to build a faster system using customized file I/O than with a generic database. One is tuned to a specific application, the other is not (no contest). However, the amount of work needed to build a reliable, highly tuned, customized file-I/O-based system will most likely far outweigh the savings in run time and maintenance (the more you write, the more you have to maintain). This is why so much of the industry relies on generic databases to manage its data.

My personal preference is to use a database, not because it might be the absolute fastest mechanism but because it will look after transaction integrity for you (i.e. it provides commit/rollback facilities). Consider the difficulties of managing a pure file-based system in the event of an abend (abnormal end). When using files, you never know how much of your output was buffered (not yet persisted) when the crash occurred. Recovery/restart can be quite complex when using file I/O. A database makes recovery a lot easier: you just have to start processing from the last commit point.
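
A small sketch of the commit/rollback facility this refers to, again using SQLite for illustration (file name, table, and amounts are invented): either both updates are persisted or neither is, so a crash mid-transfer cannot leave the data half-written.

```python
import sqlite3

conn = sqlite3.connect("accounts.db")
conn.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # Both updates must land together or not at all.
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
    conn.commit()    # persist both changes as one unit
except sqlite3.Error:
    conn.rollback()  # on failure, the data reverts to the last commit point
finally:
    conn.close()
```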

Only resort to file-based processing when a database cannot do the job - and there aren't many situations where databases aren't up to it.

NealB