4

Extending this thread - I would just like to know why it's faster to retrieve files from a file system, rather than a MySQL database. If one were to benchmark the two to see which would retrieve the most data (multiple types of data) over 10 minutes - which one would win?

If a file system is truly faster, then why not just store everything in a file system and replace a database with csv or xml?

EDIT 1:

I found a good resource for alternate storage options for java

EDIT 2:

I'm looking for a Java API/Jar that has the functionality of a SQL Database Server Engine (or at least some of it) that uses XML for data storage (preferably). If you know of something, please leave a comment below.

Community
  • 1
  • 1
  • 1
    Do you want to know *whether* or *why*? You've stated both questions but they are mutually exclusive. If you already know one is faster, why are you asking which is faster, and if you don't, why are you asking why? – user207421 May 30 '12 at 22:33

4 Answers4

13

At the end of the day the database does just store the data in the file system. It's all the useful stuff on top of just the raw data that makes you decide to use a database.

If you can replicate the functionality, scalability, robustness, integrity, etc, etc of a database system using CSV and still make it perform faster than a relational database then yes I'd suggest doing it your way.

It'd take you a few years to get there though.

Of course, relational systems are not the only way to store data. There are object-oriented database systems (db4o, InterSystems Cache) and document-based systems (RavenDB).

Performance is also relative to the style and volume of data you are working with and what you intend to do with it - I'm not going to even try and discuss that, it's too open ended.

I will also not start the follow on discussion: if memory is truly faster than the file system, why not just store everything in memory? :-)

This also seems similar to another question I answered a long while ago:

Is C# really slower than say C++?

Basically stuff isn't always done just for performance.

Community
  • 1
  • 1
Adam Houldsworth
  • 63,413
  • 11
  • 150
  • 187
  • 1
    Thanks for this answer. It makes complete sense. What I don't understand, though, is why someone went ahead and made SQL instead of just making different libraries (i.e. java jar's) and using a file system. That way you can use only the things you want to use and cut out the rest. That's a sure way to boost performance and scalability. –  May 30 '12 at 13:02
  • 1
    @ThreaT Someone did SQL because they needed an application to manage data and allow people to create their own data stores. SQL is the framework and runtime of a database, the actual implementation of that is done by the consumer. There comes a point where a simple library running inside of your applications process is not enough and what you really need is a database appliance. – Adam Houldsworth May 30 '12 at 13:18
  • If it was made for the consumer then why do all developers use it? Do you know of any file system java api's out there? –  May 30 '12 at 13:21
  • @ThreaT Developers are the consumer. No I don't, I'm not a Java developer. – Adam Houldsworth May 30 '12 at 13:22
  • 2
    @ThreaT you are mixing different layers: SQL is a quite abstract method to query and structure data. Theoretically it could be used not just by developers, but by analysts, dataminers.. The storage is a different layer - that's why there are that many different database engines that all understand SQL. Just using the filesystem would be faster for single datapoints, but the databases give you unified means to structure, optimize und make complex queries vs your data – cypherabe May 30 '12 at 13:49
  • http://docs.oracle.com/javase/1.4.2/docs/api/java/io/File.html Java has built in filesystem support in the IO library. – whiteatom Jun 16 '12 at 03:47
3

MySQL uses the file system the same as everything else on a computer. To retrieve a single piece of data, or a table of data, there is no faster way that directly from the file system. MySQL would just be a small bit of overhead added to that file system pull.

If you need to do some intelligent selecting, match some rows, or filter that data, MySQL is going to do that faster than most other options. The database server provides you calculation and data manipulation power that a filesystem can't.

whiteatom
  • 1,456
  • 2
  • 14
  • 33
  • 1
    This seems to have been addressed.. but if you want to know the why.. imagine trying to store a phone book with a folder for each entry, and a file for the name and a file for the number. Those millions of files are stored in a single table (and single file) in a database. Databases also can perform search and manipulations in RAM, something far more difficult to do with a basic file system. In summary, it's an easier, more powerful system for organizing data. – whiteatom Jun 16 '12 at 03:46
  • How does SQL get the data without scanning everything? Can't a java library do something similar if there is a native driver in place? –  Jun 18 '12 at 14:30
  • @ThreaT A database performs lookups using indices optimized for searching. It does not scan all the data (if the DBA has done his job well). Yes, a java library can and does do the same thing; there are many databases written completely in java. – Rodney P. Barbati Jul 26 '18 at 16:34
1

When you have mixed/structured data, a DBMS is the only solution. For eg. try to get the people's name, surname and country for all your customers stored into your DB, but only those born in 1981 and living in Rome. If you have this data into files on the filesystem, how do you easily get only the required data without scanning all your files and how do you join returned data?

A DBMS give you much more than that. Many DBMS store data into files.

This abstraction layer will make you retrieve data in a very easily, standard and structured way.

dAm2K
  • 9,923
  • 5
  • 44
  • 47
  • Actually, all databases store data in files. Actually, all stored data is in files. I don't know any other form of storage that does not use files. – Rodney P. Barbati Nov 29 '18 at 21:21
  • Most DBMS can be configured to save data directly on disk, not using a filesystem at all. For example they can use raw device or block device directly bypassing the Operative System VFS layer. The filesystem itself does not save data in files but in blocks. Generally you can think of files like a bunch of data chunks with some metadata associated and some indexing for faster selection. – dAm2K Nov 30 '18 at 01:21
0

The difference is in how the desired data is located.

In a file system, locating the desired data means searching through all existing data until you find it.

Databases provide indexing which results in locating the desired data almost immediately (within ~12 comparisons) regardless of the amount of data.

What we want is an indexed file system - lucky for us, we have them. They are called databases.

Rodney P. Barbati
  • 1,883
  • 24
  • 18