
My application has to read data stored in a file and retrieve the values for variables or arrays so it can work on them.

My question is: which file format will be fast and easy for retrieving the data from the file?

I was thinking of using .xml, .ini, or just a simple .txt file, but to read a .txt file I would have to write a lot of code with many if/else conditions.

I don't know how to use .ini or .xml files yet, but if they are better and faster, I'll learn them first and then use them. Kindly guide me.

Shaharyar

5 Answers


I will assume what you are indicating here is that raw performance is not a priority over the robustness of the system.

For simple data, where each value is paired with a name, an INI file would probably be the simplest solution. More complex structured data would lead you toward XML. According to a previously asked question, if you are working in C# (and hence, presumably, in .NET), XML is generally preferred because support for it is built into the .NET libraries. As XML is more flexible and can change with the needs of the program, I would also personally recommend XML over INI as a file standard. It will take more work to learn the XML library, but it will quickly pay off, and it is a standardized system.

Plain text could be fast, but you would either be sacrificing a vast amount of robust parsing behavior for the sake of speed, or spending far more man-hours developing and maintaining a high-speed specialized parser.

For references on reading XML files (natively supported in the .NET libraries):

For references on reading INI files (not natively supported in the .NET libraries):
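As a quick illustration of the XML route, here is a minimal sketch using LINQ to XML; the settings.xml file name and the layout shown in the comments are made-up examples:

// Requires: using System;
//           using System.Xml.Linq;
//
// Assumes a hypothetical settings.xml such as:
// <settings>
//   <setting name="timeout" value="30" />
//   <setting name="retries" value="5" />
// </settings>
var doc = XDocument.Load("settings.xml");
foreach (var setting in doc.Root.Elements("setting"))
{
    string name = (string)setting.Attribute("name");
    int value = (int)setting.Attribute("value");
    Console.WriteLine("{0} = {1}", name, value);
}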

Ian T. Small

If it's tabular data, then it is probably faster to just use CSV (comma-separated values) files.
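As a rough sketch (assuming a hypothetical data.csv with no quoted fields, since a naive Split can't handle embedded commas):

// Requires: using System.IO;
//           using System.Linq;
// Naive CSV read: one string[] per line, split on commas.
string[][] rows = File.ReadLines("data.csv")
                      .Select(line => line.Split(','))
                      .ToArray();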

If it is structured data (like a tree or something), then you can use the XML parser in C#, which is fast (but will take some learning effort on your part).

If the data is like a dictionary, then INI will be the better option. It really depends on the type of data in your application.
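Since .NET has no built-in INI reader, a minimal hand-rolled sketch (for a hypothetical config.ini, ignoring [section] headers and ; comments for brevity) could load the key=value pairs into a dictionary:

// Requires: using System.Collections.Generic;
//           using System.IO;
var values = new Dictionary<string, string>();
foreach (var line in File.ReadLines("config.ini"))
{
    // Skip blanks, ; comments, and [section] headers; keep only key=value lines.
    if (line.StartsWith(";") || !line.Contains("="))
        continue;
    int eq = line.IndexOf('=');
    values[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
}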

Or, if you don't mind an RDBMS, that would be a better option. Usually, a good RDBMS is optimized to handle large amounts of data and read them really quickly.
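For completeness, a hedged sketch of the RDBMS route using plain ADO.NET; the connection string and the Fields table are made up here:

// Requires: using System;
//           using System.Data.SqlClient;
using (var conn = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=AppData;Integrated Security=True"))
using (var cmd = new SqlCommand("SELECT Name, Value FROM Fields", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine("{0} = {1}", reader.GetString(0), reader.GetString(1));
    }
}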

Aniket Inge
  • I have many fields, and every field has some attributes. I need to update those attributes; the application updates every value within just a few seconds. I think it's not like a dictionary. What would that be called? Structured data? – Shaharyar Feb 06 '13 at 20:34
  • @Shaharyar: Do you need to persist those updates to storage every few seconds? Or can you hold them in memory and write them later? How much data are we talking about? – Matt Burland Feb 06 '13 at 20:37
  • @MattBurland Yeah! I need to update both the file and the fields every few seconds. The data will be more than 100 MB; I can't define it exactly right now... – Shaharyar Feb 06 '13 at 20:42
  • @Shaharyar your best bet seems to be an RDBMS – Aniket Inge Feb 06 '13 at 20:44
  • @Shaharyar: Without knowing more about your problem, an RDBMS as Aniket suggested might work, or you might even want something NoSQL instead. – Matt Burland Feb 06 '13 at 20:51
  • @MattBurland Yes, I want something else – Shaharyar Feb 06 '13 at 20:58
  • After getting all of these answers, I think that for my problem the XML file will be the better option instead of using an RDBMS. But still, I'll check them both. Thanks a lot :) – Shaharyar Feb 06 '13 at 21:09

If you don't mind having a binary file (one that people can't read and modify themselves), the fastest would be serializing an array of numbers to a file, and deserializing it from the file.

The file will be smaller because the data is stored more efficiently, requiring fewer I/O operations to read it. It will also require minimal parsing (really minimal), so reading will be lightning fast.

Suppose your numbers are located here:

int[] numbers = ...;

You save them to file with this code:

// Requires: using System.IO;
//           using System.Runtime.Serialization.Formatters.Binary;
using (var file = new FileStream(filename, FileMode.Create))
{
    var formatter = new BinaryFormatter();
    formatter.Serialize(file, numbers);  // the stream comes first, then the object graph
}

To read the data back, open the file the same way (with FileMode.Open), create a BinaryFormatter, and then use:

numbers = (int[])formatter.Deserialize(file);
zmbq

I think that @Ian T. Small addressed the difference between the file types well.

Given @Shaharyar's responses to @Aniket, I just wanted to add to the DBMS conversation, given the limited scope info we have.

Will the data set grow? How many entries constitute "many fields"?

I agree that an r-DBMS (relational) is a potential solution for a large data set. The next question is what counts as a large data set.

When (and which) DBMS is a good idea
When @Shaharyar says many fields, are we talking tens or hundreds of fields?
=> 10-20 fields wouldn't necessitate the overhead (install size, CRUD code, etc.) of an r-DBMS. XML serialization of the object is far simpler.

=> If there is an indeterminate number of fields (i.e. the number of fields increases over time), if he needs ACID compliance, or if he has hundreds of fields, then I'd say @Aniket is spot on.

@Matt's suggestion of NoSQL is also great. It will provide high throughput (far more than required for an update every few seconds) and simplified serialization/deserialization.

The only downside I see here is application size/configuration. (Even the lightweight, easy-to-configure MongoDB will add tens of MB for the DBMS facilities and driver. Not ideal for a small < 1 MB application meant for fast, easy distribution.) Oh, and @Shaharyar, if you do require ACID compliance, please be sure to check the database first. Mongo, for example, does not offer it. That's not to say you will ever lose data; there are just no guarantees.

Another Option - No DBMS but increased throughput
The last suggestion I'd like to make will require a little code (specifically an object to act as a buffer), as sketched below.
If:
1. the data set is small (tens, not hundreds of fields),
2. the number of fields is fixed,
3. there is no requirement for ACID compliance, and
4. you're concerned about an increased transaction load (i.e. lots of updates per second),

then you can just cache changes in a datastore object and flush them on program close, or via a timer every 'n' seconds/minutes/etc.

Per @Ian T. Small's post, we would use the native XML serialization classes built into the .NET Framework.

The following is deliberately simplified, but it should give you an idea:

// Requires: using System.Timers;
public class FieldContainer
{
    private bool changeMade;
    private readonly Timer timer = new Timer(5 * 60 * 1000);  // five minutes, in milliseconds

    public FieldContainer()
    {
        timer.Elapsed += OnTimerTick;
        timer.Start();
    }

    private void OnTimerTick(object sender, ElapsedEventArgs e)
    {
        if (!changeMade)
            return;
        UpdateXmlFlatFile();  // flush the cached fields to disk (sketched below)
        changeMade = false;
    }
}
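For the UpdateXmlFlatFile step itself, a minimal sketch using the XmlSerializer class might look like the following. The Field class and the fields list are hypothetical stand-ins for whatever the application actually caches:

// Requires: using System.Collections.Generic;
//           using System.IO;
//           using System.Xml.Serialization;

// Hypothetical field type; XmlSerializer needs public members
// and a parameterless constructor.
public class Field
{
    public string Name { get; set; }
    public string Value { get; set; }
}

// Inside FieldContainer, assuming a cached List<Field> member named fields:
private void UpdateXmlFlatFile()
{
    var serializer = new XmlSerializer(typeof(List<Field>));
    using (var stream = File.Create("fields.xml"))
        serializer.Serialize(stream, fields);
}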
JFish222

How fast does it need to be?

txt will be the fastest option, but you have to program the parser yourself (speed does come at a cost).

xml is probably the easiest to implement, as you have XmlSerializer (or other classes) to do the hard work.

For small configuration files (~0.5 MB and smaller) you won't be able to tell any difference in speed. When it comes to really big files, txt with a custom file format is probably the way to go. However, you can always choose either way: look at projects like OpenStreetMap, which has huge XML files (> 10 GB) that are still usable.

DasKrümelmonster