6

One of my needs is to manage a shared resource (more like a log, with both read and write operations) among different processes (and thus multiple threads) within an application. The data should also persist across system restarts, hence it needs to be a physical file/database.

The shared resource holds key/value data, so the possible operations on it are adding a new key/value pair and updating/deleting an existing one.

Hence I am thinking of using an XML file to store the data physically, and the sample content will look like:

<Root>
   <Key1>Value</Key1>
   <Key2>Value</Key2>
   <Key3>Value</Key3>
</Root>

The interface for the read and write operations will look like:

    public interface IDataHandler
    {
       IDictionary<string,string> GetData();
       void SetData(string key,string value);
    }

I assume the data will not grow beyond 500 MB, hence the XML decision; if it grows past that, I will move it to a database. Also, writes will be more frequent than reads.

A few queries/design considerations related to the above scenario:

Is it fine to handle 500 MB of data in an XML file?

Assuming the file is XML, how do I take care of performance?

  • I am thinking of caching the data as a Dictionary (using the MemoryCache class in .NET), which would make reads fast. Is it OK to cache 500 MB of data in memory, or is there a better option?

  • Now, if I use the above cache mechanism, what should happen during a write operation?
    - Should I convert the whole dictionary to XML and rewrite the file on every write? Or is there a way to update only the portion of the XML file whose data is being modified/added? Or is there some other way to handle this scenario?
    - Should I improve performance further by putting write operations into a queue and having a background thread read the queue and perform the actual writes, so that the caller writing the data is not blocked by file I/O? (A rough sketch of this idea follows below.)
    - To handle the multi-threaded scenario, I am planning to use a Mutex with a global name. Is there a better way to do it?
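To make the queue idea concrete, here is a rough sketch of what I have in mind (the class and mutex names are just placeholders, and the actual file write is left open, since that is exactly what I am asking about):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    // Rough sketch: writes are queued and flushed by one background thread,
    // so callers never block on file I/O. The global named Mutex is meant
    // to serialize file access across processes.
    public class QueuedXmlWriter : IDisposable
    {
        private readonly BlockingCollection<KeyValuePair<string, string>> _queue =
            new BlockingCollection<KeyValuePair<string, string>>();
        private readonly Thread _worker;
        private readonly Mutex _fileMutex = new Mutex(false, @"Global\MySharedDataFile");

        public QueuedXmlWriter()
        {
            _worker = new Thread(Drain) { IsBackground = true };
            _worker.Start();
        }

        public void Enqueue(string key, string value)
        {
            _queue.Add(new KeyValuePair<string, string>(key, value));
        }

        private void Drain()
        {
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                _fileMutex.WaitOne();
                try
                {
                    // update only the affected node, or rewrite the file?
                    // this is exactly the open question above
                }
                finally
                {
                    _fileMutex.ReleaseMutex();
                }
            }
        }

        public void Dispose()
        {
            _queue.CompleteAdding();
            _worker.Join();
            _fileMutex.Dispose();
        }
    }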

I am sure I am operating on a few assumptions and have tried to build from there, and if some of those assumptions are wrong, most of the design would change. Hence an entirely new solution is also welcome (keeping performance as the main criterion). Thanks in advance.

srsyogesh
  • This all sounds very elaborate and error-prone. Why not use a database? Databases solve problems such as persistence, consistency, crash-consistency, backup, high-availability. – usr Aug 16 '14 at 20:21
  • I will keep that as an option, but rather than settling on a database straight away I would also like to know about the other options with their advantages and disadvantages, hence I started with an XML/text file. If you feel this file-based approach is error-prone, can you please explain a bit more how? At the least this would be new learning for me. – srsyogesh Aug 17 '14 at 19:35
  • @usr: Forgot to mention that, to start off, the XML size wouldn't be more than 1 MB or so, but it can grow over time, say over a couple of years. I don't want to invest in a DB solution in the initial period, but I have to keep my design open to support that in the future. Do you still think I am doing/assuming something wrong? – srsyogesh Aug 18 '14 at 07:12
  • Well, you have to have an answer for all the issues I mentioned. That takes time and is error-prone to build. For example, what if your process dies (bug, bluescreen, power loss) while you write the new database version? Now you have half of a database; the rest is gone. You can make this work safely, but why take on that burden? You also talk about threading and queueing; you are getting into dangerous territory there. This is OK as a learning project, but for a business project this is the wrong solution. Choose one that is easy to get right and safe by construction. – usr Aug 18 '14 at 11:54
  • Is there any special reason to use an XML file? – Kuba Wyrostek Aug 25 '14 at 11:56
  • I recommend this article which, by some luck, relates specifically to XML files as data storage: http://www.joelonsoftware.com/articles/fog0000000319.html – Kuba Wyrostek Aug 25 '14 at 11:58

7 Answers

3

As you said write operations will outnumber reads, I assume the data will grow quickly, so my suggestion is to design for a database from the start. It does not require a full-featured database like MSSQL or MySQL; you can start off with SQLite or SQL Server Compact. This makes your app future-proof for larger data volumes.

Keeping read-heavy data that rarely changes (like configuration) in RAM is efficient. My suggestion is to use a cache manager such as MemoryCache or the Enterprise Library Caching Block instead of writing your own; this saves you a lot of time implementing thread-safe data access, and a few nightmares :)

public interface IDataHandler
{
   IDictionary<string,string> GetData();
   void SetData(string key,string value);
}

public class MyDataHandler : IDataHandler
{
   public IDictionary<string,string> GetData()
   {
       return CacheManager.GetData("ConfigcacheKey") as IDictionary<string,string>;
   }

   public void SetData(string key,string value)
   {
       var data = GetData() ?? new Dictionary<string,string>();
       if(data.ContainsKey(key)) data[key] = value;
       else data.Add(key,value);

       CacheManager.Add("ConfigcacheKey", data);

       // HERE write an async method to save the key,value in database or XML file
   }
}

If you go with XML, you do not need to convert the whole dictionary to XML every time. Load the XML document into an XmlDocument/XDocument object, use XPath to find the element whose value you want to update (or add a new element), and save the document.
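For example, a minimal sketch with XDocument (the file name and key are illustrative; assumes data.xml already contains a <Root> element):

    using System.Xml.Linq;

    // Sketch: update one element in place instead of rebuilding the
    // document from the dictionary. Assumes data.xml has a <Root> element.
    string key = "Key2", value = "NewValue";

    var doc = XDocument.Load("data.xml");
    var element = doc.Root.Element(key);
    if (element != null)
        element.Value = value;                   // update existing key
    else
        doc.Root.Add(new XElement(key, value));  // add a new key
    doc.Save("data.xml");

Note that Save still rewrites the whole file on disk; what you avoid is rebuilding the document from the dictionary, not the file I/O itself.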

From a performance point of view, unless you have some crazy logic or handle truly huge data (I mean GBs of it), I recommend finishing your app quickly using already available, battle-tested components like databases and cache managers, which abstract the thread-safety concerns away from you.

cackharot
2

I see two possible approaches to this problem:

  • Use of a database. IMO this is the preferred approach, since this is exactly the thing that databases are designed for: concurrent read/write access by multiple applications.
  • Use a "service" application that will manage the resource and can be accessed (Pipes, Sockets, SharedMem, ...) by other applications.

Critical points to remember:

  1. A global Mutex doesn't work across multiple machines. (The XML file may lie on a network share; if you cannot rule that out as "unsupported", then you shouldn't use a Mutex. A single-machine usage sketch follows this list.)
  2. A "lock file" can leak locks (e.g. if the process that created the lock file is killed, the file may remain on disk).
  3. XML is a very bad format if a file is repeatedly updated by multiple processes (e.g. if you need a "load-update-write" cycle for each access, this will perform very poorly).
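If you do stay on a single machine and use a global named Mutex, a minimal usage sketch (the mutex name is illustrative; note the AbandonedMutexException case, which covers a holder dying mid-write):

    using System;
    using System.Threading;

    // Sketch: cross-process locking on one machine via a named Mutex.
    // The "Global\" prefix makes it visible across all sessions.
    using (var mutex = new Mutex(false, @"Global\SharedDataFileLock"))
    {
        try
        {
            mutex.WaitOne();
        }
        catch (AbandonedMutexException)
        {
            // A previous holder died without releasing. We now own the
            // mutex, but the file may be half-written: validate it first.
        }
        try
        {
            // read/update the shared file here
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }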
Daniel
1

Base your solution on the design principles of this Stackoverflow answer:

How to effectively log asynchronously?

As you mention in one of your considerations, the above solution involves threading and queueing.

Also, instead of serializing the data to XML, you can probably get better performance using the BinaryFormatter.
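For example, a minimal sketch (the file name is illustrative; Dictionary<string,string> is already [Serializable]):

    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    // Sketch: binary round-trip of the whole dictionary.
    var data = new Dictionary<string, string> { { "Key1", "Value" } };
    var formatter = new BinaryFormatter();

    using (var stream = File.Create("data.bin"))
        formatter.Serialize(stream, data);

    Dictionary<string, string> loaded;
    using (var stream = File.OpenRead("data.bin"))
        loaded = (Dictionary<string, string>)formatter.Deserialize(stream);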

John Jesus
  • Well, I haven't decided about the writing part yet; this is where I am not sure what the best way is. A few options I had in mind were XmlDocument/XDocument/serialization for the writing. A new option also gets added by your suggestion :) Will check this as well. Thanks. – srsyogesh Aug 16 '14 at 19:59
  • Write an answer on how you go with this one, or accept the above answer, to inform the rest of us. – John Jesus Aug 17 '14 at 13:40
  • Yeah sure, waiting for any comments or proposals from other experts. – srsyogesh Aug 17 '14 at 19:32
1

Regarding performance: XML gets very slow once the size goes beyond 100 MB. My requirement was to read/write ~1 GB of data on disk, where read and write operations may run in parallel, e.g. data arrives from one thread and is written to the file while another (or the same) application demands data for display on a chart or other UI. We moved to a binary reader/writer; our performance analysis showed the binary reader/writer was much faster than XML for larger file sizes.
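A minimal sketch of the kind of length-prefixed record layout this implies (illustrative only, not our actual code):

    using System.Collections.Generic;
    using System.IO;

    // Sketch: length-prefixed key/value records. BinaryWriter.Write(string)
    // writes a length-prefixed string, so no manual framing is needed.
    static void Save(string path, IDictionary<string, string> data)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(data.Count);
            foreach (var pair in data)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
        }
    }

    static Dictionary<string, string> Load(string path)
    {
        var data = new Dictionary<string, string>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = reader.ReadInt32();
            for (int i = 0; i < count; i++)
                data[reader.ReadString()] = reader.ReadString();
        }
        return data;
    }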

Now we have moved to HDF5 and we are playing with 20GB data files with simultaneous read and write operations.

A Mutex with a global name should work; we used the same.

Aniruddh
  • Did you by any chance use caching in this scenario? Also, how did you take care of updating only a portion of the data, or did you read and write the whole file each time? Do you have the results of your performance analysis? It would be of great help if you could update your answer with a code snippet. – srsyogesh Aug 25 '14 at 13:01
1

I'd start with a single, lightweight governor process which is solely responsible for accessing the data file. Other processes communicate with the governor (e.g. via .NET Remoting in this scenario, through the IDataHandler interface) and never manipulate the file directly. This way you not only abstract away the issues related to multi-process access, but also gain a few features (a rough pipe-based sketch follows the list):

  • a lightweight, simple process is much more reliable and does not damage your data if any of the "consumer" processes fails
  • you have a single piece of code in which to maintain things such as reliability, locking, sharing, etc.
  • whenever you decide to switch from XML to something else, there is only a single place to change the technology
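A very rough sketch of such a governor over a named pipe (the pipe name and the line-based GET/SET protocol are purely illustrative; .NET Remoting or WCF would be the fuller option):

    using System.Collections.Generic;
    using System.IO;
    using System.IO.Pipes;

    // Sketch: the governor owns the data file and serves a trivial
    // line-based protocol ("GET key" / "SET key value") over a named pipe.
    // One client at a time; real code would add error handling and shutdown.
    static void RunGovernor(IDictionary<string, string> data)
    {
        while (true)
        {
            using (var pipe = new NamedPipeServerStream("DataGovernor"))
            {
                pipe.WaitForConnection();
                using (var reader = new StreamReader(pipe))
                using (var writer = new StreamWriter(pipe) { AutoFlush = true })
                {
                    var parts = (reader.ReadLine() ?? "").Split(new[] { ' ' }, 3);
                    if (parts[0] == "GET" && parts.Length == 2)
                    {
                        string value;
                        writer.WriteLine(data.TryGetValue(parts[1], out value) ? value : "");
                    }
                    else if (parts[0] == "SET" && parts.Length == 3)
                    {
                        data[parts[1]] = parts[2];
                        // persist to the XML file here, in this process only
                        writer.WriteLine("OK");
                    }
                }
            }
        }
    }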
Kuba Wyrostek
  • The features you have mentioned are already in my mind, i.e. the class implementing the interface would be named XmlDataHandler (per the single responsibility principle). I am quite OK with the design, but I am not sure how to tackle the performance and reliability issues around XML load, update, and write operations, and the best practices associated with them. It would be helpful if you could also give some insight into those. Thanks!! – srsyogesh Aug 25 '14 at 11:39
  • Well! Many experts have already suggested using a DB, which abstracts away the problems I am thinking about with XML. Still, if I get a concrete solution to the problem while using XML it would be of great help, because if there is no way then I can go to a DB, and otherwise implement what is needed to achieve everything with the XML file itself. – srsyogesh Aug 25 '14 at 11:41
1

Database, no question about it.

If you are balking at creating another server, just use SQLCE on a shared file on a network drive (so long as you don't need more than 256 concurrent connections).

No huge database to support, but you get strongly typed data and all the other good things that come from using a database, like indexes, hashes, rowversions, etc.

If nothing else, it keeps you from having to do a linear scan of the entire file every time you want to find (or update, or delete, or even add, if you want unique keys) a record.

You are literally writing a hash table, mapping keys to values. Don't use the data-storage equivalent of an array of tuples; use a real permanent store.
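For scale, here is a minimal sketch of a key/value table in SQLCE (the file name, table layout, and update-then-insert upsert are illustrative; assumes a reference to the System.Data.SqlServerCe assembly):

    using System.Data.SqlServerCe;
    using System.IO;

    // Sketch: a key/value table in SQL Server Compact.
    const string connStr = "Data Source=data.sdf";
    string key = "Key1", value = "Value";

    if (!File.Exists("data.sdf"))
    {
        new SqlCeEngine(connStr).CreateDatabase();
        using (var conn = new SqlCeConnection(connStr))
        {
            conn.Open();
            new SqlCeCommand(
                "CREATE TABLE KV ([Key] NVARCHAR(256) PRIMARY KEY, [Value] NTEXT)",
                conn).ExecuteNonQuery();
        }
    }

    using (var conn = new SqlCeConnection(connStr))
    {
        conn.Open();
        // upsert: UPDATE first, INSERT only if no row was touched
        var update = new SqlCeCommand(
            "UPDATE KV SET [Value] = @v WHERE [Key] = @k", conn);
        update.Parameters.AddWithValue("@k", key);
        update.Parameters.AddWithValue("@v", value);
        if (update.ExecuteNonQuery() == 0)
        {
            var insert = new SqlCeCommand(
                "INSERT INTO KV ([Key], [Value]) VALUES (@k, @v)", conn);
            insert.Parameters.AddWithValue("@k", key);
            insert.Parameters.AddWithValue("@v", value);
            insert.ExecuteNonQuery();
        }
    }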

The only advantage you have with an XML file (if it's even possible to use one well here) is human readability and editability (if that's even a bonus... is SSMS that hard to use?).

Disadvantages:

  1. Linear scan for all queries.
  2. No security or password access at the application level; anyone could edit this XML file. SQLCE can be encrypted and password-locked.
  3. Untyped data.
  4. Verbose format (seriously, JSON would be better: faster, smaller, typed, and human-readable).
  5. SQL > XPath/XSLT.
  6. If your data requirements grow, you have built-in constraints and keys.

I can't think of a more performant solution with less overhead than a SQLCE instance.

Clever Neologism
1

First things first: you have to forget about using XML for high-performance systems. I would suggest going for JSON. It's lightweight, and many performance-demanding applications like Foursquare use JSON to store their data (though not all of it).
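For example, a minimal sketch with Json.NET (assumes the Newtonsoft.Json package; the file name is illustrative):

    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json;

    // Sketch: persist the whole key/value store as a single JSON file.
    var data = new Dictionary<string, string> { { "Key1", "Value" } };

    File.WriteAllText("data.json",
        JsonConvert.SerializeObject(data, Formatting.Indented));

    var loaded = JsonConvert.DeserializeObject<Dictionary<string, string>>(
        File.ReadAllText("data.json"));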

Better to try one of the NoSQL document-based databases than go for a relational DB, since they are designed specifically for high-performance systems and some of them can store raw JSON-format data. I would suggest MongoDB (it has a C# driver and supports LINQ). There are many other document-based NoSQL DBs, but I haven't used them.

For concurrency, you can use one of the concurrent collections, particularly ConcurrentDictionary<TKey, TValue>, so that you don't have to worry about synchronization issues.
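For example:

    using System.Collections.Concurrent;

    // Sketch: ConcurrentDictionary does the locking internally.
    var store = new ConcurrentDictionary<string, string>();

    // add or overwrite atomically, safely from multiple threads
    store.AddOrUpdate("Key1", "Value", (k, oldValue) => "Value");

    string current;
    store.TryGetValue("Key1", out current);

Keep in mind this only synchronizes threads within a single process; access from multiple processes still needs file- or database-level coordination.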

Nicky