1

I am trying to build a database for a trading system. The data is Forex Tick data and the structure is extremely simple below is the class I use to create object of data. As you noticed, the class has four properties only. Really simple class, right?

 using System;
 using System.Globalization;

 namespace InteractiveBrokersTradingSystem
 {
    class ForexDataObject
   {
    public ForexDataObject(string pairName, string timeString, double bid, double ask)
    {
        PairName = pairName;

        var span = DateTime.ParseExact(timeString, "yyyy.MM.dd HH:mm:ss.fff", CultureInfo.InvariantCulture) - new DateTime(1970, 1, 1, 0, 0, 0);
        TimeStamp = span.Ticks;

        Bid = bid;

        Ask = ask;
    }

    public string PairName { get; set; }

    public long TimeStamp { get; set; }

    public double Bid { get; set; }

    public double Ask { get; set; }
}

}

Alright, now we read CSV file which save lot of tick data. I made en experiment here: I collect 1 month(2012.01.01 --- 2012.02.02) EURUSD pair tick data which is saved in the EURUSD.csv. The csv file has 2465671 rows. The way I read in the csv is to build ilist as shown below, so now I have 2465671 objects and each saves one tick:

           IList<ForexDataObject> forexObjectList = new List<ForexDataObject>();
            string[] headers = csv.GetFieldHeaders();

            while (csv.ReadNextRecord())
            {
                    var forexDataObject = new ForexDataObject(pairName, csv[0],Convert.ToDouble(csv[1]),Convert.ToDouble(csv[2]));
                    forexObjectList.Add(forexDataObject);
            }

And the CSV files is 137MB now I wanna write these 2465671 object into a Yap file called Forex.YAP and code as below:

        using (IObjectContainer db = Db4oEmbedded.OpenFile(ForexYapFileName))
            {
                foreach(ForexDataObject forexDataObject in forexObjectList)
                {
                    db.Store(forexDataObject);
                }

            }

Statistics about the the storing to db4o database: Time: almost 20minutes!!!! Size of the YAP file is:248MB

Am i doing it in the wrong way?

ThinkingStiff
  • 64,767
  • 30
  • 146
  • 239
Wenhao.SHE
  • 1,903
  • 3
  • 17
  • 19

2 Answers2

1

Not to say that your wrong to use db4o, but why not store it in an SQL (MySQL / MS SQL) database? All of your types being stored are supported and it should give much better performance than db4o.

If you're only looking at it locally, you could even consider a MS SQL Compact Edition database.

As for why it's so much bigger than a *.csv file, well I don't pretend to know a lot about the internal workings of how it's all stored, but I imagine that the Yap file is storing a lot more information for each object than just the data itself.

Siyfion
  • 979
  • 1
  • 12
  • 32
  • Thank you for your reply. I totally agree that since db4o has philosophy to save object in c# program directly to the Yap file. It MUST have some overhead. Mean while, But what I heard is that mysql is not very fast? I load exact same data from csv file waiting for 30 seconds. Do you think my sql will bit it? How do you think about the binary file? Thanks in advance/. – Wenhao.SHE Feb 02 '12 at 16:12
  • I'd say that if mySQL **isn't** fast enough, you've got big problems with the way the handling code is written. It depends on whether it's a purely local solution or could be hosted? – Siyfion Feb 03 '12 at 12:49
  • thank you for your post. My plan is to load the data from database(mysql, binary, even csv directly) and save it in the one datastructure in memory then do the backtest. That is all i wanna:) – Wenhao.SHE Feb 04 '12 at 00:38
  • Just another note. Well the overhead for storing objects directly is not bigger than storing stuff in a relational database. However db4o isn't optimized for millions flat objects. Relational databases are optimized for this tabular flat data, so they will do much better for this kind of data. – Gamlor Feb 07 '12 at 17:21
0

I've never used db4o myself, but there seem to be some concerns about its performance with tables containing many rows. See an example from stackoverflow, db4o experiences?. And as @Siyfion points out there is going to be some overhead for storing the object rather than just the data.

Initially, I was going to suggest trying to use multiple threads to increase performance, but this post on the db4o community website suggests that this may not bring any improvement; if you are committed to using db4o the forums there may be more helpful than StackOverflow.

A couple of alternatives, as already suggested by @Siyfion would be:

MySQL

It has been a while since I used MySQL so I can't comment on its performance, but an example of using LOAD DATA IN FILE can be found at this other question Isthere any class in MySql that is like BulkCopy Class in Sql Server 2005.

MS SQL

The other suggested alternative is to use a MS SQL database. You could then use SqlBulkCopy to insert a datatable of the data. A discussion can be found at SqlBulkCopy from a List<> with several useful links. The MSDN documentation can be found here.

Community
  • 1
  • 1
bebleo
  • 326
  • 4
  • 12