
I have a lot of Forex tick data to save. My question is: what is the best way to store it?

Here is an example: I collect only one month of data for the EURUSD pair. It is originally a CSV file that is 136 MB and has 2,465,671 rows. I use the library from http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader, and it takes around 30 seconds to read all the ticks and store them in 2,465,671 objects. First of all, is that fast enough?
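For reference, here is a minimal sketch of the kind of parsing loop I mean; the timestamp,bid,ask column layout and the Tick class are just assumptions for illustration, and the CodeProject reader linked above has its own (richer) API:

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.IO;

    class Tick { public DateTime Time; public double Bid, Ask; }

    static class CsvTickLoader
    {
        // Assumed row format: "2012-01-02 00:00:01.123,1.29345,1.29358"
        public static List<Tick> Load(string path)
        {
            var ticks = new List<Tick>();
            using (var reader = new StreamReader(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    var parts = line.Split(',');
                    ticks.Add(new Tick
                    {
                        Time = DateTime.Parse(parts[0], CultureInfo.InvariantCulture),
                        Bid  = double.Parse(parts[1], CultureInfo.InvariantCulture),
                        Ask  = double.Parse(parts[2], CultureInfo.InvariantCulture)
                    });
                }
            }
            return ticks;
        }
    }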

Secondly, is there anything better than CSV? For example, a binary file might be faster. And do you have any recommendations for a database that would suit this best? I tried db4o, but it was not very impressive; I think there is some overhead in saving the data as object properties when 2,465,671 objects have to be stored in db4o's Yap file.
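To make the binary-file idea concrete, this is roughly what I have in mind: each tick becomes a fixed 24-byte record (an 8-byte timestamp plus two 8-byte doubles) instead of a text row of about 35 bytes. The Tick class is the same assumed shape as above.

    using System;
    using System.Collections.Generic;
    using System.IO;

    class Tick { public DateTime Time; public double Bid, Ask; }

    static class BinaryTickFile
    {
        // Write each tick as a fixed 24-byte record.
        public static void Write(string path, IEnumerable<Tick> ticks)
        {
            using (var writer = new BinaryWriter(File.Create(path)))
            {
                foreach (var t in ticks)
                {
                    writer.Write(t.Time.Ticks);
                    writer.Write(t.Bid);
                    writer.Write(t.Ask);
                }
            }
        }

        // Read the records back; no text parsing is involved.
        public static List<Tick> Read(string path)
        {
            var ticks = new List<Tick>();
            using (var reader = new BinaryReader(File.OpenRead(path)))
            {
                while (reader.BaseStream.Position < reader.BaseStream.Length)
                {
                    ticks.Add(new Tick
                    {
                        Time = new DateTime(reader.ReadInt64()),
                        Bid  = reader.ReadDouble(),
                        Ask  = reader.ReadDouble()
                    });
                }
            }
            return ticks;
        }
    }

At 24 bytes per tick, the 2,465,671 rows above would come to roughly 59 MB before any compression, versus 136 MB for the CSV.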

Wenhao.SHE
  • We can't tell you if it is fast enough; only you can answer that question. Likewise, we can't tell you whether anything is 'better' or 'worse' than CSV unless we know more about what your requirements are. Even then, this question is probably too broad to get a good answer. – Chris Shain Feb 02 '12 at 16:30

6 Answers


I've thought about this before, and if I were collecting this data, I would break up the process:

  1. collect data from the feed, form a line (I'd use fixed width), and append it to a text file.
  2. I would create a new text file every minute and name it something like rawdata.yymmddhhmm.txt
  3. Then I would have another process working in the background, reading these files and pushing them into a database via a parameterized insert query.

I would probably use text over a binary file because I know text appends without any problems, but I'd also look into opening a binary file for append as well; that might actually be a bit better.

Also, you want to open the file in append mode since that's the fastest way to write to a file. This will obviously need to be super fast.
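Here is a rough sketch of what I mean. The file naming, the fixed-width layout, and the Ticks table schema are all assumptions, and SQL Server is just one possible target for the parameterized insert:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Globalization;
    using System.IO;

    class TickCollector
    {
        StreamWriter writer;
        string currentMinute;

        // 1) Append one fixed-width line per tick; roll over to a new file every minute.
        public void AppendTick(DateTime time, double bid, double ask)
        {
            string minute = time.ToString("yyMMddHHmm");
            if (minute != currentMinute)
            {
                if (writer != null) writer.Dispose();
                writer = new StreamWriter("rawdata." + minute + ".txt", append: true);
                writer.AutoFlush = true;      // flush each tick so nothing is lost
                currentMinute = minute;
            }
            writer.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0:yyyy-MM-dd HH:mm:ss.fff} {1,10:F5} {2,10:F5}", time, bid, ask));
        }

        // 2) Background process: read a finished file and push its rows into the
        //    database via a parameterized insert. Table and columns are assumptions.
        public static void LoadFile(string path, string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO Ticks (TickTime, Bid, Ask) VALUES (@t, @b, @a)", conn))
            {
                conn.Open();
                cmd.Parameters.Add("@t", SqlDbType.DateTime2);
                cmd.Parameters.Add("@b", SqlDbType.Float);
                cmd.Parameters.Add("@a", SqlDbType.Float);

                foreach (var line in File.ReadLines(path))
                {
                    // Fixed-width layout written above: 23-char timestamp, two 10-char prices.
                    cmd.Parameters["@t"].Value = DateTime.Parse(line.Substring(0, 23), CultureInfo.InvariantCulture);
                    cmd.Parameters["@b"].Value = double.Parse(line.Substring(24, 10), CultureInfo.InvariantCulture);
                    cmd.Parameters["@a"].Value = double.Parse(line.Substring(35, 10), CultureInfo.InvariantCulture);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

Wrapping the inserts in a transaction (or using SqlBulkCopy) would likely speed up the load step considerably.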

John MacIntyre
  • I finally decided to use a similar approach to the one you just described. – Wenhao.SHE Feb 02 '12 at 21:47
  • But I would compress the data into a binary format instead of using a relational database. – Wenhao.SHE Feb 02 '12 at 21:48
  • Why binary (files I assume)? Why not a database? I think I'd prefer database since I could aggregate and do other 'set' type of analysis on it, instead of traversing through it for everything. KWIM? BTW-I'd like to talk more about this offline if you're interested. My connection details are in my profile. – John MacIntyre Feb 03 '12 at 00:04
  • Me too, I would prefer to talk offline. I have already contacted you. – Wenhao.SHE Feb 03 '12 at 02:12

Perhaps look at this product: http://kx.com/kdb+.php. It seems to be made for that purpose.

Kit Fisto

I save terabytes as compressed binary files (GZIP) that I dynamically uncompress using C#/.NET's built-in gzip compression/decompression readers.
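As a rough illustration of that approach (the Tick class and the fixed 24-byte record layout are assumptions, not my actual schema), compressing on write and decompressing on read with GZipStream looks something like this:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    class Tick { public DateTime Time; public double Bid, Ask; }

    static class GzipTickStore
    {
        // Write ticks through a GZip-compressed stream.
        public static void Save(string path, IEnumerable<Tick> ticks)
        {
            using (var file = File.Create(path))
            using (var gzip = new GZipStream(file, CompressionMode.Compress))
            using (var writer = new BinaryWriter(gzip))
            {
                foreach (var t in ticks)
                {
                    writer.Write(t.Time.Ticks);
                    writer.Write(t.Bid);
                    writer.Write(t.Ask);
                }
            }
        }

        // Decompress on the fly while reading the records back.
        public static List<Tick> Load(string path)
        {
            var ticks = new List<Tick>();
            using (var file = File.OpenRead(path))
            using (var gzip = new GZipStream(file, CompressionMode.Decompress))
            using (var reader = new BinaryReader(gzip))
            {
                try
                {
                    while (true)
                    {
                        var time = new DateTime(reader.ReadInt64());
                        var bid = reader.ReadDouble();
                        var ask = reader.ReadDouble();
                        ticks.Add(new Tick { Time = time, Bid = bid, Ask = ask });
                    }
                }
                catch (EndOfStreamException)
                {
                    // reached the end of the compressed stream
                }
            }
            return ticks;
        }
    }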

Srikant Krishna

HDF5 is widely used for big data, including by some financial firms. Unlike KDB it's free to use, and there are plenty of libraries that sit on top of it, such as the .NET wrapper.

This SO question might help you get started.

HDF5 homepage

fantabolous

One way to save data space (and hopefully time) is to save numbers as numbers and not as text, which is what CSV does.

You can perhaps make an object out of each row, and then make reading and writing each object a serialization problem, for which there is good support in C#.
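For example, a minimal sketch of that idea (the Tick fields are an assumption): each row object knows how to write and read itself as a fixed-size binary record, so numbers stay numbers on disk.

    using System;
    using System.IO;

    // Hypothetical row object: each instance serializes itself as a fixed 24-byte
    // binary record instead of a text row, so values are stored as numbers.
    class Tick
    {
        public DateTime Time;
        public double Bid;
        public double Ask;

        public void WriteTo(BinaryWriter writer)
        {
            writer.Write(Time.Ticks);
            writer.Write(Bid);
            writer.Write(Ask);
        }

        public static Tick ReadFrom(BinaryReader reader)
        {
            return new Tick
            {
                Time = new DateTime(reader.ReadInt64()),
                Bid  = reader.ReadDouble(),
                Ask  = reader.ReadDouble()
            };
        }
    }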

Lindsay Morsillo

Kx's kdb database would be a great off-the-shelf package if you had a few million to spare. However, you could easily write your own column-oriented database to store and analyse high-frequency data for optimal performance.

algolicious
  • Could you give me an example or a reference book for the column-oriented database you just mentioned? Thanks a lot. – Wenhao.SHE Feb 12 '12 at 14:56
  • By the way, Kx's kdb is quite expensive and designed for institutional players, I guess. – Wenhao.SHE Feb 12 '12 at 14:58
  • Yeah, it is very expensive and only deep pockets can afford it. I would suggest you write each column as an array and serialize each one to disk (like c1.dat, c2.dat). You would then need to write a query language to filter data out of the table, so implement a SQL-like grammar. – algolicious Feb 13 '12 at 08:58
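For illustration, a very rough sketch of that idea (the file names and the three columns are assumptions): each column is serialized to its own file, so a query only has to scan the columns it actually touches.

    using System;
    using System.IO;

    // Hypothetical column-oriented layout: one file per column (t.dat, bid.dat, ask.dat),
    // each holding a flat array of values in row order.
    static class ColumnStore
    {
        public static void Save(string dir, long[] times, double[] bids, double[] asks)
        {
            WriteColumn(Path.Combine(dir, "t.dat"),   w => { foreach (var v in times) w.Write(v); });
            WriteColumn(Path.Combine(dir, "bid.dat"), w => { foreach (var v in bids) w.Write(v); });
            WriteColumn(Path.Combine(dir, "ask.dat"), w => { foreach (var v in asks) w.Write(v); });
        }

        static void WriteColumn(string path, Action<BinaryWriter> write)
        {
            using (var w = new BinaryWriter(File.Create(path)))
                write(w);
        }

        // A "query" only reads the columns it needs, e.g. the average spread
        // scans bid.dat and ask.dat and never touches the timestamps.
        public static double AverageSpread(string dir)
        {
            double sum = 0;
            long count = 0;
            using (var bid = new BinaryReader(File.OpenRead(Path.Combine(dir, "bid.dat"))))
            using (var ask = new BinaryReader(File.OpenRead(Path.Combine(dir, "ask.dat"))))
            {
                while (bid.BaseStream.Position < bid.BaseStream.Length)
                {
                    sum += ask.ReadDouble() - bid.ReadDouble();
                    count++;
                }
            }
            return count == 0 ? 0 : sum / count;
        }
    }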