
I'm looking for a drop-in solution for caching large-ish amounts of data.

Related questions, but for different languages:

A close question, in different terms:

I don't need (or want to pay anything for) persistence, transactions, thread safety or the like, and I want something that is not much more complex to use than a List<> or Dictionary<>.

If I have to write code, I'll just save everything off as files in the temp directory:

using System.IO;

string Get(int i)
{
   // root is the temp directory holding one file per index
   return File.ReadAllText(Path.Combine(root, i.ToString()));
}

In my case the index will be an int (and the indices should be consecutive, or close enough) and the data will be a string, so I can get away with treating both as POD, and I would rather go ultra-light and do exactly that.
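For reference, the one-file-per-index idea can be fleshed out into a complete read/write sketch. This is a minimal illustration, assuming a throwaway subdirectory of `Path.GetTempPath()`; the `diff-cache-` directory name and the `Put`/`Clear` helpers are hypothetical, not from the question. It deliberately does no eviction, locking, or error handling, matching the stated requirements.

```csharp
using System;
using System.IO;

// Hypothetical scratch directory: a unique subfolder of the temp path.
string root = Path.Combine(Path.GetTempPath(), "diff-cache-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(root);

// Store the string for index i as its own file, named after the index.
void Put(int i, string data) => File.WriteAllText(Path.Combine(root, i.ToString()), data);

// Read the string for index i back from disk.
// Throws FileNotFoundException if that index was never stored.
string Get(int i) => File.ReadAllText(Path.Combine(root, i.ToString()));

// Remove the whole cache directory and everything in it.
void Clear() => Directory.Delete(root, recursive: true);
```

Because the key is an int and the value a string, the file name is simply the index; the file system does the lookup, so nothing but the directory path stays in memory.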

The usage: I have a sequence of 3k files (file #1 to #3000) totaling 650 MB, and I need to compute a diff for each step in the sequence. I expect the diffs to total about the same size or a little more, and I don't want to keep all of that in memory (larger cases may come along where I simply can't).


A number of people have suggested different solutions for my problem. However, none seem to be targeted at my little niche. The reason I'm looking at disk-backed caching is that I expect my current use to take up 1/3 to 1/2 of my available address space, and I'm worried that larger cases will flat-out run out of space. I'm not worried about threading, persistence or replication. What I'm looking for is a minimal solution: a minimum of code, a minimal usage footprint, minimal in-memory overhead and minimal complexity.

I'm starting to think I'm being overly optimistic.

BCS

10 Answers


What you really want is a B-Tree. That's the primary data structure that a database uses. It's designed to enable you to efficiently swap portions of a data structure to and from disk as needed.

I don't know of any widely used, high quality standalone B-Tree implementations for C#.

However, an easy way to get one would be to use a Sql Compact database. The Sql Compact engine runs in-process, so you don't need a separate service running. It will give you a B-Tree, but without all the headaches. You can just use SQL to access the data.

Scott Wisniewski
  • I'm not liking the overhead. See my edits, but I could get away with a single in-memory array lookup and a single disk read per load, so the B-Tree is overkill... in my case. – BCS Jan 03 '09 at 02:15
  • One advantage to using the in-proc DB is that it gives you access path independence. When you need to change what data you store, or what keys you need to access it by, you don't need to rewrite a big chunk of your app. – Scott Wisniewski Jan 03 '09 at 02:43
  • However, if you really feel that what you need to do with the data is that simple, then I would think you could write something from scratch that used Dictionary(of int, string), where the string was a file name, in about 2-3 hours of work.... – Scott Wisniewski Jan 03 '09 at 02:46
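BCS's comment describes a layout needing only one in-memory array lookup and one disk read per load. One way to get that, sketched here as an assumption rather than anything proposed in the thread, is a single append-only scratch file plus an in-memory list of (offset, length) pairs. It assumes entries are added in order 0, 1, 2 and so on and never overwritten; the scratch file, the UTF-8 encoding, and the `Add`/`Get` helpers are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical scratch file holding every entry back to back.
string path = Path.GetTempFileName();
var stream = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);

// In-memory index: entry i lives at index[i].Offset and spans index[i].Length bytes.
var index = new List<(long Offset, int Length)>();

// Append the UTF-8 bytes of data at the end of the file and record where they went.
int Add(string data)
{
    byte[] bytes = Encoding.UTF8.GetBytes(data);
    long offset = stream.Seek(0, SeekOrigin.End);
    stream.Write(bytes, 0, bytes.Length);
    index.Add((offset, bytes.Length));
    return index.Count - 1; // the index assigned to this entry
}

// One list lookup, then one seek and read from disk.
string Get(int i)
{
    var (offset, length) = index[i];
    byte[] buffer = new byte[length];
    stream.Seek(offset, SeekOrigin.Begin);
    int read = 0;
    while (read < length) // Read may return fewer bytes than requested
    {
        int n = stream.Read(buffer, read, length - read);
        if (n == 0) throw new EndOfStreamException();
        read += n;
    }
    return Encoding.UTF8.GetString(buffer);
}
```

Compared with one file per entry, this trades thousands of small files for a single file handle, at the cost of never reclaiming space from entries you no longer need.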

Disclaimer - I am about to point you at a product that I am involved in.

I'm still working on the web site side of things, so there is not a lot of info, but Serial Killer would be a good fit for this. I have examples that use .Net serialization (can supply examples), so writing a persistent map cache for .Net serializable objects would be trivial.

Enough shameless self promotion - if interested, use the contact link on the website.

Daniel Paull
  • +1 for related stuff but I'm looking more for ultra-light solutions (ideal would be where key & values are both POD and get stored as binary data blocks) – BCS Jan 03 '09 at 02:08
  • SerialKiller is pretty damn light - I'd hate for you to dismiss it for that reason! The interface is basically a mapping from a key (system generated) to a binary stream. – Daniel Paull Jan 03 '09 at 02:14
  • The naive, probably buggy and non-extensible version of what I'm looking for (skipping the eviction policy stuff) could be done in about 30 LOC. I'd be impressed if you could get even half your feature list in under that. – BCS Jan 03 '09 at 02:19
  • By "light" I refer more to runtime overheads, which are very low. I haven't counted LOC, but the DLLs are under 500 KB in total, which, given the capability, is very lean. – Daniel Paull Jan 03 '09 at 02:50
  • Skipping iteration and recursion (unneeded in this case), LOC ~ execution time (for some values of LOC :) – BCS Jan 03 '09 at 03:56
  • I disagree. File system fragmentation and caching strategies greatly affect execution time, so performance may be inversely proportional to LOC! – Daniel Paull Jan 03 '09 at 04:03
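BCS's 30-LOC estimate explicitly skips the eviction policy, and Daniel's reply points at caching strategy as a performance factor. As a hedged illustration of the simplest such policy, here is a small least-recently-used (LRU) layer that keeps a bounded number of strings in memory and falls back to a disk read on a miss. `LoadFromDisk` is a hypothetical stand-in that fabricates a string and counts its calls; in practice it would be something like the `File.ReadAllText` call from the question. The capacity of 2 is chosen only to make eviction easy to observe.

```csharp
using System;
using System.Collections.Generic;

const int capacity = 2; // hypothetical: maximum entries kept in memory
int loads = 0;          // counts how often the slow (disk) path ran

// Hypothetical stand-in for a real disk read.
string LoadFromDisk(int i) { loads++; return "entry " + i; }

// Most recently used entry at the front; the dictionary maps keys to list nodes.
var order = new LinkedList<(int Key, string Text)>();
var map = new Dictionary<int, LinkedListNode<(int Key, string Text)>>();

string Get(int i)
{
    if (map.TryGetValue(i, out var node))
    {
        // Hit: move the node to the front and return the cached text.
        order.Remove(node);
        order.AddFirst(node);
        return node.Value.Text;
    }

    // Miss: one disk read, then cache the result at the front.
    string value = LoadFromDisk(i);
    map[i] = order.AddFirst((i, value));

    // Over capacity: evict the least recently used entry from the back.
    if (order.Count > capacity && order.Last is { } last)
    {
        order.RemoveLast();
        map.Remove(last.Value.Key);
    }
    return value;
}
```

Every operation is a dictionary lookup plus constant-time linked-list updates, so the in-memory overhead stays bounded by the capacity rather than by the size of the data set.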

This is very similar to my question:

Looking for a simple standalone persistent dictionary implementation in C#

I don't think a library that fits exactly what you want exists; maybe it's time for a new project on GitHub.

Sam Saffron

Here is a B-Tree implementation for .NET: http://bplusdotnet.sourceforge.net/

Luke Quinane

You can use the MS Application Block with a disk-based cache solution.

leora

Also try looking at NCache.

I am not affiliated with this company. I've just downloaded and tested their free express version.

Saif Khan

I've partially ported the EhCache Java application to .NET. Distributed caching is not yet implemented, but on a single node all the original unit tests pass. It is fully open source:

http://sourceforge.net/projects/thecache/

I can create a binary drop if you need it (only source code is available now).

Timur Fanshteyn

I'd take the embedded DB route (SQLite, Firebird), but here are some other options:

Mauricio Scheffer

I recommend the Caching Application Block in Microsoft's Enterprise Library. It was recommended in another answer as well, but that link points to an article on the Data Access portion of the Enterprise Library.

Here is the link to the Caching Application Block:

http://msdn.microsoft.com/en-us/library/cc309502.aspx

And specifically, you will want to create a new backing store (if one that persists to disk is not there):

http://msdn.microsoft.com/en-us/library/cc309121.aspx

casperOne

Given your recent edits to the question, I suggest that you implement the solution noted in your question as you are very unlikely to find such a naive solution wrapped up in a library for you to reuse.

Daniel Paull