0

I'd I like to be able to run a set of records/documents in a data store but have those records fresh in RAM at all times. I only need to use about 1.5GB to 2GB of RAM (potentially less). This would be server-based not embedded.

I know in-memory key-value stores will not work for me, because I need rich query capability.

I know MySQL can do memory based tables, but not sure if people use this feature for what I'm thinking about. Can SQL Server pin tables in memory? The entire table?

What other data stores should I be looking at? Map/Reduce style querying would also be fine if fast.

BuddyJoe
  • 69,735
  • 114
  • 291
  • 466
  • The disadvantage of mysql memory tables is that the contents of the table are wiped whenever mysql restarts. – Sam Dufel Feb 09 '12 at 16:46
  • 2
    SQL Server used to allow `dbcc pintable` which prevented flushing a tables pages to disk, but it was removed. If there is no memory pressure on the system or competition from other databases there is no reason for all the tables data to not end up in the buffer cache as a matter of course. – Alex K. Feb 09 '12 at 17:15
  • How strong does the guarantee that the data are always in memory have to be? Does the DB need to be able to prevent virtual memory from being paged to disk by the OS? – Chris Shain Feb 09 '12 at 19:37

4 Answers4

1

SQLLite can run in memory for a relational database solution. For a non-relational database, RavenDB can run in memory.

Craig Wilson
  • 12,174
  • 3
  • 41
  • 45
1

If you can represent your data as a collection of objects, then you can just use Linq as your "rich query" capability and do it all in memory: this will be MUCH faster than pretty much all of the database solutions out there. If you can keep all of your data in memory, then don't even bother with a database.

Note: if you're doing this in .NET, then you'll have to build your project in 64-bit mode because you cannot have a collection that has more than ~1.5 GB of data in RAM on a 32-bit application. If you can't build/run in 64-bit mode, then you might need a database.

Update

I'm not sure I'm following what you're saying: I'm not really sure what a prevalent system is and that would lead me to believe that I'm not using a prevalent system. I also don't know what a "homegrown" object is or what is a "homegrown" .NET system. I'm using just the standard .NET 4.0 and when I say objects, I specifically meant that you write classes which can hold your data. From there on, you load your data in memory (I don't know how you get your data: file, database, network, etc) then Linq does the ORM for you. Thread safety only matters when you're accessing your internal collection of objects and you can do that with a simple lock.

Oh, and if it is homegrown .NET system - are these objects hosted on a server and getting hit by multiple clients?

For the data I'm dealing with, I don't need any kind of rich query capability (neither in memory, nor to get it from a database), that's why we store anything we can't fit into memory into an embedded database (LevelDB). This way, all of our data is contained on disk and when we need it, we take the shortest/fastest route to get it into memory. Going over a network will only slow you down, so you can do that, but only do it when you absolutely have to (i.e. you have a hard constraint that your 2GB data is on a database somewhere other than your local machine).

Kiril
  • 39,672
  • 31
  • 167
  • 226
  • It would be about 500000 objects in .NET. Each object has about 30 properties. Wouldn't that be slow? Or have you done something in this size range before? – BuddyJoe Feb 09 '12 at 21:09
  • 1
    I think that 500k objects is certainly not a lot... keeping it in memory will still be faster than any database solution. I generally work with large data sets, but it wouldn't make sense to compare what I do and what you do without more information. – Kiril Feb 09 '12 at 22:43
  • Are you using something like a prevalent system? something like http://prevayler.org ... or just reading objects in through a data reader and mapping them to objects (homegrown) or using an ORM? – BuddyJoe Feb 10 '12 at 13:38
  • Oh, and if it is homegrown .NET system - are these objects hosted on a server and getting hit by multiple clients? if so, how are you making it thread safe? thanks, Lirik +1 – BuddyJoe Feb 10 '12 at 13:41
  • Updated my answer... btw, what's a "homegrown" .NET system or a "homegrown" object? – Kiril Feb 10 '12 at 14:59
  • Homegrown = meaning a one-off system versus something that multiple people in the community are using like open source or commercial projects. – BuddyJoe Feb 10 '12 at 22:10
  • Oh, now I'm following, but I still don't think it really matters... your fundamental issue seems to be that you have a relatively large set of data and you want to be able to perform rich queries on your data. But at only 2GB, it seems that the best solution is to keep the data in memory and use Linq. The only thing that needs to be "homegrown" for that situation is the .NET `class` which represents your data... everything else is readily available in .NET; – Kiril Feb 10 '12 at 22:17
1

VoltDB is an in-memory ACID-compliant SQL database that is optimized for high velocity OLTP. It runs on 64-bit Linux or Mac OS X, has an open source community edition, and client libraries in several languages.

When you say "rich query capability" it sounds like it could be more of an OLAP use case, but if the queries are of basic complexity, it could be a good fit for VoltDB. Could you share more about your use case?

BenjaminBallard
  • 1,482
  • 12
  • 11
  • It would be a database of clients - approx 500000. We would have to support the - filtering on about 30 fields. Needs to be able to support >, <, between, like %, exact match text (case insensitive) on these records. These types of queries need to be extremely fast. We may also need sum/avg/stddev 'group by' functionality but this could run at more traditional database speeds. Hope that helps. – BuddyJoe Feb 09 '12 at 21:34
  • There are some things you mention that are not currently supported by VoltDB. The VoltDB document "Using VoltDB" has an appendix that lists supported SQL. – BenjaminBallard Feb 22 '12 at 19:04
0

I have to agree with Lirik here. In .Net terms using objects for your data and LINQ to query a collection of them has to be one of the fastest ways to do what you are asking for whilst still having a rich query language at your fingertips.

If you are worried about the size of the collection and if you will be able to hold all of the information in memory then you could have a look at projects such as Memcached to help you.

UPDATE

I have built an example of using Linq with a criteria object to query a list of products (could easily be a Linq to Sql datatable though)

First an example product class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Linq_Question
{
    public class Product
    {
        public enum Categories
        {
            CatOne,
            CatTwo,
            CatThree
        }

        public int Id { get; set; }
        public string Name { get; set; }
        public Categories Category { get; set; }
        public decimal Price { get; set; }

    }
}

Now an example Product Criteria class

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Linq_Question
{
    public class ProductCriteria
    {
        public Product.Categories? WhereCategoryIs { get; set; }
        public decimal? WherePriceIsGreaterThan { get; set; }
        public decimal? WherePriceIsLessThan { get; set; }
        public string WhereNameContains { get; set; }

        public ProductCriteria()
        {

        }
    }
}

An example repository - using lists

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;

namespace Linq_Question
{
    public class ProductRepository
    {
        private List<Product> products;

        public ProductRepository()
        {
            products = new List<Product>();
            products.Add(new Product() { Category = Product.Categories.CatOne, Id = 1, Name = "Product 1", Price = 100 });
            products.Add(new Product() { Category = Product.Categories.CatTwo, Id = 2, Name = "Product 2", Price = 120 });
            products.Add(new Product() { Category = Product.Categories.CatThree, Id = 3, Name = "Product 3", Price = 300 });
            products.Add(new Product() { Category = Product.Categories.CatOne, Id = 4, Name = "Product 4", Price = 400 });
            products.Add(new Product() { Category = Product.Categories.CatTwo, Id = 5, Name = "Product 5", Price = 500 });
            products.Add(new Product() { Category = Product.Categories.CatThree, Id = 6, Name = "Product 6", Price = 600 });
        }

        public IEnumerable<Product> Retrieve(ProductCriteria criteria)
        {
            return this.products.Where(FilterProducts(criteria));
        }

        private Func<Product, bool> FilterProducts(ProductCriteria criteria)
        {
            Expression<Func<Product, bool>> predicate = PredicateBuilder.True<Product>();
            List<IProductFilter> filters = new List<IProductFilter>();
            filters.Add(new PriceIsGreaterThanFilter());
            filters.Add(new CategoryFilter());

            foreach (var item in filters)
            {
                if (item.IsValidFilter(criteria))
                {
                    predicate = predicate.And(item.ApplyFilter(criteria));
                }
            }
            return predicate.Compile();
        }
    }
}

Notice in the FilterProducts method that a list of filters are looped over each one is checked to see if it is a valid filter given the current criteria object and then applied if required.

Here is the IProductFilter interface and some example filters

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;

namespace Linq_Question
{
    public interface IProductFilter
    {
        bool IsValidFilter(ProductCriteria criteria);
        Expression<Func<Product, bool>> ApplyFilter(ProductCriteria criteria);
    }

    public class CategoryFilter : IProductFilter
    {
        public bool IsValidFilter(ProductCriteria criteria)
        {
            return (criteria.WhereCategoryIs.HasValue);
        }

        public Expression<Func<Product, bool>> ApplyFilter(ProductCriteria criteria)
        {
            return (p => p.Category == criteria.WhereCategoryIs.GetValueOrDefault());
        }
    }

    public class PriceIsGreaterThanFilter : IProductFilter
    {
        public bool IsValidFilter(ProductCriteria criteria)
        {
            return (criteria.WherePriceIsGreaterThan.HasValue);
        }

        public Expression<Func<Product, bool>> ApplyFilter(ProductCriteria criteria)
        {
            return (p => p.Price > criteria.WherePriceIsGreaterThan.GetValueOrDefault());
        }
    }
}

Please note you will need the PredicateBuilder class - found http://www.albahari.com/nutshell/predicatebuilder.aspx

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Collections.Generic;

namespace Linq_Question
{
    public static class PredicateBuilder
    {
        public static Expression<Func<T, bool>> True<T>() { return f => true; }
        public static Expression<Func<T, bool>> False<T>() { return f => false; }

        public static Expression<Func<T, bool>> Or<T>(this Expression<Func<T, bool>> expr1,
                                                            Expression<Func<T, bool>> expr2)
        {
            var invokedExpr = Expression.Invoke(expr2, expr1.Parameters.Cast<Expression>());
            return Expression.Lambda<Func<T, bool>>
                  (Expression.OrElse(expr1.Body, invokedExpr), expr1.Parameters);
        }

        public static Expression<Func<T, bool>> And<T>(this Expression<Func<T, bool>> expr1,
                                                             Expression<Func<T, bool>> expr2)
        {
            var invokedExpr = Expression.Invoke(expr2, expr1.Parameters.Cast<Expression>());
            return Expression.Lambda<Func<T, bool>>
                  (Expression.AndAlso(expr1.Body, invokedExpr), expr1.Parameters);
        }
    }
}

Finally, here is a small console app showing the idea in action:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Linq_Question
{
    class Program
    {
        static void Main(string[] args)
        {
            ProductRepository repo = new ProductRepository();
            Console.WriteLine("Items over 100");
            foreach (var item in repo.Retrieve(new ProductCriteria() { WherePriceIsGreaterThan = 100 }))
            {
                Console.WriteLine(string.Format("Name {0}, Category {1}, Price {2}", item.Name, item.Category, item.Price));
            }
            Console.WriteLine("Items with a Category of Two");
            foreach (var item in repo.Retrieve(new ProductCriteria() { WhereCategoryIs = Product.Categories.CatTwo }))
            {
                Console.WriteLine(string.Format("Name {0}, Category {1}, Price {2}", item.Name, item.Category, item.Price));
            }

            Console.Read();

        }
    }
}

You could expand on this idea to add multiple filters and even work out if the function returned from the IProductFilter should be AND or OR'd to the expression.

The filters could be injected in to the repository - making them easy to change at runtime.

I hope this gives you some ideas.

dscammell
  • 36
  • 3
  • I thought about that. But how do you "query" Memcached. Its only Key/Value correct? – BuddyJoe Feb 10 '12 at 22:08
  • I'm also trying to get my head around moving from a system that builds dynamic SQL to .NET code that branches off into different LINQ statements depending on what search criteria was set. – BuddyJoe Feb 10 '12 at 22:17
  • Not sure about querying Memcached - not something I have looked into in real depth yet – dscammell Feb 11 '12 at 22:09