SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.
From https://github.com/SnappyDataInc/snappydata
SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).
Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This lets data move efficiently in and out of the Spark executors and makes the overall architecture simpler. All Spark jobs should run in SnappyData, though the SnappyData database can also be accessed with SQL over ODBC/JDBC, Thrift, or REST without going through Spark.
SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that one can use statistical sampling techniques and probabilistic data structures to answer aggregate-class queries without needing to store or operate over the entire data set. This approach trades some query accuracy for quicker response times, so queries can run on large data sets and still return meaningful and accurate error information. A real-world example is the political polls run by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.
It's important to note that not all SQL queries can be answered through AQP, but by moving a subset of queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.
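The sampling idea behind AQP can be illustrated with a minimal sketch in plain Python. This is not SnappyData's API or its actual implementation; the function name `approximate_mean`, the synthetic data, and the sample size are all assumptions made for illustration. It estimates an average from a uniform random sample and reports an approximate 95% margin of error, much as a poll would.

```python
import math
import random


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation, ignoring
    the finite-population correction).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    mean = sum(sample) / sample_size
    # Sample variance with Bessel's correction (divide by n - 1).
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    margin = 1.96 * math.sqrt(variance / sample_size)
    return mean, margin


# Hypothetical data set: synthetic values standing in for a large table column.
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

# Answer the aggregate from 5% of the rows instead of all of them.
estimate, margin = approximate_mean(population, sample_size=10_000)
exact = sum(population) / len(population)

print(f"approximate mean: {estimate:.2f} +/- {margin:.2f}")
print(f"exact mean:       {exact:.2f}")
```

The estimate touches only a fraction of the rows, and the margin shrinks roughly with the square root of the sample size. That is the accuracy-for-speed trade described above. A query such as an exact lookup of a single row has no comparable sampling-based answer, which is why only a subset of queries suits this approach.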
Important links: