Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing), and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This allows for optimal performance when moving data in and out of Spark executors, and it makes the overall architecture simpler. Spark jobs should run in SnappyData, though the SnappyData database can also be accessed using SQL via ODBC/JDBC, Thrift, or REST without needing to go through Spark.
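As a minimal sketch of that SQL-only access path, a plain JDBC connection might look like the following. The host, port, and table name are assumptions (1527 is SnappyData's default client port), and running this requires the SnappyData JDBC client jar on the classpath plus a live cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SnappyJdbcExample {
    public static void main(String[] args) throws Exception {
        // Connect to a running SnappyData locator/server on the default
        // client port 1527; no Spark session is involved.
        try (Connection conn =
                 DriverManager.getConnection("jdbc:snappydata://localhost:1527/");
             Statement stmt = conn.createStatement()) {
            // Plain SQL over JDBC; the table name here is hypothetical.
            ResultSet rs = stmt.executeQuery("SELECT count(*) FROM app.orders");
            while (rs.next()) {
                System.out.println("rows = " + rs.getLong(1));
            }
        }
    }
}
```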

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that one can use statistical sampling techniques and probabilistic data structures to answer aggregate queries without needing to store or operate over the entire data set. This approach trades off query accuracy for quicker response times, allowing queries to be run on large data sets with meaningful and accurate error information. A real-world example is the political polls run by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.
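As an illustrative sketch of how this looks in SnappyData's SQL dialect (the table and column names are hypothetical, and the exact syntax may vary by release), one creates a stratified sample over a base table and then asks for an error-bounded aggregate:

```sql
-- Create a 1% stratified sample over the base table; qcs is the
-- "query column set" used for stratification.
CREATE SAMPLE TABLE orders_sample ON orders
  OPTIONS (qcs 'region', fraction '0.01');

-- Aggregate query answered from the sample, with a requested
-- relative error bound and confidence level.
SELECT region, sum(amount)
FROM orders
GROUP BY region
WITH ERROR 0.1 CONFIDENCE 0.95;
```

The engine answers from the sample when it can satisfy the requested error bound, which is what lets a subset of aggregate queries come back in interactive time.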

It's important to note that not all SQL queries can be answered through AQP, but by moving a subset of queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.

Important links:

The SnappyData Github Repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions
3 votes • 1 answer

Load data from MS SQL table to snappyData

I am using Tibco ComputeDB, which is new to me. It uses Spark and SnappyData. I want to add data from MS SQL to an in-memory table of SnappyData. I can read data from CSV and load that into SnappyData with the command below. => CREATE EXTERNAL TABLE IF NOT…
JSONX • 73 • 5

3 votes • 2 answers

Refresh Dataframe in Spark real-time Streaming without stopping process

In my application I get a stream of accounts from a Kafka queue (using Spark Streaming with Kafka) and I need to fetch attributes related to these accounts from S3, so I'm planning to cache the resultant S3 dataframe, as the S3 data will not be updated at least…
shiv455 • 7,384 • 19 • 54 • 93

3 votes • 2 answers

Are we able to use SnappyData to update a record in Azure Data Lake? Or is Azure Data Lake append-only?

I am currently working on Azure Data Lake with SnappyData integration. I have a query on SnappyData: are we able to update data from SnappyData in Azure Data Lake storage, or can we only append to Azure Data Lake storage? I searched in…
3 votes • 2 answers

SnappyData - Error creating Kafka streaming table

I'm seeing an issue when creating a Spark streaming table using Kafka from the snappy shell. The exception: 'Invalid input 'C', expected dmlOperation, insert, withIdentifier, select or put (line 1, column 1)' Reference:…
mike w • 131 • 6

2 votes • 1 answer

Native snappy library not available

I'm trying to do lots of joins on some data frames using spark in scala. When I'm trying to get the count of the final data frame I'm generating here, I'm getting the following exception. I'm running the code using spark-shell. I've tried some…
pkgajulapalli • 1,066 • 3 • 20 • 44

2 votes • 0 answers

Unable to setup multi node cluster

I am trying to set up a multi-node cluster of SnappyData. Locator config: a.com -dir=/snappydatafiles/server1 -heap-size=7096m -locators=b.com:8888,a.com:9999 b.com -dir=/snappydatafiles/server2 -heap-size=7096m -locators=b.com:8888,a.com:9999 server…
jaimin03 • 176 • 9

2 votes • 1 answer

Tables created in snappy-shell/snappy-sql do not show up in Smart Connector mode (Java)

The sparkcontext is created as below SparkConf sparkConf = new SparkConf().setAppName(args[0]); snappySes = new SnappySession(new SparkSession.Builder().config("spark.snappydata.connection", "localhost:1527").getOrCreate()) Read snappy…
2 votes • 1 answer

How can I change the TTL value of SnappyData table?

How can I change the TTL value for SnappyData table? For Example: If I create table with TTL = 60 seconds: CREATE TABLE APP.TEST (ID INTEGER NOT NULL PRIMARY KEY, TTL INTEGER) USING ROW OPTIONS (PARTITION_BY 'ID', EXPIRE '60') ; How can I change…
2 votes • 2 answers

Snappydata cannot have an array of size more than 1000

Can anyone help me and provide me info regarding the limit of array length and dimensions a row/column table's row can have. I cannot add more than 1000 elements to my array. Is there any way to increase its size?
techie95 • 515 • 3 • 16

2 votes • 1 answer

from pyspark.sql.snappy import SnappyContext - ImportError: No module named snappy

Even after reinstalling pyspark and snappydata, whenever I try `from pyspark.sql.snappy import SnappyContext` in the code below: from pyspark.sql.snappy import SnappyContext from pyspark.storagelevel import…
techie95 • 515 • 3 • 16

2 votes • 1 answer

SnappyData: Connect Standalone Spark Job to Embedded Cluster

What I'm trying to achieve is similar to Smart Connector mode, but the documentation isn't helping me much, because the Smart Connector examples are based on Spark-Shell, whereas I'm trying to run a standalone Scala application. Therefore, I can't…
Joseph Pride • 165 • 11

2 votes • 0 answers

Using SnappyData Embedded Mode Like Local Mode

I'm experimenting with a home cluster, but I would like to debug it in IntelliJ. The SnappyData documentation says that, in Local Mode, I can create a SnappySession inside my driving program, which based on my Spark experience says I can run it…
Joseph Pride • 165 • 11

2 votes • 2 answers

How to use a REST service with the SnappyData SQL database

Hi, I am starting to learn the SnappyData documentation (version 0.7) in order to make REST calls to access the SnappyData database. I don't know how to use a REST service with SnappyData; could anyone tell me how to do that? I searched in…
Karthik GB • 57 • 5

2 votes • 1 answer

Does EXPIRE option in SnappyData DDL Syntax do an actual delete?

In this DDL syntax, does EXPIRE mean, SnappyData literally deletes the record(s) after this amount of time? Or, just expires it from local cache/memory, where it can be retrieved again from disk later? CREATE TABLE [IF NOT EXISTS] table_name ( …
Jason • 2,006 • 3 • 21 • 36

2 votes • 2 answers

Are there any limitations to # of columns in a SnappyData column table?

As an example, Cassandra has a 2 billion column limit for a "row key". Some high volume IoT apps could push that boundary and you should therefore, design accordingly, if using Cassandra. Is there any size limitations similar to that with…
Jason • 2,006 • 3 • 21 • 36