Questions tagged [shark-sql]

Shark has been subsumed by Spark SQL (tag: apache-spark-sql). It was an open source distributed SQL query engine for Hadoop data that brought state-of-the-art performance and advanced analytics to Hive users.

59 questions

5 answers

How to make shark/spark clear the cache?

When I run my Shark queries, memory gets hoarded in main memory. This is my top command result: Mem: 74237344k total, 70080492k used, 4156852k free, 399544k buffers Swap: 4194288k total, 480k used, 4193808k free, 65965904k…

asked Dec 11 '13 at 11:19

venkat

3 answers

Is LIMIT clause in HIVE really random?

The Hive documentation notes that the LIMIT clause returns rows chosen at random. I have been running a SELECT on a table with more than 800,000 records with LIMIT 1, but it always returns the same record. I'm using the Shark distribution,…

sql hive hiveql shark-sql

asked May 22 '14 at 08:55

visakh

2 answers

Comparing Cassandra's CQL vs Spark/Shark queries vs Hive/Hadoop (DSE version)

I would like to hear your thoughts and experiences on the usage of CQL and the in-memory query engine Spark/Shark. From what I know, the CQL processor runs inside the Cassandra JVM on each node. Shark/Spark query processor attached with a Cassandra…

cassandra hive cql apache-spark shark-sql

asked Jun 14 '13 at 17:18

Minh Do

1 answer

UDF not working in Spark SQL

I'm trying to calculate the Jaccard index on Spark SQL. My table on Hive has the following data: hive> select * from test_1; 1 ["rock","pop"] 2 ["metal","rock"] Table DDL: create table test_1 (id int, val array<string>); I'm using the UDF from…

scala hive apache-spark shark-sql

asked Jul 31 '14 at 13:00

visakh

6 answers

Connect to Spark SQL via ODBC

According to this page, https://spark.apache.org/sql/, you can connect existing BI tools to Spark SQL via ODBC or JDBC. I don't mean Shark, as this is basically EOL: It is for this reason that we are ending development in Shark as a separate project…

hadoop odbc apache-spark shark-sql

asked Sep 08 '14 at 18:05

Chris Matta

1 answer

Spark Streaming historical state

I am building real-time processing for detecting fraudulent ATM card transactions. To detect fraud efficiently, the logic requires the last transaction date by card and the sum of transaction amounts by day (or the last 24 hrs). One use case is if card…

java scala apache-spark shark-sql spark-streaming

asked Jun 20 '14 at 16:30

Jigar Parekh

1 answer

Is it possible to run Shark queries over Spark Streaming data?

Is it possible to run Shark queries over the data contained in the DStreams of a Spark Streaming application (for instance, inside a foreachRDD call)? Is there a specific API to do that? Thanks.

apache-spark shark-sql

asked Jun 04 '14 at 23:58

gprivitera

1 answer

shark/spark throws NPE when querying a table

The development part of the shark/spark wiki is really brief, so I tried to put together some code in an effort to programmatically query a table. Here it is ... object Test extends App { val master = "spark://localhost.localdomain:8084" val jobName =…

scala nullpointerexception classnotfoundexception apache-spark shark-sql

asked Jan 06 '13 at 22:53

Sheng

1 answer

Accessing Shark tables (Hive) from Scala (shark-shell)

I have shark-0.8.0 which runs on hive-0.9.0. I am able to program on Hive by invoking shark. I created a few tables and loaded them with data. Now, I am trying to access the data from these tables using Scala. I invoked the Scala shell using…

scala hive apache-spark shark-sql

asked May 09 '14 at 13:27

visakh

2 answers

Datastax DSE Cassandra, Spark, Shark, Standalone Program

I use Datastax Enterprise 4.5. I hope I did the config right; I followed the explanation on the Datastax website. I can write into the Cassandra DB with a Windows service, and this works, but I can't query with Spark using the where function. I start the…

java scala cassandra apache-spark shark-sql

asked Sep 01 '14 at 15:47

richie676

1 answer

Improving write performance in Hive

I am performing various calculations (using UDFs) on Hive. The computations are fast enough, but I am hitting a roadblock with the write performance in Hive. My result set is close to ten million records, and it takes a few minutes to write…

hive apache-spark hiveql shark-sql

asked Jul 25 '14 at 11:37

visakh

1 answer

Loading multiple JSON records from one file to HIVE

I am trying to load JSON files into Hive using JSON Serde. I am able to get it working for one JSON file at a time, but I was wondering whether it's possible to have more than one record in a JSON file at a time and get them loaded in one shot. To…

json hive shark-sql

asked May 02 '14 at 13:31

visakh

2 answers

How many Shark servers are necessary in relation to Spark?

I'm new to Spark/Shark and have spun up a cluster with three Spark workers. I started installing Shark on the same three servers but I'm coming to the conclusion that maybe that's not needed and only one Shark server is necessary -- I can't find…

apache-spark shark-sql

asked Apr 17 '14 at 17:17

Bill

1 answer

Integrating cassandra and shark

I am trying to get Shark working on Cassandra so that I can pull data from Cassandra into Shark and run queries. I used the CASH open source storage handler; it seems to work when I run Shark locally, but in distributed mode it looks like spark slaves…

cassandra hive apache-spark shark-sql

asked Mar 01 '14 at 07:18

user3367572

1 answer

Has anyone been successful running Apache Spark & Shark on Cassandra?

I am trying to configure a 5-node Cassandra cluster to run Spark/Shark to test out some Hive queries. I have installed Spark, Scala, Shark and configured according to Amplab [Running Shark on a cluster] …

scala cassandra hive apache-spark shark-sql

asked Nov 15 '13 at 10:58

kwasbob
