Questions tagged [apache-drill]

Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. It is capable of querying nested data in formats like JSON and Parquet and performing dynamic schema discovery.

Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores.
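A rough sketch of such a join, sent from Java through Drill's JDBC driver. It assumes the Drill JDBC driver is on the classpath and that a Mongo storage plugin is enabled under the name `mongo`. The file path, database, collection, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class CrossSourceJoin {
    // Hypothetical sources: a JSON file read through the default `dfs` plugin,
    // joined with a MongoDB collection exposed by a plugin named `mongo`.
    static final String QUERY =
        "SELECT o.order_id, c.name "
            + "FROM dfs.`/data/orders.json` o "
            + "JOIN mongo.shop.customers c ON o.customer_id = c.customer_id";

    // Runs the join over JDBC and prints each row tab-separated.
    static void run(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}
```

Calling `CrossSourceJoin.run("jdbc:drill:zk=local")` would send the whole join to an embedded Drillbit as one SQL statement.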

644 questions
43 votes, 2 answers

Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)

I want to do some "near real-time" data analysis (OLAP-like) on data stored in HDFS. My research showed that the three mentioned frameworks report significant performance gains compared to Apache Hive. Does anyone have some practical experience with…
user2306380
18 votes, 3 answers

Convert file of JSON objects to Parquet file

Motivation: I want to load the data into Apache Drill. I understand that Drill can handle JSON input, but I want to see how it performs on Parquet data. Is there any way to do this without first loading the data into Hive, etc., and then using one of…
danieltahara
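Drill itself can do this conversion with CREATE TABLE AS (CTAS), which writes its output in the format named by the `store.format` session option (Parquet is the default). A minimal Java sketch over JDBC, assuming the default `dfs` plugin with its writable `tmp` workspace. The JSON path and target table name are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

final class JsonToParquet {
    // Builds the two statements: select the output format, then CTAS.
    // Both must run on the same connection, because ALTER SESSION is session-scoped.
    static List<String> statements(String jsonPath, String targetTable) {
        return List.of(
            "ALTER SESSION SET `store.format` = 'parquet'",
            "CREATE TABLE dfs.tmp.`" + targetTable + "` AS SELECT * FROM dfs.`" + jsonPath + "`");
    }

    // Executes the statements in order on an open Drill connection.
    static void convert(Connection conn, String jsonPath, String targetTable) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : statements(jsonPath, targetTable)) {
                stmt.execute(sql);
            }
        }
    }
}
```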
17 votes, 8 answers

One SQL query to access multiple data sources in Java (from Oracle, Excel, SQL Server)

I need to develop an application that can get data from multiple data sources (Oracle, Excel, Microsoft SQL Server, and so on) using one SQL query. For example: SELECT o.employeeId, count(o.orderId) FROM employees@excel e.…
Slava Vedenin
16 votes, 2 answers

Apache Drill has bad performance against SQL Server

I tried using Apache Drill to run a simple join-aggregate query, and the speed wasn't really good. My test query was: SELECT p.Product_Category, SUM(f.sales) FROM facts f JOIN Product p on f.pkey = p.pkey GROUP BY p.Product_Category Where facts has…
Imbar M.
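To see where the work happens, one option is to look at Drill's plan for the query. The sketch prefixes the query with `EXPLAIN PLAN FOR` and reads the first result column, assumed here to hold the text plan. As a rule of thumb rather than a guarantee: if the plan shows separate scans of `facts` and `Product` with a Drill-side join and aggregation above them, the rows are being pulled out of SQL Server instead of the work being pushed down.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class ExplainPlan {
    // Prefixes the query so Drill returns its plan instead of executing it.
    static String explainSql(String query) {
        return "EXPLAIN PLAN FOR " + query;
    }

    // Reads the first column of each EXPLAIN result row, assumed to be
    // the plan rendered as indented text.
    static String textPlan(Connection conn, String query) throws SQLException {
        StringBuilder plan = new StringBuilder();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(explainSql(query))) {
            while (rs.next()) {
                plan.append(rs.getString(1)).append('\n');
            }
        }
        return plan.toString();
    }
}
```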
16 votes, 3 answers

Apache Drill vs Spark

I have some experience with Apache Spark and Spark SQL. Recently I found the Apache Drill project. Could you describe the most significant advantages of and differences between them? I've already read Fast Hadoop Analytics (Cloudera Impala vs…
Matzz
15 votes, 1 answer

How to implement optimization of INNER JOINS (push down) for Mongo Storage Plugin in Apache Drill?

I would like to extend the Apache Drill Mongo Storage Plugin to push down INNER JOINs. To do this, I would like to rewrite the INNER JOIN into a MongoDB aggregation pipeline. How should we start implementing this rewrite in Apache Drill? Here is a SQL…
Dennis Münkle
9 votes, 1 answer

Is it possible to read and write Parquet using Java without a dependency on Hadoop and HDFS?

I've been hunting around for a solution to this question. It appears to me that there is no way to embed reading and writing Parquet format in a Java program without pulling in dependencies on HDFS and Hadoop. Is this correct? I want to read and…
Jesse
8 votes, 3 answers

How to Use Apache Drill with Cassandra

I am trying to query Cassandra using Apache Drill. The only connector I could find is here: http://www.confusedcoders.com/bigdata/apache-drill/sql-on-cassandra-querying-cassandra-via-apache-drill However, this does not build. It comes up with an…
KingOfHypocrites
8 votes, 6 answers

Write Drill query output to csv (or some other format)

I'm using Drill in embedded mode, and I can't figure out how to save query output other than by copying and pasting it.
Kevin
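Two routes seem workable here. Server-side, ``ALTER SESSION SET `store.format` = 'csv'`` followed by a CREATE TABLE AS query makes Drill write CSV files into a writable workspace. Client-side, the rows can be fetched over JDBC and written out by the application, as in this sketch. It applies simple RFC 4180-style quoting (fields containing commas, quotes, or line breaks are quoted and embedded quotes doubled) but ends lines with `\n` rather than CRLF. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CsvExport {
    // Quotes a field only when it contains a comma, quote, or line break,
    // doubling any embedded quotes; SQL NULL becomes an empty field.
    static String csvField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
            || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Encodes each field and joins them with commas.
    static String csvLine(List<String> fields) {
        List<String> encoded = new ArrayList<>();
        for (String field : fields) {
            encoded.add(csvField(field));
        }
        return String.join(",", encoded);
    }

    // Writes a header from the column labels, then one line per result row.
    static void write(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        List<String> row = new ArrayList<>();
        for (int i = 1; i <= columns; i++) {
            row.add(meta.getColumnLabel(i));
        }
        out.write(csvLine(row) + "\n");
        while (rs.next()) {
            row.clear();
            for (int i = 1; i <= columns; i++) {
                row.add(rs.getString(i));
            }
            out.write(csvLine(row) + "\n");
        }
    }
}
```

A caller would open a `Writer` on the target file (for example with `Files.newBufferedWriter`) and pass it the `ResultSet` from any Drill query.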
7 votes, 4 answers

Apache Drill - connection to Drill in Embedded Mode [java]

I want to connect to Drill from a Java app. So far I have been trying to use JDBC, based on the example at https://github.com/vicenteg/DrillJDBCExample, but when I change the DB_URL static variable to "jdbc:drill:zk=local" and start the app, I get…
susanoo
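A sketch of the three JDBC URL forms Drill's driver accepts: embedded (`zk=local`), through ZooKeeper, and direct to a single Drillbit. As I understand it, `zk=local` makes the driver start a Drillbit inside the application's own JVM, so Drill's engine jars must be on the classpath, not only the JDBC driver. Pointing at an already running Drillbit avoids that. Host names, ports, and the ZooKeeper path values are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class DrillJdbcUrls {
    // Embedded mode: the driver runs Drill inside the client JVM.
    static final String EMBEDDED = "jdbc:drill:zk=local";

    // Cluster mode through ZooKeeper: quorum, then Drill's root directory and cluster id
    // (Drill's defaults are "drill" and "drillbits1").
    static String viaZooKeeper(String zkQuorum, String zkRoot, String clusterId) {
        return "jdbc:drill:zk=" + zkQuorum + "/" + zkRoot + "/" + clusterId;
    }

    // Direct connection to one Drillbit's user port (31010 by default).
    static String direct(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }

    // Loads the driver class explicitly, then opens the connection.
    static Connection open(String url) throws SQLException, ClassNotFoundException {
        Class.forName("org.apache.drill.jdbc.Driver");
        return DriverManager.getConnection(url);
    }
}
```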
6 votes, 1 answer

How to escape table names in SQLAlchemy

I'm working on a SQLAlchemy dialect for Apache Drill and I've run into an issue that I can't quite seem to figure out. The basic problem is that SQLAlchemy is generating a query like the one below: SELECT `field1`, `field2` FROM dfs.test.data.csv…
cgivre
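The generated name `dfs.test.data.csv` splits into four dotted parts, whereas Drill needs the workspace (`dfs.test`) and the file name (`data.csv`) kept apart, for example ``dfs.test.`data.csv` ``. Quoting every part with backticks is one safe rule for a dialect to follow. The sketch is in Java rather than Python and shows only the string-building rule; doubling a backtick that occurs inside a name is an assumption based on Drill's Calcite-based parser.

```java
import java.util.ArrayList;
import java.util.List;

final class DrillIdentifiers {
    // Wraps one identifier part in backticks, doubling any backtick inside it
    // (assumed escape rule for backtick-quoted identifiers).
    static String quote(String part) {
        return "`" + part.replace("`", "``") + "`";
    }

    // Quotes each part separately: plugin, workspace, then file or table name.
    static String tablePath(String... parts) {
        List<String> quoted = new ArrayList<>();
        for (String part : parts) {
            quoted.add(quote(part));
        }
        return String.join(".", quoted);
    }
}
```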
6 votes, 2 answers

Integrating Spark SQL and Apache Drill through JDBC

I would like to create a Spark SQL DataFrame from the results of a query performed over CSV data (on HDFS) with Apache Drill. I successfully configured Spark SQL to connect to Drill via JDBC: Map connectionOptions = new…
Skice
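A minimal sketch of the options map for Spark's JDBC data source, assuming the Drill JDBC driver is on Spark's classpath. The URL, file path, and subquery are hypothetical. `url`, `driver`, and `dbtable` are standard option keys of Spark's JDBC source, and `dbtable` accepts anything valid in a FROM clause, including a parenthesised subquery with an alias.

```java
import java.util.HashMap;
import java.util.Map;

final class DrillSparkOptions {
    // Builds the options for Spark's JDBC reader; the values are placeholders.
    static Map<String, String> options(String drillUrl, String subquery) {
        Map<String, String> opts = new HashMap<>();
        opts.put("url", drillUrl);
        opts.put("driver", "org.apache.drill.jdbc.Driver");
        // Wrapping the Drill query as an aliased subquery lets Drill run the CSV query itself.
        opts.put("dbtable", "(" + subquery + ") t");
        return opts;
    }
}
```

With Spark 2.x or later the map would be passed as `spark.read().format("jdbc").options(opts).load()`. One thing to watch: Spark's generic JDBC dialect quotes column names with double quotes, while Drill quotes identifiers with backticks by default, so the SQL Spark generates may need a custom dialect or a Drill-side setting.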
6 votes, 2 answers

Performance of Apache Drill

Are there any genuine performance benchmarks that compare Stinger, Impala, and Drill? Also, which is preferred? My use case will mainly be ad hoc interactive queries on top of Hive. Thanks.
Sai
6 votes, 1 answer

Java or C++ API for Apache Drill

I want to access Drill through a programming interface. The Apache Drill documentation only mentions its Java and C++ client libraries but doesn't provide any documentation or examples for them.…
nash
6 votes, 4 answers

Apache Drill connection through Java

Throughout the Apache Drill wiki, I could only see queries being run via the SqlLine client. Is there any programmatic way to run queries in Drill other than the REST API? Any samples or pointers? Or is it equivalent to using the JDBC driver to run…
Tamil
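For comparison with JDBC, the REST route the question mentions is a single POST of a JSON body to `/query.json` on a Drillbit's web port (8047 by default). A minimal Java 11+ sketch using `java.net.http`; the base URL is a placeholder, and the hand-rolled JSON escaping stands in for a proper JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class DrillRest {
    // Minimal JSON string encoding: escapes quotes, backslashes, and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Request body in the shape Drill's REST endpoint expects.
    static String requestBody(String sql) {
        return "{\"queryType\":\"SQL\",\"query\":" + jsonString(sql) + "}";
    }

    // Posts the query and returns the raw JSON response text.
    static String query(String baseUrl, String sql) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/query.json"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(requestBody(sql)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, `DrillRest.query("http://localhost:8047", "SELECT 1")` would return the Drillbit's JSON reply as a string.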