Questions tagged [apache-kudu]

For questions related to Apache Kudu

From https://kudu.apache.org/docs/

About Kudu

Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu's design sets it apart. Some of Kudu's benefits include:

  • Fast processing of OLAP workloads.
  • Integration with MapReduce, Spark and other Hadoop ecosystem components.
  • Tight integration with Impala, making it a good, mutable alternative to using HDFS with Parquet.
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict-serializable consistency.
  • Strong performance for running sequential and random workloads simultaneously.
  • Easy to administer and manage with Cloudera Manager.
  • High availability. Tablet Servers and Master use the Raft consensus algorithm, which ensures availability even if f replicas fail, given 2f+1 available replicas. Reads can be serviced by read-only follower tablets, even in the event of a leader tablet failure.
  • Structured data model.
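
The 2f+1 rule in the high-availability point above can be checked with a little arithmetic. The following is a minimal sketch of the general Raft majority rule, not Kudu code; the function names `tolerated_failures` and `replicas_needed` are hypothetical and chosen only for illustration.

```python
def tolerated_failures(replicas: int) -> int:
    """Return how many replicas may fail while a majority is still available."""
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    # Raft needs a strict majority (quorum) of replicas to make progress.
    quorum = replicas // 2 + 1
    return replicas - quorum


def replicas_needed(f: int) -> int:
    """Return the smallest replica count that survives f failures (2f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} replicas tolerate {tolerated_failures(n)} failure(s)")
```

Under this rule an even replica count buys nothing extra: 4 replicas tolerate the same single failure as 3, which is why odd replica counts are the usual choice.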

By combining all of these properties, Kudu targets support for families of applications that are difficult or impossible to implement on current generation Hadoop storage technologies. A few examples of applications for which Kudu is a great solution are:

  • Reporting applications where newly-arrived data needs to be immediately available for end users
  • Time-series applications that must simultaneously support:
    • queries across large amounts of historic data
    • granular queries about an individual entity that must return very quickly
  • Applications that use predictive models to make real-time decisions with periodic refreshes of the predictive model based on all historic data
134 questions
4
votes
3 answers

Kudu auto generated key column

I am trying to create a custom auto-generated, auto-incrementing key in Kudu that keeps increasing from a starting seed, which is zero by default. It's pretty inefficient to go through all records and increment a counter to get a row count. Does…
Anas Mosaad
  • 131
  • 2
  • 8
3
votes
1 answer

How to install Kudu?

I am familiar with Hadoop components like Hive, HBase, HDFS, etc., but I am very new to Apache Kudu. So far, from my research, I understand that Kudu is nothing but columnar storage like Parquet. Also, it's faster than HBase. But I am still unable to find…
Joseph N
  • 540
  • 8
  • 28
3
votes
0 answers

Using KuduContext in pyspark

I would like to use Kudu with PySpark. While I can use it with: sc.read.format('org.apache.kudu.spark.kudu').option('kudu.master',"hdp1:7051").option('kudu.table',"impala::test.z_kudu_tab").load() I cannot find a way to import KuduContext. I'm…
Federico Ponzi
  • 2,682
  • 4
  • 34
  • 60
3
votes
0 answers

Zeppelin/Jupyter notebook for Kudu

We are trying to connect a Zeppelin notebook to Kudu via Impala. We didn't find any existing Kudu interpreters; we also looked for Impala interpreters. Any help would be appreciated. Rony
ron
  • 625
  • 2
  • 6
  • 17
3
votes
1 answer

How to index a schema in Apache Kudu

I have to create a table in Apache Kudu. I know that we can query Apache Kudu using Apache Impala, but I want to create some indexes in Apache Kudu to make query processing faster. My question is: do Apache Kudu and Apache Impala…
HJSG
  • 41
  • 1
  • 6
3
votes
0 answers

Issue creating a DataFrame inside a Spark Structured Streaming ForeachWriter to insert into a Kudu table

I have an issue that I tried searching for a solution to without success, and I would appreciate any pointers. I am trying to integrate Spark Structured Streaming with Apache Kudu. I am reading the stream from Kafka and doing some…
3
votes
1 answer

Apache Kudu vs InfluxDB on time series data for fast analytics

How does Apache Kudu compare with InfluxDB for IoT sensor data that requires fast analytics (e.g. robotics)? Kudu has recently released v1.0. I have a few specific questions on how Kudu handles the following: Sharding? Data retention policies…
user1478046
2
votes
0 answers

pyodbc upsert error - SQL contains 0 parameter markers, but 3 parameters were supplied', 'HY000'

I am using pyodbc with the Impala driver for Kudu on Cloudera 5.16 and Python 3.6.10 to do an upsert into a Kudu table. Insert works fine, but upsert fails. I am getting an error: SQL contains 0 parameter markers, but 3 parameters were supplied', 'HY000'. The…
ebeb
  • 429
  • 3
  • 12
2
votes
1 answer

Kudu: partitioning strategy for performance related to number of disks

The documentation says: It is recommended that new tables which are expected to have heavy read and write workloads have at least as many tablets as tablet servers. If I have as many tablets as data disks (for instance 3 tablet servers, 10 disks…
Guillaume
  • 2,325
  • 2
  • 22
  • 40
2
votes
1 answer

Kudu drivers for scala 2.12

Are there any Apache Kudu drivers for Scala 2.12? And if not, are they planned? And if not, is this a warning sign that Apache Kudu is not going to be developed any more? I am able to work with Kudu from Spark 2.4 and Scala 2.11, but I would prefer…
radumanolescu
  • 4,059
  • 2
  • 31
  • 44
2
votes
2 answers

What does "avoid multiple Kudu clients per cluster" mean?

I am looking at Kudu's documentation. Below is a partial description of kudu-spark. https://kudu.apache.org/docs/developing.html#_avoid_multiple_kudu_clients_per_cluster Avoid multiple Kudu clients per cluster. One common Kudu-Spark coding error is…
xuejianbest
  • 323
  • 1
  • 9
2
votes
2 answers

Turn non-Kudu to Kudu table in Impala

I'm having a problem with an Impala UPDATE statement. When I used the code below: update john_estares_db.tempdbhue set QU=concat(account_id,"Q",quarter(mrs_change_date)," ",year(mrs_change_date)); it returned the error message: AnalysisException: Impala does not…
jbest
  • 640
  • 1
  • 10
  • 28
2
votes
1 answer

Insert into a Kudu table from DataStage

I am writing to enquire about a problem in my process: I have a Kudu table, and when I try to insert via DataStage (11.5 or 11.7) a new row whose size is bigger than 500 characters using the Impala JDBC Driver, I receive this error: Fatal Error:…
2
votes
1 answer

NiFi - How to connect to Kerberos-enabled Kudu

How can I connect from NiFi to a Kerberos-enabled Kudu? I only see one processor to access Kudu - PutKUDU and it doesn't support Kerberos. I haven't seen anywhere online any discussion regarding connecting to Kudu with Kerberos. Am I missing…
Greg Oks
  • 2,700
  • 4
  • 35
  • 41
2
votes
2 answers

How do I measure the size of a Kudu table?

I am starting to work with Kudu, and the only way I have found to measure the size of a Kudu table is through Cloudera Manager - KUDU - Chart Library - Total Tablet Size On Disk Across Kudu Replicas. Is there another way to find it through the command line?
Skiel
  • 307
  • 1
  • 12