
I am familiar with Hadoop components like Hive, HBase, HDFS, etc., but I am very new to Apache Kudu.

So far, from my research, I understand that Kudu is essentially columnar storage like Parquet, and that it is as fast as HBase.

But I am still unable to find a good document on Kudu installation. I am also wondering whether I really need to install a separate package for Kudu, or whether it is built into Hadoop (EMR or Dataproc).

Please help: how can I get started hands-on with Kudu?

user4157124
Joseph N

1 Answer


Kudu is NOT a file format but rather a separate storage engine. Think of it as a storage layer that runs in parallel with, or as an alternative to, your HDFS (or S3). Yes, it DOES require installing its own Master and Tablet servers; see the Architecture Overview on the Apache Kudu website.

And since it's an open-source Apache project, installation instructions can also be found on the Apache website: https://kudu.apache.org/docs/installation.html.
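
Once the Master and Tablet server processes are installed and started, a quick sanity check is to see whether their ports accept connections. Below is a minimal Python sketch of that check, not an official Kudu tool. It assumes the stock default ports (7051/8051 for the master RPC and web UI, 7050/8050 for the tablet server) and uses `localhost` as a placeholder host; the helper names `is_port_open` and `check_kudu_ports` are made up for this illustration.

```python
import socket

# Default Kudu ports (assumption: the stock configuration was not changed).
DEFAULT_KUDU_PORTS = {
    "master RPC": 7051,
    "master web UI": 8051,
    "tablet server RPC": 7050,
    "tablet server web UI": 8050,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def check_kudu_ports(host, ports=DEFAULT_KUDU_PORTS):
    """Map each service name to whether its port accepts connections on host."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # "localhost" is a placeholder; use the host where Kudu was installed.
    for name, is_up in check_kudu_ports("localhost").items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

An open port only shows that something is listening there; opening the master web UI in a browser is still the better confirmation that it is actually Kudu.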

mazaneicha
  • 2
    @thebluephantom In terms of being too coarse-grained, all-or-nothing? Yeah, tru dat... Hopefully it'll get better with Ranger integration https://docs.cloudera.com/runtime/7.1.1/administering-kudu/topics/kudu-enabling-ranger-authorization.html – mazaneicha Jun 04 '20 at 20:42
  • 1
    Thanks @mazaneicha for your answer. Can I install Kudu on a newly created EC2 machine where Hadoop doesn't exist, or do I need to install Kudu on top of Hadoop? – Joseph N Jun 05 '20 at 03:15
  • 2
    By itself, Kudu doesn't depend on any part of the Hadoop ecosystem, although Spark, Impala and Hive Metastore can be used as tools to process data stored in Kudu. – mazaneicha Jun 05 '20 at 04:01