Questions tagged [cloudera-cdh]

For questions specifically about Cloudera's Distribution of Apache Hadoop (CDH). Please look at https://community.cloudera.com/ before posting questions.

From cloudera.com - CDH Components:

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything you need for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects, Cloudera has created a functionally advanced system that helps you perform end-to-end Big Data workflows.

Key Projects:

  • Apache Hadoop (Core)
  • Apache Accumulo
  • Apache Flume
  • Apache HBase
  • Apache Hive
  • Hue
  • Apache Impala (incubating)
  • Apache Kafka
  • Apache Pig
  • Apache Sentry
  • Cloudera Search
  • Apache Spark
  • Apache Sqoop

RESOURCES

  • CDH5 - archives - CDH5 packages and parcels
  • Documentation - official documentation
  • Blogs - engineering blogs with useful tutorials and in-depth explanations of Hadoop functionality
  • Community Forums - questions and answers from the CDH community


1018 questions
84 votes, 16 answers

How to check the Spark version

As titled: how do I know which version of Spark has been installed on CentOS? The current system has CDH 5.1.0 installed.
HappyCoding
72 votes, 6 answers

Spark : how to run spark file from spark shell

I am using CDH 5.2. I am able to use spark-shell to run commands. How can I run a file (file.spark) that contains Spark commands? Is there any way to run/compile Scala programs in CDH 5.2 without sbt?
Ramakrishna
37 votes, 10 answers

Cannot Read a file from HDFS using Spark

I have installed Cloudera CDH 5 using Cloudera Manager. I can easily run hadoop fs -ls /input/war-and-peace.txt and hadoop fs -cat /input/war-and-peace.txt; the latter command prints the whole text file to the console. Now I start the Spark shell…
Knows Not Much
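A common fix for this symptom, assumed here because the excerpt is cut off, is to give Spark a fully qualified hdfs:// URI instead of a bare path, so the file is not looked up on a different default filesystem. The Python sketch below only builds that URI; the host namenode.example.com is hypothetical, and 8020 is the usual CDH NameNode RPC port.

```python
from urllib.parse import urlparse

# Hypothetical NameNode host; 8020 is the customary CDH NameNode RPC port.
DEFAULT_NAMENODE = "namenode.example.com"
DEFAULT_PORT = 8020


def qualify_hdfs_path(path: str, host: str = DEFAULT_NAMENODE, port: int = DEFAULT_PORT) -> str:
    """Return an hdfs:// URI for an absolute path; URIs with a scheme pass through."""
    if urlparse(path).scheme:
        # Already qualified (hdfs://, file://, ...), so leave it untouched.
        return path
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path, got %r" % path)
    return "hdfs://%s:%d%s" % (host, port, path)


if __name__ == "__main__":
    print(qualify_hdfs_path("/input/war-and-peace.txt"))
    # In the PySpark shell this URI would then be read with, for example:
    #   sc.textFile(qualify_hdfs_path("/input/war-and-peace.txt")).count()
```

Already-qualified URIs such as file:///tmp/x are returned unchanged, so the helper is safe to call on paths that may or may not carry a scheme.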
14 votes, 4 answers

Class com.hadoop.compression.lzo.LzoCodec not found for Spark on CDH 5?

I have been working on this problem for two days and still have not found a solution. Problem: our Spark, installed via the newest CDH 5, always complains about the missing LzoCodec class, even after I install HADOOP_LZO through Parcels in Cloudera…
caesar0301
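Assuming the error comes from com.hadoop.compression.lzo.LzoCodec being listed in io.compression.codecs while the hadoop-lzo jar and its native libraries are not on Spark's driver and executor paths, one approach is to add them through Spark configuration. A sketch that renders the relevant spark-defaults.conf entries; the parcel directory /opt/cloudera/parcels/HADOOP_LZO and its lib/hadoop/lib layout are assumptions to verify on your hosts.

```python
from typing import Dict

# Assumed HADOOP_LZO parcel location; the real directory depends on the
# parcel that Cloudera Manager activated, so check it before using this.
LZO_PARCEL_ROOT = "/opt/cloudera/parcels/HADOOP_LZO"


def lzo_spark_conf(parcel_root: str = LZO_PARCEL_ROOT) -> Dict[str, str]:
    """Spark properties that put the hadoop-lzo jar and native libs on the path."""
    lib_dir = parcel_root.rstrip("/") + "/lib/hadoop/lib"
    jars = lib_dir + "/*"         # JVM classpath wildcard: every jar in lib_dir
    native = lib_dir + "/native"  # directory assumed to hold the native LZO libraries
    return {
        "spark.driver.extraClassPath": jars,
        "spark.executor.extraClassPath": jars,
        "spark.driver.extraLibraryPath": native,
        "spark.executor.extraLibraryPath": native,
    }


def to_spark_defaults(conf: Dict[str, str]) -> str:
    """Render properties in spark-defaults.conf syntax: key, whitespace, value."""
    return "\n".join("%s %s" % (key, value) for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(to_spark_defaults(lzo_spark_conf()))
```

Setting both the driver and executor variants matters because the codec class is resolved in whichever JVM reads the compressed file.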
12 votes, 1 answer

How to efficiently update Impala tables whose files are modified very frequently

We have a Hadoop-based solution (CDH 5.15) where new files arrive in HDFS in some directories. On top of those directories we have 4-5 Impala (2.1) tables. The process writing those files in HDFS is Spark Structured Streaming (2.3.1). Right…
Victor
12 votes, 4 answers

PySpark distributed processing on a YARN cluster

I have Spark running on a Cloudera CDH 5.3 cluster, using YARN as the resource manager. I am developing Spark apps in Python (PySpark). I can submit jobs and they run successfully; however, they never seem to run on more than one machine (the local…
aaa90210
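When a PySpark job only ever uses one machine, a common first check (an assumption about this particular case) is whether the job was actually submitted to YARN rather than run in local mode, and how many executors were requested. A sketch that builds a spark-submit command line for Spark 1.x on CDH 5.3, where the YARN masters were spelled yarn-client and yarn-cluster; the script name my_job.py and the resource figures are hypothetical.

```python
import shlex
from typing import List


def spark_submit_argv(script: str, num_executors: int = 4, executor_cores: int = 2,
                      executor_memory: str = "2g", deploy: str = "yarn-client") -> List[str]:
    """Build a spark-submit command that asks YARN for several executors."""
    if deploy not in ("yarn-client", "yarn-cluster"):
        raise ValueError("deploy must be 'yarn-client' or 'yarn-cluster'")
    if num_executors < 1:
        raise ValueError("num_executors must be positive")
    return [
        "spark-submit",
        "--master", deploy,                     # run under YARN, not local[*]
        "--num-executors", str(num_executors),  # containers YARN may place on different nodes
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        script,
    ]


if __name__ == "__main__":
    argv = spark_submit_argv("my_job.py")
    print(" ".join(shlex.quote(arg) for arg in argv))
    # Inside the job, the data also needs enough partitions to keep every
    # executor busy, e.g. sc.parallelize(data, numSlices=4 * 2).
```

Requesting executors is only half of it: an RDD with a single partition still runs as a single task, so the partition count should be at least the total number of executor cores.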
12 votes, 14 answers

Incorrect configuration: namenode address dfs.namenode.rpc-address is not configured

I am getting this error when I try to boot up a DataNode. From what I have read, the RPC parameters are only used for an HA configuration, which I am not setting up (I think). 2014-05-18 18:05:00,589 INFO [main] impl.MetricsSystemImpl…
aaa90210
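A frequently cited cause of this error on a non-HA cluster is that fs.defaultFS is missing from core-site.xml, so the DataNode has no NameNode address from which to derive the RPC address. A minimal sketch, under that assumption, that renders such a core-site.xml with Python's standard library; the host namenode.example.com is hypothetical and 8020 is the usual CDH NameNode RPC port.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_config_xml(props: Dict[str, str]) -> str:
    """Serialize properties in the <configuration>/<property> layout Hadoop reads."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical NameNode host; the DataNode uses this value to find the
    # NameNode it should register with.
    print('<?xml version="1.0"?>')
    print(hadoop_config_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"}))
```

The printed document would go into core-site.xml in the Hadoop configuration directory of every node, DataNodes included.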
11 votes, 1 answer

Can ETL informatica Big Data edition (not the cloud version) connect to Cloudera Impala?

We are trying to do a proof of concept on Informatica Big Data Edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we…
sun_dare
11 votes, 1 answer

Error in Hive Query while joining tables

I am unable to pass the equality check using the HIVE query below. I have 3 tables and I want to join them. I am trying as below, but get the error: FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN…
Agustus
10 votes, 2 answers

Datastax Cassandra Driver throwing CodecNotFoundException

The exact exception is as follows: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal]. These are the versions of software I am using: Spark 1.5…
8 votes, 1 answer

Not able to install hadoop using Cloudera Manager

I am trying to set up a Hadoop cluster in a single VM (for simplicity) using Cloudera Manager 5.9. Below are the details of my environment: Host OS -> Windows 10; virtualization software -> VirtualBox 5.1.10; guest OS -> CentOS 6.8. I installed the…
CuriousMind
8 votes, 1 answer

Running impala cluster from portable binaries

I'm evaluating multiple big data tools. One of them is, of course, Impala. I would like to start an Impala cluster by manually starting processes on the cluster nodes. As I'm currently doing for Spark, H2O, Presto and Dask, I would like to grab binaries,…
jangorecki
8 votes, 1 answer

Could not find uri with key dfs.encryption.key.provider.uri to create a keyProvider in HDFS encryption for CDH 5.4

CDH version: CDH 5.4.5. Issue: when HDFS encryption is enabled using the KMS available in Hadoop CDH 5.4, I get an error while putting a file into an encryption zone. Steps for encrypting Hadoop are as follows: Creating a key [SUCCESS] [tester@master…
Jack Sparrow
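As the message itself says, the process could not find a value for dfs.encryption.key.provider.uri; in the CDH 5.4 (Hadoop 2.6) era, the NameNode and HDFS clients locate the KMS through that property in hdfs-site.xml. A small sketch that builds a value in the kms://http@host:port/kms form; the host is hypothetical and 16000 is assumed to be the KMS port, so check the port your KMS actually listens on.

```python
from typing import Dict

# Assumed KMS HTTP port (the Hadoop KMS default); confirm against your KMS setup.
KMS_DEFAULT_PORT = 16000


def kms_provider_uri(host: str, port: int = KMS_DEFAULT_PORT, https: bool = False) -> str:
    """Build a KMS key-provider URI of the form kms://http@host:port/kms."""
    scheme = "https" if https else "http"
    return "kms://%s@%s:%d/kms" % (scheme, host, port)


def encryption_props(host: str, port: int = KMS_DEFAULT_PORT, https: bool = False) -> Dict[str, str]:
    """The hdfs-site.xml entry that points HDFS at the KMS."""
    return {"dfs.encryption.key.provider.uri": kms_provider_uri(host, port, https)}


if __name__ == "__main__":
    # Hypothetical KMS host name.
    print(encryption_props("kms.example.com"))
```

The same value has to be visible to the client that runs hdfs dfs -put, not only to the NameNode, since the client performs the key lookup that fails here.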
8 votes, 1 answer

YARN UNHEALTHY nodes

In our YARN cluster, which is 80% full, we are seeing some of the YARN NodeManagers marked as UNHEALTHY. After digging into the logs, I found it is because the disk for the data dir is 90% full, with the following error: 2015-02-21 08:33:51,590 INFO…
roy
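The NodeManager's disk health checker marks a local or log directory as bad once its disk utilization goes above a per-disk cutoff (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, 90.0 by default) and reports the node UNHEALTHY when too few good directories remain (yarn.nodemanager.disk-health-checker.min-healthy-disks, 0.25 by default). A simplified Python model of that rule for checking your own directories; it approximates rather than reproduces the NodeManager's exact arithmetic.

```python
import shutil
from typing import Iterable, List, Tuple

# Defaults assumed to match the NodeManager disk health checker:
#   yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90.0
#   yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
MAX_UTILIZATION_PERCENT = 90.0
MIN_HEALTHY_FRACTION = 0.25


def dir_is_good(used: int, total: int, max_pct: float = MAX_UTILIZATION_PERCENT) -> bool:
    """A directory counts as bad once its disk utilization goes above max_pct."""
    return 100.0 * used / total <= max_pct


def node_is_healthy(usages: List[Tuple[int, int]],
                    max_pct: float = MAX_UTILIZATION_PERCENT,
                    min_fraction: float = MIN_HEALTHY_FRACTION) -> bool:
    """Healthy while the fraction of good directories stays at or above min_fraction."""
    if not usages:
        raise ValueError("no directories to check")
    good = sum(1 for used, total in usages if dir_is_good(used, total, max_pct))
    return good / len(usages) >= min_fraction


def check_dirs(paths: Iterable[str]) -> bool:
    """Apply the model to real directories using their filesystem usage."""
    usages = []
    for path in paths:
        du = shutil.disk_usage(path)
        usages.append((du.used, du.total))
    return node_is_healthy(usages)


if __name__ == "__main__":
    # Replace "." with the directories listed in yarn.nodemanager.local-dirs.
    print(check_dirs(["."]))
```

If the disks really are that full, freeing space or adding disks addresses the cause; raising the percentage in yarn-site.xml only moves the cutoff.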
8 votes, 1 answer

How to kill a mapred job started by hive?

I'm working with CDH 5.1 now. Normal Hadoop jobs are started through YARN, but Hive still works with mapred. Sometimes a big query hangs for a long time and I want to kill it. I can find this big job in the JobTracker web console, but it doesn't provide a…
2shou
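When Hive launches a MapReduce job, its console output typically includes a "Kill Command = ..." line with the hadoop job -kill invocation for that job. A sketch that pulls those commands out of captured Hive output; the job ID in the comment is made up, and the kill is only printed unless dry_run is set to False.

```python
import re
import shlex
import subprocess
from typing import List

# Matches the line Hive prints when it starts a MapReduce job, e.g. (made-up ID):
#   Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1400000000000_0001
KILL_LINE = re.compile(r"^Kill Command = (.+)$", re.MULTILINE)


def kill_commands(hive_output: str) -> List[List[str]]:
    """Extract every kill command Hive reported, as argv lists."""
    return [shlex.split(command) for command in KILL_LINE.findall(hive_output)]


def kill_job(argv: List[str], dry_run: bool = True) -> None:
    """Print the command, or run it when dry_run is False."""
    if dry_run:
        print("would run:", " ".join(shlex.quote(arg) for arg in argv))
    else:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    sample = "Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1400000000000_0001\n"
    for argv in kill_commands(sample):
        kill_job(argv)
```

If the Hive output is no longer available, the job ID shown in the JobTracker web console can be passed to the same hadoop job -kill command by hand.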