Questions tagged [cloudera-cdh]

For questions specifically about Cloudera's Distribution of Apache Hadoop (CDH). Please look at https://community.cloudera.com/ before posting questions.

From cloudera.com - CDH Components:

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything you need for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects, Cloudera has created a functionally advanced system that helps you perform end-to-end Big Data workflows.

Key Projects:

Apache Hadoop (Core)

Apache Accumulo

Apache Flume

Apache HBase

Apache Hive

Hue

Apache Impala (incubating)

Apache Kafka

Apache Pig

Apache Sentry

Cloudera Search

Apache Spark

Apache Sqoop

RESOURCES

CDH5 - archives - CDH5 packages and parcels
Documentation - official documentation
Blogs - engineering blogs with useful tutorials and in-depth explanations of Hadoop functionality
Community Forums - questions and answers from the CDH community

1018 questions

16 answers

How to check the Spark version

As the title says, how do I find out which version of Spark is installed on CentOS? The system currently has CDH 5.1.0 installed.

apache-spark cloudera-cdh

asked Apr 17 '15 at 03:52

HappyCoding

6 answers

Spark : how to run spark file from spark shell

I am using CDH 5.2. I am able to use spark-shell to run commands. How can I run a file (file.spark) that contains Spark commands? Is there any way to run/compile Scala programs in CDH 5.2 without sbt?

scala apache-spark cloudera-cdh cloudera-manager

asked Dec 31 '14 at 06:52

Ramakrishna

10 answers

Cannot Read a file from HDFS using Spark

I have installed Cloudera CDH 5 using Cloudera Manager. I can easily run hadoop fs -ls /input/war-and-peace.txt and hadoop fs -cat /input/war-and-peace.txt; the latter command prints the whole text file to the console. Now I start the spark shell…

hadoop apache-spark cloudera-cdh

asked Dec 15 '14 at 05:47

Knows Not Much

4 answers

Class com.hadoop.compression.lzo.LzoCodec not found for Spark on CDH 5?

I have been working on this problem for two days and still have not found a solution. Problem: Our Spark, installed via the newest CDH 5, always complains that the LzoCodec class is missing, even after I install HADOOP_LZO through Parcels in cloudera…

apache-spark cloudera-cdh hadoop-lzo

asked May 03 '14 at 06:37

caesar0301

1 answer

How to efficiently update Impala tables whose files are modified very frequently

We have a Hadoop-based solution (CDH 5.15) where we are getting new files in HDFS in some directories. On top of those directories we have 4-5 Impala (2.1) tables. The process writing those files in HDFS is Spark Structured Streaming (2.3.1). Right…

hadoop impala spark-structured-streaming cloudera-cdh

asked Feb 06 '20 at 08:24

Victor

4 answers

PySpark distributed processing on a YARN cluster

I have Spark running on a Cloudera CDH5.3 cluster, using YARN as the resource manager. I am developing Spark apps in Python (PySpark). I can submit jobs and they run successfully; however, they never seem to run on more than one machine (the local…

apache-spark hadoop-yarn cloudera-cdh pyspark

asked Jan 30 '15 at 05:06

aaa90210

14 answers

Incorrect configuration: namenode address dfs.namenode.rpc-address is not configured

I am getting this error when I try to boot up a DataNode. From what I have read, the RPC parameters are only used for an HA configuration, which I am not setting up (I think). 2014-05-18 18:05:00,589 INFO [main] impl.MetricsSystemImpl…

hadoop hdfs cloudera-cdh

asked May 18 '14 at 08:19

aaa90210

1 answer

Can ETL informatica Big Data edition (not the cloud version) connect to Cloudera Impala?

We are trying to do a proof of concept on Informatica Big Data edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we…

hadoop informatica cloudera-cdh informatica-powercenter impala

asked Dec 23 '15 at 21:11

sun_dare

1 answer

Error in Hive Query while joining tables

I am unable to pass the equality check using the Hive query below. I have 3 tables and I want to join them. I am trying as below, but get the error: FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN…

join hadoop hive hiveql cloudera-cdh

asked Sep 13 '14 at 08:07

Agustus

2 answers

Datastax Cassandra Driver throwing CodecNotFoundException

The exact Exception is as follows com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal] These are the versions of Software I am using Spark 1.5…

cassandra datastax-enterprise cloudera-cdh datastax-java-driver spark-cassandra-connector

asked Jun 02 '16 at 10:05

Syed Ammar Mustafa

1 answer

Not able to install hadoop using Cloudera Manager

I am trying to set up a Hadoop cluster in a single VM (for simplicity) using Cloudera Manager 5.9. Below are the details of my environment: Host OS -> Windows 10 Virtualization software -> VirtualBox 5.1.10 Guest OS -> CentOS 6.8 I installed the…

postgresql hadoop hadoop2 cloudera-cdh cloudera-manager

asked Dec 17 '16 at 17:23

CuriousMind

1 answer

Running impala cluster from portable binaries

I'm evaluating multiple big data tools. One of them is, of course, Impala. I would like to start an Impala cluster by manually starting processes on the cluster nodes. As I'm currently doing for Spark, H2O, Presto and Dask, I would like to grab binaries,…

cloudera-cdh impala bigdata

asked Aug 22 '16 at 20:03

jangorecki

1 answer

Could not find uri with key dfs.encryption.key.provider.uri to create a keyProvider in HDFS encryption for CDH 5.4

CDH Version: CDH5.4.5 Issue: When HDFS Encryption is enabled using KMS available in Hadoop CDH 5.4 , getting error while putting file into encryption zone. Steps: Steps for Encryption of Hadoop as follows: Creating a key [SUCCESS] [tester@master…

hadoop encryption copy hdfs cloudera-cdh

asked Sep 09 '15 at 10:07

Jack Sparrow

1 answer

YARN UNHEALTHY nodes

In our YARN cluster, which is 80% full, we are seeing some of the YARN NodeManagers marked as UNHEALTHY. After digging into the logs I found it's because disk space is 90% full for the data dir, with the following error: 2015-02-21 08:33:51,590 INFO…

hadoop distributed-computing cloudera hadoop-yarn cloudera-cdh

asked Mar 12 '15 at 12:41

roy

1 answer

How to kill a mapred job started by hive?

I'm working with CDH 5.1 now. It starts normal Hadoop jobs via YARN, but Hive still works with mapred. Sometimes a big query hangs for a long time and I want to kill it. I can find this big job in the JobTracker web console, but it didn't provide a…

hadoop mapreduce hive hadoop-yarn cloudera-cdh

asked Feb 12 '15 at 06:28

2shou
