Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions
75
votes
12 answers

Building Hadoop with Eclipse / Maven - Missing artifact jdk.tools:jdk.tools:jar:1.6

I am trying to import Cloudera's org.apache.hadoop:hadoop-client:2.0.0-cdh4.0.0 from the cdh4 Maven repo into a Maven project in Eclipse 3.81 (m2e plugin), with Oracle's JDK 1.7.0_05 on Win7, using org.apache.hadoop…
jvataman
  • 1,357
  • 1
  • 12
  • 13
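A widely cited workaround for this error is that the Hadoop POMs declare a system-scoped dependency on jdk.tools, which Maven resolves from the JDK's tools.jar; when m2e cannot find it, the jar can be installed into the local repository by hand. A minimal sketch, assuming JAVA_HOME points at a full JDK (paths are illustrative):

    # Verify that JAVA_HOME points at a JDK (tools.jar is not shipped with a JRE)
    ls "$JAVA_HOME/lib/tools.jar"

    # Install tools.jar into the local Maven repository under the missing coordinates
    mvn install:install-file \
        -DgroupId=jdk.tools -DartifactId=jdk.tools \
        -Dversion=1.6 -Dpackaging=jar \
        -Dfile="$JAVA_HOME/lib/tools.jar"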
64
votes
3 answers

How to check Spark Version

I want to check the Spark version in CDH 5.7.0. I have searched on the internet but was not able to find a clear answer. Please help.
Ironman
  • 1,330
  • 2
  • 19
  • 40
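On a CDH node the Spark version is usually visible straight from the command line; a minimal sketch, assuming the Spark client tools are on the PATH:

    # Print the Spark build banner, which includes the version string
    spark-submit --version

    # Alternatively, start spark-shell and evaluate sc.version at the Scala prompt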
55
votes
4 answers

Where are logs in Spark on YARN?

I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but there are no logs after execution. The following command is used to run a Spark example, but the logs cannot be found in the history server as they can for a normal MapReduce…
DeepNightTwo
  • 4,809
  • 8
  • 46
  • 60
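When Spark runs on YARN, container logs live with the NodeManagers and can usually be pulled back with the yarn CLI once log aggregation is enabled (yarn.log-aggregation-enable). A minimal sketch with a placeholder application id:

    # List recent applications to find the id of the Spark run
    yarn application -list -appStates FINISHED

    # Fetch the aggregated container logs for that application
    yarn logs -applicationId application_1400000000000_0001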
32
votes
5 answers

Find port number where HDFS is listening

I want to access HDFS with fully qualified names such as: hadoop fs -ls hdfs://machine-name:8020/user I could also simply access HDFS with hadoop fs -ls /user However, I am writing test cases that should work on different distributions (HDP,…
ernesto
  • 1,899
  • 4
  • 26
  • 39
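The NameNode address (and therefore the port) is whatever fs.defaultFS resolves to, so reading it from the client configuration rather than hard-coding it is what makes tests portable across distributions. A minimal sketch:

    # Ask the HDFS client for the configured default filesystem URI
    hdfs getconf -confKey fs.defaultFS
    # e.g. hdfs://machine-name:8020

    # Older configurations may still use the deprecated key fs.default.name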
25
votes
4 answers

How to get hadoop put to create directories if they don't exist

I have been using Cloudera's Hadoop (0.20.2). With this version, if I put a file into the file system but the directory structure did not exist, it automatically created the parent directories. So for example, if I had no directories in HDFS and…
owly
  • 251
  • 1
  • 3
  • 4
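On more recent Hadoop releases, put no longer creates missing parents, so the usual pattern is to create them explicitly first; -mkdir -p is harmless if the directory already exists. A minimal sketch with illustrative paths:

    # Create the target directory tree (parents included), then upload
    hadoop fs -mkdir -p /user/example/incoming
    hadoop fs -put localfile.txt /user/example/incoming/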
21
votes
7 answers

JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')

We have the following string which is a valid JSON written to a file on HDFS. { "id":"tag:search.twitter.com,2005:564407444843950080", "objectType":"activity", "actor":{ "objectType":"person", "id":"id:twitter.com:2302910022", …
Fanooos
  • 2,718
  • 5
  • 31
  • 55
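One common cause of "Unrecognized token" errors like this is that the parser is handed a plain string (often a URL or path) rather than the JSON document itself, so a quick sanity check is to pull the file off HDFS and validate its contents independently. A minimal sketch with a placeholder path:

    # Confirm the file really contains valid JSON (not the path or URL that points to it)
    hdfs dfs -cat /user/example/activity.json | python -m json.tool > /dev/null \
        && echo "valid JSON" || echo "invalid JSON"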
21
votes
3 answers

Impala can't access all Hive tables

I am trying to query HBase data through Hive (I'm using Cloudera). I created a few Hive external tables pointing to HBase, but Cloudera's Impala doesn't have access to all of those tables. All the Hive external tables appear in the metastore manager…
Nosk
  • 753
  • 2
  • 6
  • 24
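Impala caches the Hive metastore, so tables created or altered from Hive (including HBase-backed external tables) are typically not visible until the catalog is refreshed. A minimal sketch using impala-shell; the host name and table name are illustrative:

    # Refresh Impala's view of the Hive metastore after creating external tables
    impala-shell -i impala-host -q "INVALIDATE METADATA"

    # Or refresh a single table
    impala-shell -i impala-host -q "REFRESH my_hbase_table"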
19
votes
5 answers

Issue running Spark job on YARN cluster

I want to run my spark Job in Hadoop YARN cluster mode, and I am using the following command: spark-submit --master yarn-cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 …
Sachin Singh
  • 739
  • 4
  • 12
  • 29
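A spark-submit line needs a main class and an application jar in addition to the resource flags, and the YARN logs are usually the first place to look when the job fails. A minimal sketch with placeholder class, jar and application id:

    # Submit the application in yarn-cluster mode (class and jar are placeholders)
    spark-submit --master yarn-cluster \
        --driver-memory 1g --executor-memory 1g --executor-cores 1 \
        --class com.example.MyJob myjob.jar arg1

    # Inspect the driver/executor logs if the application fails
    yarn logs -applicationId application_1400000000000_0002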
19
votes
4 answers

How to find the CDH version of Hadoop

When connecting to a Hadoop cluster, how can I know which version of Hadoop the cluster is running? In particular, this is important for proper configuration of libraries when compiling and packaging Hadoop Java jobs with Maven.
Vladimir Kroz
  • 5,237
  • 6
  • 39
  • 50
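The Hadoop client prints its build string, which on CDH includes the cdh suffix needed to pick matching Maven artifacts. A minimal sketch:

    # Prints something like "Hadoop 2.6.0-cdh5.7.0" plus build details
    hadoop version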
19
votes
3 answers

RStudio Server environment variables not loading?

I'm trying to run rhadoop on Cloudera's Hadoop distro (I can't remember if it's CDH3 or 4), and am running into an issue: RStudio Server doesn't seem to recognize my global variables. In my /etc/profile.d/r.sh file, I have: export…
AI52487963
  • 1,253
  • 2
  • 17
  • 36
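RStudio Server does not source /etc/profile.d, so environment variables that rhadoop needs are often placed instead in an Renviron file that R itself reads. A minimal sketch; the Renviron.site path varies by platform (e.g. /etc/R/Renviron.site on some installs) and the variable values are illustrative:

    # Make the Hadoop variables visible to R sessions started by RStudio Server
    echo 'HADOOP_CMD=/usr/bin/hadoop' | sudo tee -a /usr/lib64/R/etc/Renviron.site
    echo 'HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/hadoop-streaming.jar' | sudo tee -a /usr/lib64/R/etc/Renviron.site

    # Restart RStudio Server so new sessions pick up the change
    sudo rstudio-server restart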
18
votes
2 answers

Got InterruptedException while executing word count MapReduce job

I have installed Cloudera VM version 5.8 on my machine. When I execute the word count MapReduce job, it throws the exception below. 16/09/06 06:55:49 WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native…
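This DFSClient warning is frequently reported as harmless on CDH 5.x quickstart VMs: it is logged while the client shuts down and does not by itself mean the job failed, so the first check is whether the job completed and wrote output. A minimal sketch with illustrative paths (the examples jar location depends on how CDH was installed):

    # Run the stock word count example and then confirm output was written
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
        wordcount /user/cloudera/input /user/cloudera/output
    hadoop fs -ls /user/cloudera/output
    hadoop fs -cat /user/cloudera/output/part-r-00000 | head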
18
votes
6 answers

Accessing Hue on Cloudera Docker QuickStart

I have installed the Cloudera QuickStart image using Docker, based on the instructions given here: https://blog.cloudera.com/blog/2015/12/docker-is-the-new-quickstart-option-for-apache-hadoop-and-cloudera/ docker run --privileged=true…
Knows Not Much
  • 30,395
  • 60
  • 197
  • 373
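Hue listens on port 8888 inside the container, so it is only reachable from the host if that port is published when the container is started. A minimal sketch based on the quickstart image described in the linked post; the image name, startup script and port list are taken from that guide and may differ on other setups:

    # Publish Hue (8888) and Cloudera Manager (7180) to the host
    docker run --hostname=quickstart.cloudera --privileged=true -t -i \
        -p 8888:8888 -p 7180:7180 \
        cloudera/quickstart /usr/bin/docker-quickstart

    # Then browse to http://localhost:8888 (or the docker-machine IP on older setups)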
18
votes
1 answer

YARN is not honouring yarn.nodemanager.resource.cpu-vcores

I am using Hadoop 2.4.0 and my system configuration is 24 cores and 96 GB RAM. I am using the following…
banjara
  • 3,800
  • 3
  • 38
  • 61
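A frequently cited explanation is that the CapacityScheduler's DefaultResourceCalculator schedules on memory only, so vcores are ignored unless the DominantResourceCalculator is configured. A minimal sketch for checking the relevant property; the config file location varies by install:

    # See which resource calculator the CapacityScheduler is using
    grep -A1 'resource-calculator' /etc/hadoop/conf/capacity-scheduler.xml

    # The property that makes vcores count, to be set through the cluster's config tooling:
    #   yarn.scheduler.capacity.resource-calculator =
    #     org.apache.hadoop.yarn.util.resource.DominantResourceCalculator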
16
votes
5 answers

Spark: check your cluster UI to ensure that workers are registered

I have a simple program in Spark: /* SimpleApp.scala */ import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf object SimpleApp { def main(args: Array[String]) { val conf = new…
vineet sinha
  • 317
  • 1
  • 4
  • 12
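This message typically appears when no worker can satisfy the requested executor resources or when the application points at the wrong master URL, so a common first step is to submit with an explicit master and deliberately small resource requests. A minimal sketch with placeholder host, sizes and jar name:

    # Ask for modest resources so at least one registered worker can host an executor
    spark-submit --master spark://master-host:7077 \
        --executor-memory 512m --total-executor-cores 2 \
        --class SimpleApp simple-app.jar

    # The standalone master web UI (http://master-host:8080) lists registered workers and their free resources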
16
votes
4 answers

What is the correct way to start/stop spark streaming jobs in yarn?

I have been experimenting and googling for many hours, with no luck. I have a Spark Streaming app that runs fine in a local Spark cluster. Now I need to deploy it on Cloudera 5.4.4. I need to be able to start it, have it run in the background…
Kevin Pauli
  • 8,577
  • 15
  • 49
  • 70
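In cluster deploy mode the driver runs inside YARN, so the job keeps running after the submitting shell exits and is stopped through YARN rather than by killing a local process. A minimal sketch with a placeholder class, jar and application id:

    # Start: the driver lives in the ApplicationMaster, so the terminal can be closed
    spark-submit --master yarn-cluster \
        --class com.example.StreamingJob streaming-job.jar

    # Find the running application and stop it
    yarn application -list -appStates RUNNING
    yarn application -kill application_1400000000000_0003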