
I'm planning to develop a reporting platform on top of existing data. I have an existing RDBMS that holds a large number of records, so I'm using the following architecture: Hadoop 2.7, Spark, Hive, JasperReports, and Sqoop.

  • Sqoop - Extract data from the RDBMS into Hadoop
  • Hadoop - Storage platform
  • Hive - Data warehouse
  • Spark - Hive is oriented toward batch processing, so running Spark on top of Hive should speed things up
  • JasperReports - To generate reports.

Given that I have already read the following

Which mode should I use, yarn-client or yarn-cluster? Why? What should the decision be based on?

Techie

2 Answers


The decision is about whether you want your application's driver to be managed by YARN or not.

In yarn-client mode the driver is not managed by YARN, which is simpler. It's a classical Linux process: you can start it like any application and it runs on the machine where you launched it. (The executors still run in YARN containers.)

In yarn-cluster mode the driver is managed by YARN. It runs on whatever machine YARN decides to put it on. If it dies, YARN will restart it, perhaps on a different machine, up to the configured number of attempts. It is more robust (e.g. it will get restarted if the machine dies) but at the cost of complexity (e.g. you don't have a fixed IP address for the application).

I'd go with yarn-client at first. You can switch to yarn-cluster later if you find you need the features it provides.
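
To make the switch concrete, here is a minimal sketch of how the two modes differ at launch time. It only builds the `spark-submit` argument list; it does not run anything. The helper name `build_spark_submit`, the class `com.example.ReportingApp`, and the jar `reporting-app.jar` are placeholders invented for illustration. It assumes a Spark release that accepts `--master yarn --deploy-mode ...`; older releases also spelled this as `--master yarn-client` or `--master yarn-cluster`.

```python
import shlex


def build_spark_submit(deploy_mode, main_class, app_jar, app_args=()):
    """Return the spark-submit argument list for a YARN run.

    deploy_mode is "client" (driver runs where you launch it) or
    "cluster" (driver runs inside YARN). All names passed in are
    hypothetical placeholders, not values from the original question.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
        *app_args,
    ]


if __name__ == "__main__":
    # Print both command lines so the single differing flag is easy to see.
    for mode in ("client", "cluster"):
        cmd = build_spark_submit(mode, "com.example.ReportingApp", "reporting-app.jar")
        print(" ".join(shlex.quote(part) for part in cmd))
```

The only difference between the two command lines is the `--deploy-mode` value, which is why starting in client mode and moving to cluster mode later is usually a small change on the launch side.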

Daniel Darabos
  • Thanks for the information. Considering the nature of my application, which mode is the most suitable? – Techie Nov 19 '15 at 10:48
  • Given your description in the linked question I think you will benefit from `yarn-cluster` mode in the long term. – Daniel Darabos Nov 19 '15 at 15:01
  • One last question: could you please explain why? What's the reason you think that? – Techie Nov 19 '15 at 17:05
  • In the linked question you write _"I want to support fail-over"_. So I guess you don't want the application to go down if a single machine fails. So you need YARN, so it will restart the application on another node in that case. – Daniel Darabos Nov 19 '15 at 18:34

Adding some more info to Daniel Darabos' answer: apart from where the application is hosted, the fail-over behavior, and where the driver runs (in the Application Master in yarn-cluster mode, or in the client process in yarn-client mode), the other features remain the same. However, yarn-client mode supports spark-shell, unlike yarn-cluster mode.
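
The criteria mentioned in this thread can be summarized as a small decision helper. This is only a sketch of the reasoning, not an official rule: the function name `choose_deploy_mode` and its two parameters are hypothetical, and it encodes just two points from the discussion (an interactive shell needs client mode; driver fail-over needs cluster mode), defaulting to client mode as the simpler option.

```python
def choose_deploy_mode(needs_interactive_shell, needs_driver_failover):
    """Pick a YARN deploy mode from two criteria discussed in this thread.

    Hypothetical helper for illustration only:
    - spark-shell works in client mode but not in cluster mode,
      so an interactive requirement forces "client".
    - In cluster mode the driver is managed by YARN and can be restarted
      on another node, so a fail-over requirement suggests "cluster".
    - Otherwise default to "client", the simpler starting point.
    """
    if needs_interactive_shell:
        return "client"
    if needs_driver_failover:
        return "cluster"
    return "client"


if __name__ == "__main__":
    # A reporting service that must survive a single machine failure.
    print(choose_deploy_mode(needs_interactive_shell=False, needs_driver_failover=True))
```

Real deployments usually weigh more factors (network access to the driver, log collection, how jobs are launched), so treat the two flags as a starting checklist.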


Have a look at this article to learn the differences between running a Spark application in the various modes: YARN cluster, YARN client, and Spark standalone.

Make a calculated decision after weighing the criteria for each option.

Ravindra babu