4

Apache Ambari moved into the Attic in January 2022.

So Apache Ambari has been retired, and the only reliable alternative I know of is Cloudera Manager. However, Cloudera Manager is a paid product, which makes it a poor fit for small and medium-sized companies.

What tools can now help us properly install and manage the Hadoop ecosystem for production use? We'd prefer not to end up with a manually installed, hard-to-manage Hadoop ecosystem.

Are there some good alternatives to Apache Ambari?

OneCricketeer
Cosmin

3 Answers

5

Apache Ambari has left the Attic and returned as a top-level Apache project.

screenshot from mailing list

OneCricketeer
david zollo
2

The question falls into several categories...

Configuration

For self-hosted solutions, the answer is configuration management + automation tools.

Ansible (+ AWX/Tower), Puppet (+ Foreman, or similar), Chef, etc. for file management (config-as-code, GitOps style). This is far better than Ambari because the config file templates (and their history) are actually backed up in VCS rather than spread across several HTML input boxes, as they are in Ambari (or Cloudera Manager).
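As a rough illustration of the config-as-code idea, the sketch below renders a Hadoop `*-site.xml` file from a plain dictionary that could be kept in version control. The helper name `render_site_xml` and the hostname are made up for this example; Ansible, Puppet, or Chef would do the same job with their own template systems.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of Hadoop properties as the text of a *-site.xml file.

    Hypothetical helper for illustration only; it is not part of Hadoop
    or of any configuration management tool.
    """
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    # Sort keys so the output is stable and diffs in VCS stay small.
    for name, value in sorted(properties.items()):
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(str(value))}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Assumed example values; the hostname is a placeholder.
core_site = {
    "fs.defaultFS": "hdfs://namenode.example.internal:8020",
    "io.file.buffer.size": 131072,
}

if __name__ == "__main__":
    print(render_site_xml(core_site))
```

The point is only that the source of truth is a reviewed, versioned file, and the XML on each host is generated from it.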

Use virtualization like VMware, or otherwise acquire physical machines in your data center.

Otherwise, all cloud providers have their own dashboards for cluster management, provisioning, and elastic scaling. For a "small / medium company", you should focus on your business problem, not infrastructure maintenance, so use the cloud. I've personally used the EMR Terraform module, and it was fairly straightforward for a basic cluster (non-production, and I didn't need to maintain it for very long).
Most of the cloud Hadoop offerings do not use Ambari (Azure HDInsight is a notable exception).

You previously had , and for that, you could just use Databricks and not need a whole Hadoop cluster.

Monitoring Widgets

Grafana.

The Prometheus JMX Exporter can be added to all the Hadoop JVM processes. The Node and Blackbox exporters can also be added for host CPU/memory usage and TCP/HTTP health checks.
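To give a feel for what these exporters produce, here is a minimal sketch that parses a small payload in the Prometheus text exposition format and derives a heap usage ratio. The sample payload and its metric names are assumptions for illustration (real names depend on the exporter version and its rules), and in practice Prometheus scrapes the endpoint itself while Grafana queries Prometheus.

```python
import re

# Matches "name{labels} value [timestamp]" lines of the text exposition format.
_SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?P<labels>\{.*\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+-?\d+)?$"
)


def parse_metrics(text):
    """Return {"name{labels}": float_value} for each sample line in text."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot parse
        try:
            value = float(match.group("value"))
        except ValueError:
            continue
        samples[match.group("name") + (match.group("labels") or "")] = value
    return samples


# Made-up payload; metric names are illustrative, not guaranteed.
SAMPLE = """\
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap"} 5.24288e+08
jvm_memory_bytes_max{area="heap"} 1.073741824e+09
"""


def heap_usage_ratio(samples):
    """Used heap divided by max heap, using the assumed metric names above."""
    used = samples['jvm_memory_bytes_used{area="heap"}']
    limit = samples['jvm_memory_bytes_max{area="heap"}']
    return used / limit


if __name__ == "__main__":
    print(heap_usage_ratio(parse_metrics(SAMPLE)))
```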

Recent HDP releases had already started using Grafana to display metrics, and many of those dashboards duplicated the Ambari widgets.

Ambari Alerts

Prometheus Alertmanager, for example, but you may want something more robust like New Relic, Datadog, etc.

UI query functions (Ambari Views)

HUE is probably the closest thing to the Ambari Views File Browser, Hive Editor, etc.

OneCricketeer
  • This is correct but way painful. How do folks run spark these days? The cloud is *not* the answer to _Life the universe and everything_ – WestCoastProjects Aug 24 '22 at 12:17
  • This answer has nothing really to do with Spark, but the Spark developers are very active on maintaining their k8s execution engine. Or you can still use YARN without Ambari – OneCricketeer Aug 24 '22 at 13:25
  • By `k8s engine` is that a newfangled angle/version of the tried and true `spark standalone` ? From the perspective of performance and capacity Spark standalone stood up better than yarn and nearly as well as mesos: it's not a toy. When I did the heavy duty testing 7 years ago it simply did not have the authentication and security items that enterprises need. Not sure if that changed or not (e.g. kerberos support ) – WestCoastProjects Aug 24 '22 at 18:24
  • @WestCoastProjects I don't know if I would consider Kubernetes "new fangled", but Mesos is deprecated and IDK if Spark-Standalone is really considered production-grade or not. Details - https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types – OneCricketeer Aug 24 '22 at 18:26
  • `Kubernetes` is new-fangled when compared to my personal hadoop/spark timeline of 2011-2015. I was doing manual hadoop configurations/installations/deployments of vanilla apache hadoop before cdh were available [or HDP even existed] . For Spark I did a heavy-duty large-scale clustering performance engagement in 2015 and `kubernetes` was just coming around in that [large] organization and did not make the timeline to get compared. As a curiosity i attended the very first `Ambari` meetup somewhere in that timeframe. – WestCoastProjects Aug 25 '22 at 02:17
  • @west Kubernetes 1.0 was publicly released in 2015... The Spark executor engine for it didn't exist at that time, so Mesos or YARN is what many companies were using. Still, don't see how Spark deployment patterns has any relevance to my answer apart from me mentioning Databricks. Now Apache Mesos itself is in Apache Attic, and no one I've ever seen seriously considers running Docker containers on YARN, so it's either Kubernetes or some cloud offerings that emulate it – OneCricketeer Sep 04 '22 at 17:46
  • Let me rephrase the comment/question about `k8s engine`. Your commentary can be read as _`k8s engine` is the replacement for `spark standalone`_ . I am asking to clarify if that were the intended meaning of your comment. I am presuming not - and that instead `spark standalone` and `k8s` are both having ongoing support – WestCoastProjects Sep 04 '22 at 18:30
  • @west I am not a Spark contributor, but my understanding is that all the engines listed from the link in my previous comment will continue to be supported until documentation says otherwise. But again, the original post was about Ambari, not Spark – OneCricketeer Sep 04 '22 at 20:55
  • Yea - my thread with you here was because your comment at the top made me think you know something that I do not about spark standalone. We can move on. thx – WestCoastProjects Sep 04 '22 at 21:09
0

You should check out:

https://bigtop.apache.org/

Apache Bigtop - Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark.

steven-matison
  • This is an alternative to paying Cloudera or using a cloud-vendor Hadoop. It's not a replacement for Ambari as the question was asking for – OneCricketeer Apr 26 '22 at 14:26
  • Wow, thanks sir, the question literally asks for alternatives... which you clearly identify this is an alternative... why the -1? – steven-matison Apr 27 '22 at 15:06
  • As stated, it is not an **Ambari** alternative. – OneCricketeer Apr 27 '22 at 17:28
  • In fact, Bigtop still allows you to [install Ambari](https://github.com/apache/bigtop/tree/master/bigtop-deploy/puppet/modules/ambari), despite it being a deprecated Apache project. – OneCricketeer Apr 27 '22 at 18:13