
When Spark is deployed in YARN cluster mode, how should I issue calls to the Spark monitoring REST API (http://spark.apache.org/docs/latest/monitoring.html)?

Does YARN have an API that takes a REST call such as the following (I already know the app-id),

http://localhost:4040/api/v1/applications/[app-id]/jobs

proxies it to the correct driver port, and returns the JSON to me? By "me" I mean my client.

Assume (or take it as given by design) that I cannot talk directly to the driver machine, for security reasons.

jasonw
  • Did you get to the bottom of this? I'm currently having the same problem: how to consume the History Server API ([host]:18089/api/v1/applications/[app-id]/jobs) to get job information when the Spark app is submitted and managed through YARN. When using YARN, the History Server API provides data about the Spark app, but not about its jobs, until the app finishes. In Spark standalone mode, however, the History Server API does provide near-live job data while the app is running. – steswinbank Nov 09 '17 at 10:10

2 Answers


Please have a look at the Spark docs: REST API.

Yes, with the latest API it is available.

According to this article: "It turns out there is a third, surprisingly easy option which is not documented. Spark has a hidden REST API which handles application submission, status checking and cancellation."

From the Spark monitoring docs: "In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available for both running applications, and in the history server. The endpoints are mounted at /api/v1. Eg., for the history server, they would typically be accessible at http://<server-url>:18080/api/v1, and for a running application, at http://localhost:4040/api/v1."
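As a rough illustration of the endpoint layout described above, here is a minimal Python sketch that builds URLs under the /api/v1 mount point and fetches the JSON. The helper names `api_url` and `fetch_json` are made up for this example, and the hosts and ports are only the defaults mentioned in the docs; your deployment may differ.

```python
import json
from urllib.request import urlopen

API_ROOT = "/api/v1"  # mount point named in the Spark monitoring docs


def api_url(base, *parts):
    """Join a server base URL, the /api/v1 mount point, and path segments.

    `base` is an assumption: e.g. "http://localhost:4040" for a running
    application, or the history server address on port 18080.
    """
    path = "/".join(str(p).strip("/") for p in parts)
    url = base.rstrip("/") + API_ROOT
    return url + "/" + path if path else url


def fetch_json(url, timeout=10):
    """GET the URL and decode the JSON body (needs network access to the server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (not executed here, since it needs a live Spark UI):
#   jobs = fetch_json(api_url("http://localhost:4040", "applications", app_id, "jobs"))
```

This only helps when the client can reach the driver or history server directly, which is exactly what the question rules out for the driver.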

These are the other options available:

  • Livy jobserver: Submit Spark jobs remotely to an Apache Spark cluster Linux using Livy
  • Triggering spark jobs with REST

Community
Ram Ghadiyaram

This is what worked for me.

In the YARN ResourceManager UI, click the "ApplicationMaster" link for the running application and note the URL it redirects to.

For me the link was something like http://RM:20888/proxy/application_1547506848892_0002/

Append "api/v1/applications/application_1547506848892_0002" to that URL to get the API endpoint.

For the above case, the API call is: curl "http://RM:20888/proxy/application_1547506848892_0002/api/v1/applications/application_1547506848892_0002"
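The URL construction described above can be sketched in Python as follows. The function name `yarn_proxy_api_url` is hypothetical, and the `/proxy/<app-id>/` layout and the `RM:20888` address are simply what I observed on my cluster; check the link in your own ResourceManager UI before relying on them.

```python
def yarn_proxy_api_url(proxy_base, app_id, resource=""):
    """Build the Spark REST API URL that goes through the YARN proxy.

    proxy_base: scheme, host and port of the proxy link, e.g. "http://RM:20888"
                (an assumption taken from the link seen in the RM UI).
    app_id:     the YARN application id, e.g. "application_1547506848892_0002".
    resource:   optional sub-resource under the application, e.g. "jobs".
    """
    # URL the tracking link points to
    proxy_root = f"{proxy_base.rstrip('/')}/proxy/{app_id}/"
    # Append the REST API path for the same application id
    url = f"{proxy_root}api/v1/applications/{app_id}"
    resource = resource.strip("/")
    return f"{url}/{resource}" if resource else url
```

Passing "jobs" as `resource` gives the proxied equivalent of the jobs endpoint from the question; the result can be handed to curl or any HTTP client that can reach the proxy host.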

user2677485