In this article I'm going to show you how to get information about Hadoop applications from the Resource Manager using its REST API. I'll use the Hadoop 2.7.1 REST APIs. If you are using another version, you don't have to worry much, because the response format of these APIs rarely changes.

You can find the REST APIs for fetching application information in the official documentation, but I think it helps to explain them in more detail and clear up some doubts.

To use any of the Resource Manager REST APIs, we have to know the Resource Manager webapp address. By default it is http://resourceManagerHostOrIP:8088 for HTTP and https://resourceManagerHostOrIP:8090 for HTTPS.

If your cluster has multiple resource managers (high availability), you have to send your requests to the active resource manager.
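One way to find the active resource manager is to ask each one for its cluster information (GET /ws/v1/cluster/info) and look at the haState field in the response. Below is a minimal Python sketch, assuming your Hadoop version reports that field. The function names and the idea of passing in a `fetch` callable are my own choices for illustration, not part of Hadoop.

```python
import json
from urllib.request import Request, urlopen


def ha_state(info_payload):
    """Return the haState value from a parsed /ws/v1/cluster/info response."""
    return (info_payload.get("clusterInfo") or {}).get("haState")


def http_get_json(url):
    """Plain HTTP GET that asks for JSON and returns the parsed body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as resp:
        return json.load(resp)


def find_active_rm(base_urls, fetch=http_get_json):
    """Return the first base URL whose resource manager reports ACTIVE.

    `fetch` takes a URL and returns the parsed JSON body, so a stub can be
    passed in to try the function without a cluster.
    """
    for base in base_urls:
        try:
            payload = fetch(base.rstrip("/") + "/ws/v1/cluster/info")
        except OSError:
            continue  # this resource manager is unreachable, try the next one
        if ha_state(payload) == "ACTIVE":
            return base
    return None
```

You would call it with the webapp addresses of all your resource managers, for example find_active_rm(["http://rm1:8088", "http://rm2:8088"]), where rm1 and rm2 are placeholder host names.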

To fetch a single application's information

REST API format:
http://<rm http address:port>/ws/v1/cluster/apps/{appId}

This is the REST API format for fetching a single application's information, according to the official documentation. It is a GET request. Here rm http address:port is the resource manager webapp address and appId is the application id.
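As a rough sketch, this is how the request could be sent from Python using only the standard library. The helper names (app_url, parse_app, fetch_app) are made up for this example. The sketch assumes the JSON response wraps the application in an "app" object, which is how the official documentation shows it.

```python
import json
from urllib.request import Request, urlopen


def app_url(rm_address, app_id):
    """Build the URL for one application from the webapp address and app id."""
    return f"{rm_address.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def parse_app(body):
    """Extract the application object from the JSON response body."""
    return json.loads(body)["app"]


def fetch_app(rm_address, app_id):
    """Send the GET request and return the application as a dict."""
    request = Request(app_url(rm_address, app_id),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as resp:
        return parse_app(resp.read())
```

The Accept header is set explicitly because the API can also answer in XML.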

 

To fetch information about multiple applications

REST API format:
http://<rm http address:port>/ws/v1/cluster/apps

This REST API fetches all the applications known to the resource manager. If there are a lot of applications (thousands), your browser might become unresponsive while rendering the data, and searching for a particular application becomes difficult. To counter this, you can apply the filters provided by Hadoop as query parameters. They have some limitations, which I'll explain later.

• states: applications matching the given application states, specified as a comma-separated list.
• finalStatus: the final status of the application, as reported by the application itself
• user: user name
• queue: queue name
• limit: total number of app objects to be returned
• applicationTypes: applications matching the given application types, specified as a comma-separated list.
• applicationTags: applications matching any of the given application tags, specified as a comma-separated list.
• startedTimeBegin: applications with start time beginning with this time, specified in ms since epoch
• startedTimeEnd: applications with start time ending with this time, specified in ms since epoch
• finishedTimeBegin: applications with finish time beginning with this time, specified in ms since epoch
• finishedTimeEnd: applications with finish time ending with this time, specified in ms since epoch
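Since all of these filters are ordinary query parameters, a small helper can assemble the URL for any combination of them. This is only a sketch. The name apps_url and its keyword-argument style are my own and not part of any Hadoop client.

```python
from urllib.parse import urlencode


def apps_url(rm_address, **filters):
    """Build a /ws/v1/cluster/apps URL from keyword filters.

    Lists and tuples are joined with commas; filters set to None are skipped.
    """
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    url = rm_address.rstrip("/") + "/ws/v1/cluster/apps"
    if not params:
        return url
    # safe="," keeps the comma-separated lists readable in the final URL
    return url + "?" + urlencode(params, safe=",")
```

For example, apps_url("http://rm:8088", states=["RUNNING", "ACCEPTED"], user="alice", limit=10) produces a URL with the three filters joined by &.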

These are the filters provided by Hadoop. Using them can be tricky, so let me explain them one by one.

states

Using this filter you can fetch applications according to their application state. You can pass the following application states in the query parameter:

  1. ACCEPTED
  2. SUBMITTED
  3. RUNNING
  4. NEW
  5. NEW_SAVING
  6. FAILED
  7. FINISHED
  8. KILLED

You can pass multiple values in states. If you want to fetch all running applications (apps that are running or are waiting to be executed by the resource manager), the URL would be

http://<rm http address:port>/ws/v1/cluster/apps?states=ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING

If you want to fetch all finished applications, including failed and killed applications, the URL would be

http://<rm http address:port>/ws/v1/cluster/apps?states=FINISHED,FAILED,KILLED
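Once you have the response for a states query, you will usually want the list of applications out of it. The sketch below assumes the JSON shape shown in the official documentation ({"apps": {"app": [...]}}) and also guards against "apps" being null, which is what I'd expect when nothing matches. All names in it are hypothetical helpers.

```python
import json
from collections import Counter

# The two groups of states used in the URLs above
UNFINISHED_STATES = ("ACCEPTED", "SUBMITTED", "RUNNING", "NEW", "NEW_SAVING")
FINISHED_STATES = ("FINISHED", "FAILED", "KILLED")


def states_param(states):
    """Join state names into the comma-separated value of the states filter."""
    return ",".join(states)


def parse_apps(body):
    """Return the list of application dicts from a /ws/v1/cluster/apps body."""
    apps = json.loads(body).get("apps")
    # assumption: an empty result may arrive as {"apps": null}
    return apps.get("app", []) if apps else []


def count_by_state(apps):
    """Count how many applications are in each state."""
    return Counter(app["state"] for app in apps)
```

Counting by state is a quick sanity check that the filter returned only the states you asked for.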

finalStatus

Unlike the states parameter, the finalStatus query parameter accepts only one value. The possible finalStatus values are:

  1. SUCCEEDED
  2. UNDEFINED
  3. FAILED
  4. KILLED
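Because only one finalStatus can go into the URL, one workaround when you need several (say FAILED and KILLED) is to fetch a broader list and filter it on the client side. A small sketch, assuming each application object carries a finalStatus field as in the documented response. The function names are hypothetical.

```python
FINAL_STATUSES = {"SUCCEEDED", "UNDEFINED", "FAILED", "KILLED"}


def final_status_param(status):
    """Validate a single finalStatus value before putting it in the URL."""
    status = status.upper()
    if status not in FINAL_STATUSES:
        raise ValueError(f"unknown finalStatus: {status}")
    return status


def with_final_status(apps, wanted):
    """Keep only the application dicts whose finalStatus is in `wanted`."""
    wanted = {final_status_param(s) for s in wanted}
    return [app for app in apps if app.get("finalStatus") in wanted]
```

The alternative is to send one request per finalStatus value and combine the results.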

user

By using this parameter, you can fetch applications submitted by a particular user, e.g.

http://<rm http address:port>/ws/v1/cluster/apps?user=alice&states=ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING

The above URL fetches all running applications submitted by user alice. There is one limitation, though: you can pass only one user in the filter.

queue

With this parameter, Hadoop returns all running (that is, unfinished) applications in a particular queue, e.g.

http://<rm http address:port>/ws/v1/cluster/apps?queue=root.BU2&user=alice

The above URL fetches all running applications (and not finished applications) submitted by user alice in queue root.BU2. You cannot pass multiple queue names in this parameter.
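Since both user and queue take a single value, covering several users or queues means sending one request per combination. The sketch below only builds the URLs. per_user_queue_urls is a name I made up for illustration.

```python
from itertools import product
from urllib.parse import urlencode


def per_user_queue_urls(rm_address, users, queues):
    """Return one apps URL for every (user, queue) combination."""
    base = rm_address.rstrip("/") + "/ws/v1/cluster/apps"
    return [
        f"{base}?{urlencode({'queue': queue, 'user': user})}"
        for user, queue in product(users, queues)
    ]
```

Each URL can then be fetched separately and the resulting application lists concatenated.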

limit

If you just want to limit the number of applications returned by Hadoop, you can use this parameter. As I mentioned before, Hadoop may return information about a large number of applications, which may hang your browser. To counter this, you can use the limit parameter

http://<rm http address:port>/ws/v1/cluster/apps?limit=10

applicationTypes

By using this parameter you can fetch applications of particular types, e.g.

http://<rm http address:port>/ws/v1/cluster/apps?applicationTypes=MAPREDUCE,SPARK,TEZ

startedTimeBegin, startedTimeEnd, finishedTimeBegin, finishedTimeEnd

These parameters are used to fetch applications that ran during a particular period. The timestamps passed in these parameters are in milliseconds since 1 January 1970 (UTC).
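Converting a date to this format is easy to get wrong by a factor of 1000 or by a time zone offset, so here is a small Python sketch. It treats datetimes without a time zone as UTC, which is an assumption of this example. Adjust it if your dates are in local time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a datetime to whole milliseconds since 1 January 1970 (UTC)."""
    if dt.tzinfo is None:
        # assumption for this sketch: naive datetimes are already in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)
```

The same from_epoch_ms helper is handy for reading the startedTime and finishedTime fields of the returned applications.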

Let's say you have to fetch the applications that ran during a particular period between from and to, where from and to are timestamps. There are six cases.

  1. Applications started before the period and are still running
  2. Applications started during the period (between from and to) and are still running
  3. Applications started before the period and finished in the period
  4. Applications started before the period and finished after the period
  5. Applications started in the period and finished after the period
  6. Applications started in the period and finished in the period

To fetch these applications, we have to hit two URLs (replace from and to with the actual timestamps):

http://<rm http address:port>/ws/v1/cluster/apps?startedTimeEnd=to&states=ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING
http://<rm http address:port>/ws/v1/cluster/apps?startedTimeEnd=to&finishedTimeBegin=from

If you only need finished applications and not running ones, you can skip the first URL.
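Here is a rough sketch of how the two requests could be put together in Python. It only builds the URLs and merges the results. The merge step is my own addition: an application that finishes between the two requests could show up in both responses, so the lists are combined by application id. The function names are hypothetical.

```python
UNFINISHED = "ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING"


def period_urls(rm_address, from_ms, to_ms, include_running=True):
    """Build the URLs that cover applications active between from_ms and to_ms."""
    base = rm_address.rstrip("/") + "/ws/v1/cluster/apps"
    urls = []
    if include_running:
        # started before the end of the period and not finished yet
        urls.append(f"{base}?startedTimeEnd={to_ms}&states={UNFINISHED}")
    # started before the end of the period and finished after its beginning
    urls.append(f"{base}?startedTimeEnd={to_ms}&finishedTimeBegin={from_ms}")
    return urls


def merge_apps(*app_lists):
    """Combine several result lists, keeping one entry per application id."""
    merged = {}
    for apps in app_lists:
        for app in apps:
            merged[app["id"]] = app  # an entry from a later list replaces an earlier one
    return list(merged.values())
```

Pass the result list of the first URL before the result list of the second, so that the finished version of an application wins if it appears in both.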

NOTE: These REST APIs have limitations. By default, the resource manager keeps only 10000 completed applications at a time. You can increase this limit (the yarn.resourcemanager.max-completed-applications property), but you cannot keep applications forever. Once the resource manager reaches its limit, it removes the oldest applications, and you won't find them through these REST APIs either.
