In this article I'm going to show you how to fetch Hadoop application information from the Resource Manager using its REST API. I'll use the Hadoop 2.7.1 REST APIs, but if you are on any other version you don't have to worry much, because the response format of these APIs rarely changes.
Even though the official documentation describes the REST APIs for fetching application info, I think a more detailed walkthrough will help clear up some doubts.
To use any of the Resource Manager REST APIs, we have to know the Resource Manager webapp address. Generally it is https://resourceManagerHostOrIP:8090 for the HTTPS protocol and http://resourceManagerHostOrIP:8088 for the HTTP protocol.
If your cluster has multiple Resource Managers (high availability), then you have to hit the active Resource Manager.
To fetch a single application information
Rest Api Format :http://<rm http address:port>/ws/v1/cluster/apps/{appId}
This is the REST API format for fetching a single application's information, according to the official documentation. It is a GET request. Here rm http address:port is the Resource Manager webapp address.
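As a sketch, here is how you might call this endpoint from Python. The host, port, and application id below are placeholders (assumptions), not values from a real cluster; replace them with your own:

```python
# A minimal sketch of fetching one application's info.
# rm.example.com:8088 and the application id are hypothetical placeholders.
import json
from urllib.request import urlopen

RM_ADDRESS = "http://rm.example.com:8088"  # placeholder RM webapp address

def single_app_url(app_id: str) -> str:
    """Build the single-application REST URL described above."""
    return f"{RM_ADDRESS}/ws/v1/cluster/apps/{app_id}"

def fetch_app(app_id: str) -> dict:
    """GET the application info and return the parsed JSON 'app' object."""
    with urlopen(single_app_url(app_id)) as resp:
        return json.load(resp)["app"]

# With a reachable cluster you would do something like:
# info = fetch_app("application_1476912658570_0001")
# print(info["state"], info["finalStatus"])
print(single_app_url("application_1476912658570_0001"))
```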
To fetch multiple applications information
Rest Api Format :http://<rm http address:port>/ws/v1/cluster/apps
This REST API fetches all the applications known to the Resource Manager. If there are a lot of applications (in the thousands), your browser might become unresponsive while rendering the data, and searching through the applications also becomes difficult. To counter this, you can apply the filters Hadoop provides, although they have some limitations, which I'll explain later.
| Parameter | Description |
| --- | --- |
| states | applications matching the given application states, specified as a comma-separated list |
| finalStatus | the final status of the application, reported by the application itself |
| user | user name |
| queue | queue name |
| limit | total number of app objects to be returned |
| applicationTypes | applications matching the given application types, specified as a comma-separated list |
| applicationTags | applications matching any of the given application tags, specified as a comma-separated list |
| startedTimeBegin | applications with start time beginning with this time, specified in ms since epoch |
| startedTimeEnd | applications with start time ending with this time, specified in ms since epoch |
| finishedTimeBegin | applications with finish time beginning with this time, specified in ms since epoch |
| finishedTimeEnd | applications with finish time ending with this time, specified in ms since epoch |
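As a sketch, a filtered URL for this endpoint can be assembled like this. The host and port are placeholders; the filter names are the ones from the table above:

```python
# Build a /ws/v1/cluster/apps URL from any of the supported filters.
# rm.example.com:8088 is a hypothetical host; swap in your own RM address.
from urllib.parse import urlencode

RM_ADDRESS = "http://rm.example.com:8088"  # placeholder RM webapp address

def apps_url(**filters) -> str:
    """Build the multiple-applications URL with optional query filters.

    List-valued filters (states, applicationTypes, applicationTags) are
    joined into the comma-separated form the API expects.
    """
    params = {
        k: ",".join(v) if isinstance(v, (list, tuple)) else str(v)
        for k, v in filters.items()
    }
    # safe="," keeps the comma-separated lists readable in the URL
    query = urlencode(params, safe=",")
    return f"{RM_ADDRESS}/ws/v1/cluster/apps" + (f"?{query}" if query else "")

print(apps_url(states=["RUNNING", "ACCEPTED"], user="alice", limit=10))
```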
So these are the filters provided by Hadoop. Using them can be tricky, so let me explain each one.
states
Using this filter you can fetch applications according to their application state. You can pass the following application states in the query parameter:
- ACCEPTED
- SUBMITTED
- RUNNING
- NEW
- NEW_SAVING
- FAILED
- FINISHED
- KILLED
You can pass multiple values in states. If you want to fetch all running applications (apps that are running or waiting to be executed by the Resource Manager), then the URL would be
http://<rm http address:port>/ws/v1/cluster/apps?states=ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING
If you want to fetch all finished applications, including failed and killed applications, the URL would be
http://<rm http address:port>/ws/v1/cluster/apps?states=FINISHED,FAILED,KILLED
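The JSON response from these URLs has the shape `{"apps": {"app": [...]}}` (with `"apps"` null when nothing matches). As a sketch, here is how you might tally the results by state; the payload below is a hypothetical example, not real cluster output:

```python
# Tally applications by state from an apps-list response.
# sample_response mimics the {"apps": {"app": [...]}} shape; the ids
# and states here are made up for illustration.
from collections import Counter

sample_response = {
    "apps": {
        "app": [
            {"id": "application_1476912658570_0001", "state": "RUNNING"},
            {"id": "application_1476912658570_0002", "state": "ACCEPTED"},
            {"id": "application_1476912658570_0003", "state": "RUNNING"},
        ]
    }
}

def apps_by_state(response: dict) -> Counter:
    """Count apps per state, tolerating an empty/null 'apps' object."""
    apps = (response.get("apps") or {}).get("app") or []
    return Counter(app["state"] for app in apps)

print(apps_by_state(sample_response))  # Counter({'RUNNING': 2, 'ACCEPTED': 1})
```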
finalStatus
Unlike the states parameter, the finalStatus query parameter accepts only one value. Below are the possible finalStatus values:
- SUCCEEDED
- UNDEFINED
- FAILED
- KILLED
user
By using this parameter, you can fetch applications submitted by a particular user. E.g.
http://<rm http address:port>/ws/v1/cluster/apps?user=alice&states=ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING
The above URL will fetch all running applications submitted by user alice. There is one limitation though: you can pass only one user in the filter.
queue
By using this parameter, Hadoop will return all running applications in a particular queue. E.g.
http://<rm http address:port>/ws/v1/cluster/apps?queue=root.BU2&user=alice
The above URL will fetch all running applications (and not finished applications) submitted by user alice in the queue root.BU2. You cannot pass multiple queue names in the parameter.
limit
If you just want to limit the number of applications returned by Hadoop, you can use this parameter. As I mentioned before, Hadoop may return information for a large number of applications, which may hang your browser. To counter this, use the limit parameter:
http://<rm http address:port>/ws/v1/cluster/apps?limit=10
applicationTypes
By using this parameter you can fetch applications of particular types. E.g.
http://<rm http address:port>/ws/v1/cluster/apps?applicationTypes=MAPREDUCE,SPARK,TEZ
startedTimeBegin, startedTimeEnd, finishedTimeBegin, finishedTimeEnd
These parameters are used to fetch applications that ran during a particular period. The timestamps passed in these parameters are in milliseconds since 1 January 1970 (the Unix epoch).
Let's say you have to fetch applications that ran during a particular period between from and to, where from and to are timestamps. There are six cases:

- Applications that started before from and are still running
- Applications that started during the period (i.e. between from and to) and are still running
- Applications that started before the period and finished in the period
- Applications that started before the period and finished after the period
- Applications that started in the period and finished in the period
- Applications that started in the period and finished after the period
To fetch all of these applications, we have to hit two URLs:
http://<rm http address:port>/ws/v1/cluster/apps?startedTimeEnd=to&states=ACCEPTED,SUBMITTED,RUNNING,NEW,NEW_SAVING
http://<rm http address:port>/ws/v1/cluster/apps?startedTimeEnd=to&finishedTimeBegin=from
If you only have to fetch finished applications and not running ones, you can skip the first URL.
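Since an application could show up in both result sets (for example, if it finishes between the two requests), the two responses should be merged by application id. A sketch of that union, using made-up app lists in place of the real JSON `"app"` arrays:

```python
# Merge the results of the two period queries, de-duplicating by app id.
# The two lists below are hypothetical stand-ins for the "app" arrays
# returned by the two URLs above.

def merge_period_results(running_apps, finished_apps):
    """Union the two result sets, keeping one entry per application id."""
    by_id = {app["id"]: app for app in running_apps}
    for app in finished_apps:
        by_id.setdefault(app["id"], app)  # skip apps already seen
    return list(by_id.values())

running = [{"id": "app_0001", "state": "RUNNING"}]
finished = [
    {"id": "app_0001", "state": "RUNNING"},   # may appear in both queries
    {"id": "app_0002", "state": "FINISHED"},
]
print([a["id"] for a in merge_period_results(running, finished)])
# ['app_0001', 'app_0002']
```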
NOTE: These REST APIs have a limitation. By default, the Resource Manager stores only 10,000 completed applications at a time. You can increase this limit, but you cannot store applications indefinitely: as the Resource Manager reaches its limit, it removes the oldest applications, and you won't find those apps through these REST APIs anymore either.