In this post, let us learn about Spark's client and cluster deploy modes.
Where should we use each one?
The spark-submit command needs to be told the master and the deploy mode.
With YARN as the cluster manager, the master is yarn, and the deploy mode depends on the requirement.
spark-submit --master <master-url> --deploy-mode <deploy-mode> <other-options> <application-jar> [application-arguments]
What is the driver program?
Before getting into the topic, let us get to know the driver program. When a user submits a Spark job, the first thing that starts is the driver program. It plays a central role in controlling the executors that run on the worker nodes.
What are the main types of deployment modes in YARN?
The modes differ in where the driver program runs. Client mode is the default when no deploy mode is specified.
- YARN client mode
- YARN cluster mode
Client mode
In client mode, the driver program launches on the same edge node from which the Spark job is submitted. The driver uses the memory and CPU of that machine.
If many jobs run in client mode from the same edge node at the same time, their drivers compete for its resources and we might run into out-of-memory issues. So it is better to use client mode mainly for development and debugging. It also supports the Spark shell.
spark-submit --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 1 --num-executors 5 <application-jar> [application-arguments]
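To make the memory concern concrete, here is a rough back-of-the-envelope sketch. The 4 GB driver size comes from the --driver-memory 4g option in the client-mode command above; the edge node size and the number of concurrent jobs are made-up numbers for illustration, and the calculation counts only the driver heaps, not any other memory the driver processes use.

```shell
# Illustrative numbers only: the edge node size and job count are assumptions.
edge_node_memory_gb=32   # assumed total memory on the edge node
driver_memory_gb=4       # from --driver-memory 4g
concurrent_jobs=10       # assumed number of client-mode jobs running at once

# In client mode every driver runs on the edge node, so the heaps add up there.
drivers_total_gb=$((driver_memory_gb * concurrent_jobs))
echo "Driver heaps need ${drivers_total_gb} GB; edge node has ${edge_node_memory_gb} GB"

if [ "$drivers_total_gb" -gt "$edge_node_memory_gb" ]; then
  echo "Edge node is oversubscribed"
fi
```

With these assumed numbers, ten 4 GB drivers already ask for more memory than the edge node has.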
Cluster mode
In cluster mode, the driver program launches on one of the worker nodes in the cluster instead of the edge node from which the Spark job is submitted. One disadvantage is that it does not support the Spark shell.
Because each driver runs inside the cluster, we can run multiple jobs simultaneously without overloading the edge node. For this reason, cluster mode is used whenever we are deploying in production.
spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --num-executors 5 <application-jar> [application-arguments]
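The client and cluster commands in this post differ only in the --deploy-mode value. As a minimal sketch of that idea, the helper below prints (it does not run) the command for a given mode. The function name build_submit_cmd and the jar name my-app.jar are hypothetical and are not part of Spark; the fallback to client mirrors the default mentioned earlier.

```shell
# Hypothetical helper: prints the spark-submit command for a deploy mode.
# It only builds the command line; nothing is submitted to YARN.
build_submit_cmd() {
  mode="${1:-client}"           # fall back to client, the default deploy mode
  app_jar="${2:-my-app.jar}"    # placeholder jar name, purely illustrative

  # Reject anything other than the two deploy modes discussed in this post.
  case "$mode" in
    client|cluster) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac

  echo "spark-submit --master yarn --deploy-mode $mode --driver-memory 4g --executor-memory 2g --executor-cores 1 --num-executors 5 $app_jar"
}

build_submit_cmd client  my-app.jar   # development and debugging
build_submit_cmd cluster my-app.jar   # production deployment
```

Keeping the resource options in one place like this can make it easier to develop in client mode and then switch the same job to cluster mode.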
Are there any common features between client and cluster mode?
Yes, the two modes have a few characteristics in common.
1. Resources are requested by: the Application Master
2. Executors are started by: the YARN NodeManagers
3. Other persistent services are provided by: the YARN NodeManagers and the YARN ResourceManager