Spark client and cluster mode

In this post, let us learn about the Spark client and cluster deploy modes.

Where do we specify the deploy mode?

The spark-submit command requires the master and the deployment mode as options.

For the YARN cluster manager, the master is yarn and the deploy mode varies depending on the requirement.

spark-submit --master <master-url> --deploy-mode <deploy-mode> <other options>
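
For instance, a complete submission of the SparkPi example class that ships with Spark might look like the line below; the path to the examples jar is an assumption here and will vary with your Spark version and installation.

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster $SPARK_HOME/examples/jars/spark-examples*.jar 10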

What is a driver program?

Before getting into the topic, let us get to know the driver program. When a user submits a Spark job, the very first thing to start is the driver program. The driver plays a prominent role: it coordinates and controls the executors running on the worker nodes.

What are the main deployment modes in YARN?

Based on where the driver program actually runs, we can differentiate two modes. YARN client mode is the default.

  • YARN client mode
  • YARN cluster mode

Client mode

In client mode, the driver program launches on the same edge node from which the Spark job is submitted. The driver therefore consumes memory and CPU from the very machine on which we run spark-submit.

If several jobs run in client mode at the same time, we might encounter out-of-memory issues because the edge node's resources get over-utilized. It is therefore better to use client mode only for development and debugging. Client mode also supports the Spark shell, as shown below.

spark-submit --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 1 --num-executors 5 <application-jar> [application-arguments]
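
The Spark shell can be launched on YARN only in client mode, since the interactive REPL (and hence the driver) has to run on the machine where we type the commands. A minimal example:

spark-shell --master yarn --deploy-mode client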

Cluster mode

In cluster mode, the driver program launches on one of the worker nodes inside the cluster rather than on the edge node from which the Spark job is submitted. One disadvantage is that cluster mode does not support the Spark shell.

We can run multiple jobs simultaneously in cluster mode without any trouble, since each driver gets its own resources inside the cluster instead of competing for the edge node. For this reason, cluster mode is used whenever we deploy to production.

spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --num-executors 5 <application-jar> [application-arguments]
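
Because the driver no longer runs on the edge node, its output is not printed on the submitting machine. Assuming YARN log aggregation is enabled on the cluster, the driver and executor logs can be fetched after the run with the YARN CLI, where <application-id> is the id reported by spark-submit:

yarn logs -applicationId <application-id>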

Any common features between client and cluster mode?

Yes, there are a few characteristics common to both client and cluster mode.

1. Application Master – requests the resources from YARN
2. Executors – launched by the YARN NodeManagers
3. Other persistent services – provided by the YARN NodeManagers and the YARN ResourceManager
