In this post, let us have a look at Spark's local and standalone modes.
Local mode
Apart from the local and standalone modes that we are going to see in this post, Spark supports a few other deployment modes as well.
Local Mode is the default mode of spark which runs everything on the same machine.
If the --master flag is not passed to spark-shell or spark-submit, the application runs in local mode by default.
Another way is to pass the --master option with local as the argument, which runs Spark with a single thread.
We can increase the number of threads by providing the required count within square brackets. For instance, spark-shell --master local[2] runs with two worker threads.
By using an asterisk instead, as in local[*], Spark uses as many threads as there are processors available to the Java virtual machine.
spark-submit --class <class name> --master local[8] <jar file>
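The local master URL convention can be sketched with a small, hypothetical helper (this is illustrative code, not part of Spark's API) that resolves how many threads each form implies:

```python
import os
import re

def resolve_local_threads(master: str) -> int:
    """Mirror how Spark interprets a local master URL.

    "local"     -> 1 thread
    "local[K]"  -> K threads
    "local[*]"  -> one thread per processor available to the JVM
    (Hypothetical helper for illustration; not Spark's own code.)
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if not match:
        raise ValueError(f"not a local master URL: {master}")
    spec = match.group(1)
    # os.cpu_count() stands in for the JVM's available-processor count here.
    return os.cpu_count() if spec == "*" else int(spec)
```

For example, `resolve_local_threads("local[8]")` resolves to 8, matching the spark-submit command above.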
Standalone mode
- Spark standalone cluster in client deploy mode
- Spark standalone cluster in cluster deploy mode with supervise
- Run a Python application on a Spark standalone cluster
Spark standalone cluster in client deploy mode
The application is submitted from a gateway machine that is physically co-located with the worker machines. The input and output of the application are attached to the console, so this mode is well suited for applications that involve a REPL (i.e. the Spark shell). In client mode, the driver launches directly within the spark-submit process, which acts as a client to the cluster.
spark-submit --class <class name> --master <spark://host id> --executor-memory 20G --total-executor-cores 100 <jar name>
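The flags in the command above can also be assembled programmatically. Below is a hypothetical helper (the function name and defaults are illustrative, not any Spark API) that builds the argument list, defaulting to client deploy mode since that is the standalone default:

```python
def build_submit_command(main_class, master, jar,
                         deploy_mode="client", supervise=False,
                         executor_memory="20G", total_executor_cores=100):
    """Assemble a spark-submit argument list (illustrative sketch).

    Client mode is the default, so --deploy-mode is only emitted when
    something else is requested; --supervise only applies with cluster mode.
    """
    cmd = ["spark-submit", "--class", main_class, "--master", master]
    if deploy_mode != "client":
        cmd += ["--deploy-mode", deploy_mode]
        if supervise:
            cmd.append("--supervise")
    cmd += ["--executor-memory", executor_memory,
            "--total-executor-cores", str(total_executor_cores),
            jar]
    return cmd
```

Calling it with just a class, master URL, and jar reproduces the client-mode command shown above.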
Spark standalone cluster in cluster deploy mode with supervise
For a Spark standalone cluster with cluster deploy mode, you can also pass the --supervise flag, so that the driver is automatically restarted if it fails with a non-zero exit code.
Some applications are submitted from a machine far from the worker machines. In such cases it is common to use cluster mode to minimize network latency between the driver and the executors.
spark-submit --class <class name> --master <spark://host id> --deploy-mode cluster --supervise --executor-memory 20G --total-executor-cores 100 <jar name>
Run a Python application on a Spark standalone cluster
Currently, the standalone mode does not support cluster mode for Python applications.
spark-submit --master <spark://host id> <python file>
Reference
https://spark.apache.org/docs/latest/submitting-applications.html