In this post, let us learn about transformations and actions in PySpark.
Transformation
A transformation is one of the two types of operations available in PySpark.
It creates a new RDD from an existing RDD, and the existing RDD itself stays unchanged, as shown in the examples below.
Types of transformation
Narrow transformation :
Each partition of the new RDD is built from a limited set of partitions of the existing RDD, so no shuffle of data across the cluster is needed.
map, filter, flatMap, sample, union, coalesce, pipe, cartesian
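For example, the following sketch (assuming a SparkContext named sc, like the one created in the sample program below) applies two narrow transformations. Each one returns a new RDD, and the original RDD stays unchanged.

rdd = sc.parallelize([1, 2, 3, 4, 5])

# map and filter are narrow: each output partition depends on a single input partition
doubled = rdd.map(lambda x: x * 2)       # yields [2, 4, 6, 8, 10] when computed
small = doubled.filter(lambda x: x < 8)  # yields [2, 4, 6] when computed

print(small.collect())   # [2, 4, 6]
print(rdd.collect())     # [1, 2, 3, 4, 5] - the original RDD is unchanged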
Wide transformation :
Records from many input partitions must be shuffled across the cluster before the new RDD can be built.
distinct, intersection, join, repartition, groupByKey, reduceByKey, aggregateByKey, sortByKey
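As a rough sketch (again assuming an existing sc), reduceByKey is wide because the values for a key may sit in different partitions and have to be shuffled together before they can be combined.

pairs = sc.parallelize([("a", 1), ("b", 1), ("a", 2), ("b", 3)])

# reduceByKey shuffles the records so that all values of a key end up together
totals = pairs.reduceByKey(lambda x, y: x + y)

print(totals.collect())   # [('a', 3), ('b', 4)] - output order may vary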
What is an action ?
Transformations are lazy: applying one only adds a step to a DAG (Directed Acyclic Graph) of operations, and the DAG keeps growing as further transformations are applied.
The recorded steps execute only when an action is called.
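The snippet below is a small sketch (assuming an existing sc) that makes this laziness visible: the transformations only record the lineage, which toDebugString() prints, and a job is launched only at collect().

rdd = sc.parallelize(range(10))

# Transformations: only the lineage (DAG) is recorded, no job runs here
mapped = rdd.map(lambda x: x * 2)
filtered = mapped.filter(lambda x: x > 5)

# Inspect the recorded lineage without triggering execution
print(filtered.toDebugString())

# Action: only now does Spark actually run the job
print(filtered.collect())   # [6, 8, 10, 12, 14, 16, 18]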
Types of action
reduce, collect, take, takeOrdered, takeSample, count, first, countByKey, foreach, saveAsTextFile, saveAsSequenceFile, saveAsPickleFile
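Here are a few of these actions in use (a minimal sketch, again assuming an existing sc). Each call triggers a job and returns a result to the driver.

nums = sc.parallelize([5, 3, 1, 4, 2])

print(nums.count())                     # 5 - number of elements
print(nums.first())                     # 5 - first element
print(nums.take(3))                     # [5, 3, 1] - first three elements
print(nums.takeOrdered(3))              # [1, 2, 3] - three smallest elements
print(nums.reduce(lambda a, b: a + b))  # 15 - sum of all the elements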
Sample program
The following program filters the elements of an RDD based on a condition.
The filter step executes only when the collect action is called.
from pyspark.sql import SparkSession

# Create a SparkSession for the application and reuse its SparkContext
spark = SparkSession.builder.master("local[*]").appName("transformation-action").getOrCreate()
sc = spark.sparkContext

# Transformation: filter only records the step, nothing runs yet
rdd1 = sc.parallelize([1, 2, 3, 4])
rdd1_first = rdd1.filter(lambda x: x < 3)

# Action: collect triggers the execution and returns the result
rdd1_first.collect()
[1, 2]
Related Articles
https://beginnersbug.com/rank-and-dense-rank-in-pyspark-dataframe/
https://beginnersbug.com/window-function-in-pyspark-with-example/