In this post, let us learn the difference between map and flatMap in PySpark.
What is the difference between map and flatMap?
Both map and flatMap are transformation operations available on RDDs in PySpark.
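The map transformation takes one input element from the RDD and produces exactly one output element, so the number of output elements always equals the number of input elements.
In the case of the flatMap transformation, the function returns a sequence of zero or more values for each input element, and these sequences are flattened into a single RDD. The number of output elements can therefore be smaller than, equal to, or larger than the number of input elements. That is the difference between the two.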
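As a rough illustration of this rule, here is a minimal plain-Python sketch that runs without Spark. The helper names map_like and flat_map_like are hypothetical and exist only for this example; they model what the two transformations compute and are not how PySpark implements them.

```python
from itertools import chain


def map_like(func, items):
    """Model of map: exactly one output element per input element."""
    return [func(x) for x in items]


def flat_map_like(func, items):
    """Model of flatMap: func returns an iterable for each input,
    and all of those iterables are flattened by one level."""
    return list(chain.from_iterable(func(x) for x in items))


data = [1, 2, 3, 4]

# One output per input: the count stays at 4.
squares = map_like(lambda x: x ** 2, data)

# Each input x yields (x - 1) copies of itself: zero for 1, one for 2, and so on.
copies = flat_map_like(lambda x: [x] * (x - 1), data)

print(squares)  # [1, 4, 9, 16]
print(copies)   # [2, 3, 3, 4, 4, 4]
```

Here the input 1 contributes nothing to the flattened result while 4 contributes three elements, which is why the output count of a flatMap depends on the function and not only on the input size.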
The PySpark example below makes this concrete.
How to create an RDD?
The code below creates an RDD with the parallelize method and views its contents with collect.
The rest of this post applies both transformations to this RDD.
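# Creating RDD using parallelize method
# sc is the SparkContext; the pyspark shell already provides it
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
rdd1 = sc.parallelize([1, 2, 3, 4])
rdd1.collect()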
The RDD contains the following 4 elements.
[1, 2, 3, 4]
How to apply map transformation?
# Applying map transformation: each element is replaced by its square
rdd1_map = rdd1.map(lambda x: x ** 2)
# Viewing the result
rdd1_map.collect()
In the result below, each output element is the square of the corresponding input element, and the number of elements is unchanged.
[1, 4, 9, 16]
How to apply flatMap transformation?
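# Applying flatMap transformation: each element yields the pair (x, x squared),
# and flatMap flattens all the pairs into one RDD
rdd1_second = rdd1.flatMap(lambda x: (x ** 1, x ** 2))
# Viewing the result
rdd1_second.collect()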
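In the result below, the number of elements differs from the map result: each input produced a pair of values (the element itself and its square), and flatMap flattened the four pairs into eight elements.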
[1, 1, 2, 4, 3, 9, 4, 16]
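To see where the extra elements come from, it may help to apply the same pair-returning function both ways. The sketch below uses Python's built-in map and itertools.chain as stand-ins for the RDD methods, so it runs without a Spark session. The function name pair is an assumption made for this example. In PySpark the corresponding calls would be rdd1.map(...) and rdd1.flatMap(...) with the same function.

```python
from itertools import chain

data = [1, 2, 3, 4]


def pair(x):
    """Same function as in the flatMap example: returns (x, x squared)."""
    return (x ** 1, x ** 2)


# map-style: each pair stays a single element, so there are still 4 elements.
mapped = list(map(pair, data))

# flatMap-style: the pairs are flattened, giving 4 * 2 = 8 elements.
flattened = list(chain.from_iterable(map(pair, data)))

print(mapped)     # [(1, 1), (2, 4), (3, 9), (4, 16)]
print(flattened)  # [1, 1, 2, 4, 3, 9, 4, 16]
```

The function is identical in both cases. Only the flattening step changes the element count.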