In this post, we will learn how to use the filter condition in PySpark with an example.
Sample program using filter condition
We will create a dataframe using the following sample program.
Then we filter the dataframe based on the marks column and store the result in another dataframe.
The required classes are imported at the beginning of the code.
import findspark
findspark.init()

from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col

# a SparkSession is required so that toDF() can convert the RDD into a dataframe
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# creating a dataframe with three records
df = sc.parallelize([Row(name='Gokul', Class=10, marks=480, grade='A'),
                     Row(name='Usha', Class=12, marks=450, grade='A'),
                     Row(name='Rajesh', Class=12, marks=430, grade='B')]).toDF()
print("Printing df dataframe below")
df.show()

# filtering the records based on the marks column
df1 = df.filter(col("marks") == 480)
print("Printing df1 dataframe below")
df1.show()
Output
The following dataframes are created as the result of the above sample program.
Here the filter condition selects only the records with marks equal to 480 from the dataframe.
Printing df dataframe below
+-----+-----+-----+------+
|Class|grade|marks| name|
+-----+-----+-----+------+
| 10| A| 480| Gokul|
| 12| A| 450| Usha|
| 12| B| 430|Rajesh|
+-----+-----+-----+------+
Printing df1 dataframe below
+-----+-----+-----+-----+
|Class|grade|marks| name|
+-----+-----+-----+-----+
| 10| A| 480|Gokul|
+-----+-----+-----+-----+
Note that we must use the double equals operator (==) inside the filter condition when comparing a column expression, not a single equals sign.
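As a quick sketch, assuming the same df and imports as in the sample program above, the same filter can also be written in a few equivalent ways. When a SQL expression string is passed to filter, a single equals sign is the correct syntax, and where() is simply an alias of filter().
# equivalent ways of writing the same filter (assuming the df created above)
df.filter(df.marks == 480).show()        # column attribute with ==
df.filter("marks = 480").show()          # SQL expression string uses a single =
df.where(col("marks") == 480).show()     # where() is an alias of filter()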
I hope this gives everyone a good idea of how to use the filter condition in PySpark.
Related Articles
https://beginnersbug.com/where-condition-in-pyspark-with-example/