In this tutorial, we will learn about the case when statement in PySpark, with an example.
Syntax
The case when statement in PySpark starts with the keyword <case>. The condition to test is written after the keyword <when>.
The value to return when the condition holds is written after the keyword <then>. The keyword <else> supplies the value to return when the condition fails.
The keyword <end> closes the case statement. The whole statement is passed to expr as a SQL string:
expr("case when {condition} then {Result} else {Result} end")
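A case statement can also chain several <when> ... <then> branches before the <else>; the branches are checked in order and the first match wins. The following is a minimal sketch that assembles such an expression string in plain Python. The helper name case_when_sql, the marks thresholds and the column name band are assumptions made for illustration and are not part of PySpark.

```python
def case_when_sql(branches, default):
    """Build a SQL CASE expression string.

    branches: list of (condition, result) pairs, both written as SQL fragments.
    default:  SQL fragment used in the ELSE clause.
    """
    if not branches:
        raise ValueError("at least one (condition, result) pair is required")
    # One "when ... then ..." clause per branch, kept in the given order
    whens = " ".join(f"when {cond} then {result}" for cond, result in branches)
    return f"case {whens} else {default} end"


# Illustrative thresholds only; string results carry their own SQL quotes
band_sql = case_when_sql(
    [("marks >= 470", "'high'"), ("marks >= 440", "'medium'")],
    "'low'",
)
print(band_sql)
# case when marks >= 470 then 'high' when marks >= 440 then 'medium' else 'low' end

# With a dataframe df that has a marks column, the string can be passed to expr:
# df.withColumn("band", expr(band_sql))
```

Building the string separately keeps the branch order visible, which matters because a row with marks of 480 satisfies both conditions and takes the result of the first one.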
Libraries
We need to import the following libraries before running the code. findspark.init() is called first so that the pyspark imports can locate the Spark installation.
import findspark
findspark.init()
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import expr
Sample program
This program shows how the case when statement is used. We first need a dataframe to apply the case statements to.
Here df is the dataframe, which holds the name, class, marks and grade of three students.
# Creating a SparkSession is what makes toDF() available on an RDD
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
df = sc.parallelize([
    Row(name='Gokul', Class=10, marks=480, grade='A'),
    Row(name='Usha', Class=12, marks=450, grade='A'),
    Row(name='Rajesh', Class=12, marks=430, grade='B'),
]).toDF()
print("Printing the df")
df.show()
# Level is 1 when grade is 'A', otherwise 0
df1 = df.withColumn("Level", expr("case when grade='A' then 1 else 0 end"))
print("Printing the df1")
df1.show()
# status is 'yes' when grade is 'A', otherwise 'no'
df2 = df.withColumn("status", expr("case when grade='A' then 'yes' else 'no' end"))
print("Printing the df2")
df2.show()
Result
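df is the source dataframe that we created earlier.
df1 and df2 are the dataframes produced by applying the case statements.
The column order shown below comes from Spark 2.x, which sorts Row fields by name; Spark 3.0 and later keep the order in which the fields were written.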
Printing the df
+-----+-----+-----+------+
|Class|grade|marks|  name|
+-----+-----+-----+------+
|   10|    A|  480| Gokul|
|   12|    A|  450|  Usha|
|   12|    B|  430|Rajesh|
+-----+-----+-----+------+
Printing the df1
+-----+-----+-----+------+-----+
|Class|grade|marks|  name|Level|
+-----+-----+-----+------+-----+
|   10|    A|  480| Gokul|    1|
|   12|    A|  450|  Usha|    1|
|   12|    B|  430|Rajesh|    0|
+-----+-----+-----+------+-----+
Printing the df2
+-----+-----+-----+------+------+
|Class|grade|marks|  name|status|
+-----+-----+-----+------+------+
|   10|    A|  480| Gokul|   yes|
|   12|    A|  450|  Usha|   yes|
|   12|    B|  430|Rajesh|    no|
+-----+-----+-----+------+------+
Reference
https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.Column.otherwise