case when statement in pyspark with example

In this tutorial, we will learn about the case when statement in PySpark, with an example.

Syntax

A case when statement in PySpark starts with the keyword <case>. The condition to test follows the keyword <when>.

The value to return when the condition holds follows the keyword <then>. The keyword <else> supplies the value to return when the condition fails.

The keyword <end> closes the case statement. The whole statement is passed as a SQL string to the expr function:

expr("case when {condition} then {Result} else {Result} end")
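
A case statement may also carry several <when> branches, which are checked in order so that the first matching branch wins. As a minimal sketch, the hypothetical helper build_case_when below (it is not part of PySpark) assembles such a string from (condition, result) pairs. The marks thresholds and the band labels are assumptions made up for illustration.

```python
def build_case_when(branches, default):
    """Build a SQL CASE expression string from (condition, result) pairs.

    Hypothetical helper for illustration only. Results are inserted
    verbatim, so string results must already carry their SQL quotes.
    """
    if not branches:
        raise ValueError("at least one WHEN branch is required")
    parts = ["case"]
    for condition, result in branches:
        parts.append(f"when {condition} then {result}")
    parts.append(f"else {default}")
    parts.append("end")
    return " ".join(parts)


# Assumed thresholds on a 'marks' column, checked top to bottom.
band_expr = build_case_when(
    [("marks >= 470", "'high'"), ("marks >= 440", "'medium'")],
    default="'low'",
)
print(band_expr)

# With a dataframe df that has a 'marks' column, the string could be used as:
# df.withColumn("band", expr(band_expr))
```

The helper only builds text; the expression is still evaluated by expr, so anything that is valid inside a SQL case statement can be used as a condition or result.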

Libraries

The following libraries must be imported before executing our code.

import findspark
from pyspark.sql import Row, SparkSession
from pyspark import SparkContext
from pyspark.sql.functions import expr

Sample program

This program shows how the case when statement is used. First we need a dataframe to apply the case statements to.

Here df is the dataframe, which holds the name, class, marks and grade details of 3 students.

findspark.init()
sc = SparkContext.getOrCreate()
# Creating a SparkSession makes toDF() available on RDDs
spark = SparkSession.builder.getOrCreate()
df = sc.parallelize([
    Row(name='Gokul', Class=10, marks=480, grade='A'),
    Row(name='Usha', Class=12, marks=450, grade='A'),
    Row(name='Rajesh', Class=12, marks=430, grade='B'),
]).toDF()
print("Printing the df")
df.show()
# Level is 1 for grade A, otherwise 0
df1 = df.withColumn("Level", expr("case when grade='A' then 1 else 0 end"))
print("Printing the df1")
df1.show()
# status is 'yes' for grade A, otherwise 'no'
df2 = df.withColumn("status", expr("case when grade='A' then 'yes' else 'no' end"))
print("Printing the df2")
df2.show()

Result

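df is the source dataframe which we created earlier.

df1 and df2 are the dataframes created by applying the case statements. df1 adds the numeric column Level and df2 adds the string column status.

The columns appear in alphabetical order because Spark 2.x sorts the fields of a Row built from keyword arguments; Spark 3.0 and later keep the order in which they were given.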

Printing the df
+-----+-----+-----+------+
|Class|grade|marks|  name|
+-----+-----+-----+------+
|   10|    A|  480| Gokul|
|   12|    A|  450|  Usha|
|   12|    B|  430|Rajesh|
+-----+-----+-----+------+

Printing the df1
+-----+-----+-----+------+-----+
|Class|grade|marks|  name|Level|
+-----+-----+-----+------+-----+
|   10|    A|  480| Gokul|    1|
|   12|    A|  450|  Usha|    1|
|   12|    B|  430|Rajesh|    0|
+-----+-----+-----+------+-----+

Printing the df2
+-----+-----+-----+------+------+
|Class|grade|marks|  name|status|
+-----+-----+-----+------+------+
|   10|    A|  480| Gokul|   yes|
|   12|    A|  450|  Usha|   yes|
|   12|    B|  430|Rajesh|    no|
+-----+-----+-----+------+------+

Reference

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.Column.otherwise