In this post, we will learn about when/otherwise in PySpark with examples.
when/otherwise works like a conditional (if/else) statement on DataFrame columns.
The examples below cover single, multiple, and combined logical conditions.
Sample program – Single condition check
In the example below, df is a DataFrame with three records.
df1 is a new DataFrame created from df by adding one more column named First_Level.
import findspark
findspark.init()
from pyspark import SparkContext,SparkConf
from pyspark.sql import Row,SparkSession
from pyspark.sql.functions import when,col
sc=SparkContext.getOrCreate()
#rdd.toDF() needs an active SparkSession
spark=SparkSession.builder.getOrCreate()
#creating dataframe with three records
df=sc.parallelize([Row(name='Gokul',Class=10,marks=480,grade='A'),
                   Row(name='Usha',Class=12,marks=450,grade='A'),
                   Row(name='Rajesh',Class=12,marks=430,grade='B')]).toDF()
print("Printing df dataframe below ")
df.show()
df1=df.withColumn("First_Level",when(col("grade") =='A',"Good").otherwise("Average"))
print("Printing df1 dataframe below ")
df1.show()
Output
Printing df dataframe below
+-----+-----+-----+------+
|Class|grade|marks| name|
+-----+-----+-----+------+
| 10| A| 480| Gokul|
| 12| A| 450| Usha|
| 12| B| 430|Rajesh|
+-----+-----+-----+------+
Printing df1 dataframe below
+-----+-----+-----+------+-----------+
|Class|grade|marks| name|First_Level|
+-----+-----+-----+------+-----------+
| 10| A| 480| Gokul| Good|
| 12| A| 450| Usha| Good|
| 12| B| 430|Rajesh| Average|
+-----+-----+-----+------+-----------+
Sample program – Multiple checks
We can chain multiple conditions using when/otherwise, as shown below.
import findspark
findspark.init()
from pyspark import SparkContext,SparkConf
from pyspark.sql import Row,SparkSession
from pyspark.sql.functions import when,col
sc=SparkContext.getOrCreate()
#rdd.toDF() needs an active SparkSession
spark=SparkSession.builder.getOrCreate()
#creating dataframe with three records
df=sc.parallelize([Row(name='Gokul',Class=10,marks=480,grade='A'),
                   Row(name='Usha',Class=12,marks=450,grade='A'),
                   Row(name='Rajesh',Class=12,marks=430,grade='B')]).toDF()
print("Printing df dataframe below")
df.show()
#chaining multiple when conditions below
df2=df.withColumn("Second_Level",when(col("grade") == 'A','Excellent')
                                 .when(col("grade") == 'B','Good')
                                 .otherwise("Average"))
print("Printing df2 dataframe below")
df2.show()
Output
The column Second_Level is created by the program above.
Printing df dataframe below
+-----+-----+-----+------+
|Class|grade|marks| name|
+-----+-----+-----+------+
| 10| A| 480| Gokul|
| 12| A| 450| Usha|
| 12| B| 430|Rajesh|
+-----+-----+-----+------+
Printing df2 dataframe below
+-----+-----+-----+------+------------+
|Class|grade|marks| name|Second_Level|
+-----+-----+-----+------+------------+
| 10| A| 480| Gokul| Excellent|
| 12| A| 450| Usha| Excellent|
| 12| B| 430|Rajesh| Good|
+-----+-----+-----+------+------------+
Sample program with logical operators & and |
The logical operators & (AND) and | (OR) can be used inside when/otherwise, as shown below. Each comparison must be wrapped in parentheses, because & and | bind tighter than comparison operators in Python.
import findspark
findspark.init()
from pyspark import SparkContext,SparkConf
from pyspark.sql import Row,SparkSession
from pyspark.sql.functions import when,col
sc=SparkContext.getOrCreate()
#rdd.toDF() needs an active SparkSession
spark=SparkSession.builder.getOrCreate()
#creating dataframe with four records
df=sc.parallelize([Row(name='Gokul',Class=10,marks=480,grade='A'),
                   Row(name='Usha',Class=12,marks=450,grade='A'),
                   Row(name='Rajesh',Class=12,marks=430,grade='B'),
                   Row(name='Mahi',Class=5,marks=350,grade='C')]).toDF()
print("Printing df dataframe")
df.show()
#using logical operators | and & ; each comparison is parenthesized
df3=df.withColumn("Third_Level",when((col("grade") == 'A') | (col("marks") > 450),"Excellent")
                                .when((col("grade") == 'B') | ((col("marks") > 400) & (col("marks") < 450)),"Good")
                                .otherwise("Average"))
print("Printing df3 dataframe ")
df3.show()
Output
Printing df dataframe
+-----+-----+-----+------+
|Class|grade|marks| name|
+-----+-----+-----+------+
| 10| A| 480| Gokul|
| 12| A| 450| Usha|
| 12| B| 430|Rajesh|
| 5| C| 350| Mahi|
+-----+-----+-----+------+
Printing df3 dataframe
+-----+-----+-----+------+-----------+
|Class|grade|marks| name|Third_Level|
+-----+-----+-----+------+-----------+
| 10| A| 480| Gokul| Excellent|
| 12| A| 450| Usha| Excellent|
| 12| B| 430|Rajesh| Good|
| 5| C| 350| Mahi| Average|
+-----+-----+-----+------+-----------+
Reference
https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.when
Related Articles
case when statement in pyspark with example