How to add/subtract months to a date in PySpark

In this post, we will learn how to add months to, or subtract months from, a date in PySpark, with examples.

Creating dataframe – Sample program

With the following program, we first create a dataframe df with dt as one of its columns, populated with the date value '2019-02-28'.

import findspark
findspark.init()
from pyspark.sql import SparkSession
from pyspark.sql.functions import add_months, date_format
#Getting (or creating) the SparkSession, which provides createDataFrame()
spark=SparkSession.builder.getOrCreate()
#Creating a dataframe df with date column
df=spark.createDataFrame([('2019-02-28',)],['dt'])
print("Printing df below")
df.show()
Output

The dataframe is created with the date value shown below.

Printing df below
+----------+
|        dt|
+----------+
|2019-02-28|
+----------+
Adding months – Sample program

In the next step, we will create another dataframe df1 by adding months to the column dt using add_months().

date_format() formats the value in dt using the pattern given within the function and returns it as a string such as '2019-02-28'. add_months() then casts that string to a date before doing the arithmetic. To parse a string into a date explicitly, to_date() is the function to use.

You can learn more about date_format() at https://beginnersbug.com/how-to-change-the-date-format-in-pyspark/

#Adding one month to the date
df1=df.withColumn("months_add",add_months(date_format('dt','yyyy-MM-dd'),1))
print("Printing df1 below")
df1.show()
Output

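add_months(column name, number of months) requires two inputs: the date column to be considered and the number of months by which to increment or decrement it. The output below comes from Spark 2.x (2.4 and below): because 2019-02-28 is the last day of February, the result is moved to the last day of March. Spark 3.0 and later do not make this adjustment and return 2019-03-28.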

Printing df1 below
+----------+----------+
|        dt|months_add|
+----------+----------+
|2019-02-28|2019-03-31|
+----------+----------+
Subtracting months – Sample program

We can also subtract months by passing a negative value.

#Subtracting one month from the date
df2=df.withColumn("months_sub",add_months(date_format('dt','yyyy-MM-dd'),-1))
print("Printing df2 below")
df2.show()
Output

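Hence we get the date one month earlier using the same function. Because 2019-02-28 is the last day of its month, Spark 2.4 and below return the last day of January; Spark 3.0 and later return 2019-01-28.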

Printing df2 below
+----------+----------+
|        dt|months_sub|
+----------+----------+
|2019-02-28|2019-01-31|
+----------+----------+
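The month arithmetic behind the two results above can be sketched in plain Python, without Spark. This is only an illustration of the rule, not Spark's actual implementation: shift_months and its adjust_last_day flag are hypothetical names made up for this post. With adjust_last_day=True it mimics the last-day-of-month adjustment of Spark 2.4 and below; with the default False it only clamps the day to the length of the target month, which is how Spark 3.0 and later behave for these inputs.

```python
import calendar
from datetime import date


def shift_months(d, months, adjust_last_day=False):
    """Shift d by a number of months; negative values go backwards.

    Hypothetical helper for illustration only, not part of PySpark.
    """
    # Count months from year 0 so that year rollover is plain integer math.
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1

    last_day_of_target = calendar.monthrange(year, month)[1]
    input_is_last_day = d.day == calendar.monthrange(d.year, d.month)[1]

    if adjust_last_day and input_is_last_day:
        # Spark 2.4 and below: last day of month maps to last day of month.
        day = last_day_of_target
    else:
        # Keep the day of month, clamped to the length of the target month.
        day = min(d.day, last_day_of_target)
    return date(year, month, day)


dt = date(2019, 2, 28)
print(shift_months(dt, 1, adjust_last_day=True))   # 2019-03-31
print(shift_months(dt, -1, adjust_last_day=True))  # 2019-01-31
print(shift_months(dt, 1))                         # 2019-03-28
print(shift_months(dt, -1))                        # 2019-01-28
```

The first two calls reproduce the months_add and months_sub values printed above. The last two show what the same inputs give when the last-day adjustment is switched off.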
Reference

https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.functions.add_months
