How to change the date format in pyspark

In this post, we will learn how to change the date format of a column in pyspark.

Creating the dataframe

In order to understand this better, we will create a dataframe having a date column in the format yyyy-MM-dd.

Note: in createDataFrame, the letters D and F need to be capitalized.

#Importing the libraries required
import findspark
findspark.init()
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

#Creating a SparkSession, the entry point for working with dataframes
spark = SparkSession.builder.getOrCreate()

#Creating a dataframe with a single date column named dt
df = spark.createDataFrame([('2019-02-28',)], ['dt'])
df.show()
Output

With the above code, a dataframe named df is created with dt as one of its columns, as shown below.

+----------+
|        dt|
+----------+
|2019-02-28|
+----------+
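
The dt column above is stored as a string, not a date type. date_format() accepts a string column directly, but if a true DateType column is wanted, a small sketch like the one below (not part of the original post) can cast it first with to_date() from pyspark.sql.functions; the column name dt_date is just an illustrative choice.

#Optional: casting the string column dt to a DateType column
df_dated = df.withColumn('dt_date', to_date('dt'))
df_dated.printSchema()   #dt remains a string, dt_date is a date
df_dated.show()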
Changing the format

With the dataframe created by the above code, the function date_format() is used to change the format of the dt column.

date_format(<column_name>,<format required>)

#Changing the format of the date from yyyy-MM-dd to yyyy/MM/dd
df.select(date_format('dt','yyyy/MM/dd').alias('new_dt')).show()
Output

Thus we convert the date 2019-02-28 from the format yyyy-MM-dd to the format yyyy/MM/dd, as shown below.

+----------+
|    new_dt|
+----------+
|2019/02/28|
+----------+
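
The same date_format() function accepts any of the date pattern letters documented for Spark, so other layouts can be produced in exactly the same way. The short sketch below is not part of the original post; it reuses the df dataframe created earlier, and the aliases day_first and month_name are arbitrary names.

#A few more patterns with date_format()
df.select(date_format('dt','dd-MM-yyyy').alias('day_first'),
          date_format('dt','MMM dd, yyyy').alias('month_name')).show()

This would print 28-02-2019 under day_first and Feb 28, 2019 under month_name.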
Reference

https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.date_format

