In this post, we will learn how to change the date format in PySpark.
Creating the dataframe
In order to understand this better, we will create a dataframe with a date in the format yyyy-MM-dd.
Note: in createDataFrame, the letters D and F must be capitalized.
# Importing the libraries required
import findspark
findspark.init()
from pyspark.sql import SparkSession
from pyspark.sql.functions import date_format

# Creating a SparkSession, the entry point for dataframe operations
spark = SparkSession.builder.getOrCreate()

# Creating a dataframe with a date column
df = spark.createDataFrame([('2019-02-28',)], ['dt'])
df.show()
Output
With the above code, a dataframe named df is created with dt as one of its columns, as shown below.
+----------+
| dt|
+----------+
|2019-02-28|
+----------+
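Note that, because the dataframe is built from a string literal, dt is created as a string column rather than a date column; date_format can still parse it in the next step since the value follows the default yyyy-MM-dd pattern. If you want to verify the column type yourself, a quick check (added here only for illustration) is:
# Inspect the column types of the dataframe
df.printSchema()
# root
#  |-- dt: string (nullable = true)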
Changing the format
With the dataframe created from the above code, the function date_format() is used to change the format of the date column.
date_format(<column name>, <required format>)
# Changing the format of the date from yyyy-MM-dd to yyyy/MM/dd
df.select(date_format('dt', 'yyyy/MM/dd').alias('new_dt')).show()
Output
Thus the date 2019-02-28 (yyyy-MM-dd) is converted to 2019/02/28 (yyyy/MM/dd).
+----------+
| new_dt|
+----------+
|2019/02/28|
+----------+
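If you would rather keep the rest of the dataframe and overwrite the dt column in place instead of selecting a new one, date_format() can also be combined with withColumn(). A minimal sketch, assuming the same df as above (the name df2 is just for illustration):
# Overwriting the dt column with the new format
df2 = df.withColumn('dt', date_format('dt', 'yyyy/MM/dd'))
df2.show()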
Reference
https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.date_format
Related Articles
How to get the current date in PySpark with example