
createTempView and createGlobalTempView

In this post, let us learn the difference between createTempView and createGlobalTempView.

createOrReplaceTempView

In Spark 2.0, createOrReplaceTempView was introduced to replace registerTempTable. It creates or replaces an in-memory reference to the DataFrame in the form of a local temporary view. The lifetime of this view is tied to the SparkSession in which it was created.

df = spark.sql("select * from table")
df.createOrReplaceTempView("ViewName")
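
Once registered, a local temporary view can be queried by name through spark.sql. A minimal usage sketch, continuing the snippet above:

# A local temporary view is referenced by its bare name
spark.sql("select * from ViewName").show()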

Both createOrReplaceTempView and createTempView are used for creating a temporary view from an existing DataFrame.

If the view already exists, createOrReplaceTempView replaces the existing view with the new one, whereas createTempView throws an 'already exists' exception.
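
The contrast can be seen with a small sketch, assuming an active SparkSession named spark; the view name people is purely illustrative:

df = spark.range(5)

df.createTempView("people")           # succeeds: the view does not exist yet
df.createOrReplaceTempView("people")  # succeeds: silently replaces the view

try:
    df.createTempView("people")       # fails: the view already exists
except Exception as e:
    print(type(e).__name__)           # an AnalysisException subtype, depending on the Spark version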

The following command is used for dropping the view. Another way to take the view out of scope is to shut down the session using stop().

spark.catalog.dropTempView("ViewName")
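
After the view is dropped, any query against it fails to resolve. A quick sketch, continuing from the command above:

try:
    spark.sql("select * from ViewName").show()
except Exception as e:
    print(type(e).__name__)  # AnalysisException: the table or view is not found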

API references:

pyspark.sql.DataFrame.createOrReplaceTempView
pyspark.sql.DataFrame.createTempView
pyspark.sql.Catalog.dropTempView

createOrReplaceGlobalTempView

It creates a reference in the form of a global temporary view that can be used across Spark sessions. The lifetime of this view is tied to the Spark application itself. Global temporary views are stored in the system-preserved database global_temp, so they must be referenced with the qualified name, e.g. global_temp.ViewName.

df = spark.sql("select * from table")
df.createOrReplaceGlobalTempView("ViewName")
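
A minimal sketch of cross-session access, assuming an active SparkSession named spark; the view name numbers is purely illustrative. Note the global_temp prefix in the queries:

df = spark.range(3)
df.createOrReplaceGlobalTempView("numbers")

# The view must be queried with its qualified name
spark.sql("select * from global_temp.numbers").show()

# It is also visible from a different session of the same application
spark.newSession().sql("select * from global_temp.numbers").show()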

Following is the command to drop the view; alternatively, stopping the session with stop() ends the application and the view along with it.

spark.catalog.dropGlobalTempView("ViewName")

API references:

pyspark.sql.DataFrame.createOrReplaceGlobalTempView
pyspark.sql.DataFrame.createGlobalTempView
pyspark.sql.Catalog.dropGlobalTempView

For more information on the difference between createTempView and createGlobalTempView, please refer to the following URL: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.createGlobalTempView.html


Spark SQL operation in PySpark

In this post, let us look into the Spark SQL operation in PySpark with an example.

What is Spark SQL in PySpark?

Spark SQL helps us execute SQL queries against DataFrames. We can register a DataFrame as a temporary table using the function createOrReplaceTempView.

Sample program

In the following sample program, we are creating an RDD using the parallelize method and later converting it into a DataFrame.

To understand the process of creating DataFrames better, please refer to the link in the Reference section below.

createOrReplaceTempView helps us register the created DataFrame as a temporary view.

We can then execute SQL queries against this view with spark.sql.

# Libraries required
from pyspark.sql import SparkSession, Row
from pyspark import SparkContext

sc = SparkContext()
spark = SparkSession(sc)  # toDF() needs an active SparkSession

# Creating an RDD and converting it into a DataFrame
df = sc.parallelize([
    Row(name='Gokul', Class=10, marks=480, grade='A'),
    Row(name='Usha', Class=12, marks=450, grade='A'),
    Row(name='Rajesh', Class=12, marks=430, grade='B')
]).toDF()

# Registering the DataFrame as a temporary view
df.createOrReplaceTempView("df_view")

# Executing SQL queries using the Spark SQL operation
spark.sql("select * from df_view").show()
Output

We can also manipulate the data by filtering on conditions using a where clause; an example follows the full output below.

First, here is the entire data of the DataFrame without any filtering or modification.

+-----+-----+-----+------+
|Class|grade|marks|  name|
+-----+-----+-----+------+
|   10|    A|  480| Gokul|
|   12|    A|  450|  Usha|
|   12|    B|  430|Rajesh|
+-----+-----+-----+------+
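
As a sketch of the where-clause filtering mentioned above, the following query keeps only the grade 'A' rows of df_view:

# Filtering the view with a where clause in the SQL string
spark.sql("select * from df_view where grade = 'A'").show()  # returns the rows for Gokul and Usha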
Reference

https://stackoverflow.com/questions/32788387/pipelinedrdd-object-has-no-attribute-todf-in-pyspark