
Lateral column alias support with Spark 3.4 and above



CTEs are widely used to implement logic. Before Spark 3.4, one had to chain multiple SELECT statements whenever a computed column was needed to derive another column.

Code Sample (Spark 3.3)

# PySpark 3.3 and earlier
# A computed column cannot reference an alias from the same SELECT list,
# so the intermediate column is materialized in a CTE and selected again.
query = "WITH t AS (SELECT 100 AS col1) \
        SELECT 100 * col1 AS col2 FROM t"
df = spark.sql(query)
df.display()


Code Sample (Spark 3.3)

# PySpark 3.3 and earlier
# Attempting a lateral column alias: col1 cannot be resolved
# within the same SELECT list, so this raises an AnalysisException.
query = "SELECT 100 AS col1, 100 * col1 AS col2"
df = spark.sql(query)  # fails on Spark 3.3 and earlier
df.display()



Starting with Spark 3.4, multiple SELECT statements are no longer needed: a lateral column alias lets a later column in the same SELECT list reference an alias defined earlier.

Code Sample (Spark 3.4.1)

# PySpark 3.4 and above
# Using a lateral column alias: col2 references col1 directly
query = "SELECT 100 AS col1, 100 * col1 AS col2"
df = spark.sql(query)
df.display()
