
Lateral column alias support with Spark 3.4 and above



CTEs are widely used to implement logic. Before Spark 3.4, one had to chain multiple SELECT statements whenever a computed column was needed to derive another column.

Code Sample (Spark 3.3)

# PySpark 3.3 and earlier
# A computed column cannot reference an alias from the same SELECT list,
# so the intermediate column is materialized in a CTE and selected again.
query = "WITH t AS (SELECT 100 AS col1) \
        SELECT 100 * col1 AS col2 FROM t"
df = spark.sql(query)
df.display()


Code Sample (Spark 3.3)

# PySpark 3.3 and earlier
# Attempting a lateral column alias: col1 cannot be resolved
# within the same SELECT list, so this raises an AnalysisException.
query = "SELECT 100 AS col1, 100 * col1 AS col2"
df = spark.sql(query)  # fails on Spark 3.3 and earlier
df.display()



Starting with Spark 3.4, multiple SELECT statements are no longer needed: a lateral column alias lets a later column in the same SELECT list reference an alias defined earlier.

Code Sample (Spark 3.4.1)

# PySpark 3.4 and above
# Using a lateral column alias: col2 references col1 directly
query = "SELECT 100 AS col1, 100 * col1 AS col2"
df = spark.sql(query)
df.display()
