

Lateral column alias support with Spark 3.4 and above

CTEs are used extensively to implement transformation logic. Before Spark 3.4, a computed column could not be referenced by another column in the same SELECT list, so deriving one column from another meant chaining multiple SELECT statements (for example through a CTE).

Code Sample (Spark 3.3)

# Using PySpark version 3.3 and before
# Chaining multiple SELECT statements through a CTE
query = "WITH t AS (SELECT 100 AS col1) \
         SELECT 100 * col1 AS col2 FROM t"
df = spark.sql(query)
df.display()

Code Sample (Spark 3.3)

# Using PySpark version 3.3 and before
# Attempting a lateral column alias: this fails on 3.3 and earlier
# because col1 cannot be referenced within the same SELECT list
query = "SELECT 100 AS col1, 100 * col1 AS col2"
df = spark.sql(query)
df.display()

Starting with Spark 3.4, the extra SELECT is no longer needed: the same result can be produced in a single SELECT using a lateral column alias.

Code Sample (Spark 3.4.1)

# Using PySpark version 3.4 and above
# Using a lateral column alias
query = "SELECT 100 AS col1, 100 * col1 AS col2"
df = spark.sql(query)
df.display()
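Lateral column aliases can also be chained, so one derived column can feed the next within the same SELECT. Below is a minimal sketch assuming Spark 3.4+ and a standard SparkSession; the column names (base_amount, tax, total_amount) are made up for illustration.

# A minimal sketch, assuming Spark 3.4+ and a SparkSession named spark.
# Column names below are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

query = """
    SELECT 1000                AS base_amount,
           base_amount * 0.10  AS tax,           -- references the alias defined just above
           base_amount + tax   AS total_amount   -- references both earlier aliases
"""
df = spark.sql(query)
df.show()   # use df.display() on Databricks

Each alias is resolved left to right within the SELECT list, which is what removes the need for the extra CTE or nested SELECT shown in the Spark 3.3 sample above.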