
Showing posts from 2023

GROUP BY ALL - Databricks


With the release of the GROUP BY ALL syntax by Databricks, writing an aggregation query has become much simpler. We no longer need to repeat the non-aggregating columns in the GROUP BY clause, which makes the code cleaner and less error-prone.

```sql
%sql
-- Syntax: specifying the columns in the GROUP BY clause
SELECT product_category, order_date, sum(sale_amount)
FROM order_details
GROUP BY product_category, order_date
```

With the GROUP BY ALL syntax, the same query is greatly simplified:

```sql
%sql
-- Syntax: using GROUP BY ALL
SELECT product_category, order_date, sum(sale_amount)
FROM order_details
GROUP BY ALL
```

With GROUP BY ALL, you can add more non-aggregated columns to the SELECT statement without worrying about adding them to the GROUP BY clause.
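For instance, suppose a new column is later added to the SELECT list (a hypothetical sales_region column on the same order_details table); with GROUP BY ALL, the grouping clause needs no change:

```sql
%sql
-- sales_region is a hypothetical additional column on order_details;
-- GROUP BY ALL automatically treats it as a grouping column
SELECT product_category, order_date, sales_region, sum(sale_amount)
FROM order_details
GROUP BY ALL
```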

Free Practice Assessments for Microsoft Certifications

Recently Microsoft launched free practice assessments for Microsoft Certifications. This is great news: it not only helps candidates test their knowledge but also shows them the knowledge gaps they need to work on. Above all, it helps them save money. These free assessments familiarize candidates with the style, wording, and difficulty of the questions asked in the exam itself, increasing their chances of passing a certification exam. More information about the Free Practice Assessments for Microsoft Certifications can be found here. Following is a sample of the list of practice assessments currently available:

Other Helpful Links:
• Register and schedule an exam
• Prepare for an exam
• Exam duration and question types
• Exam scoring and score reports
• Request exam accommodations
• Available exam accommodations and associated do

Lateral column alias support with Spark 3.4 and above

CTEs are used extensively to implement logic. Before Spark 3.4, one had to chain multiple SELECT statements whenever a computed column was needed to derive another column.

Code Sample (Spark 3.3)

```python
# Using PySpark version 3.3 and before
# Chaining multiple SELECT statements via a CTE
query = "WITH t AS (SELECT 100 as col1) \
         SELECT 100 * col1 as col2 FROM t"
df = spark.sql(query)
df.display()
```

Code Sample (Spark 3.3)

```python
# Using PySpark version 3.3 and before
# Attempting a lateral column alias: this fails with a column
# resolution error, since col1 is not yet visible to the same SELECT list
query = "SELECT 100 as col1, 100 * col1 as col2"
df = spark.sql(query)
df.display()
```

Starting with Spark 3.4, we no longer need to write multiple SELECT statements; the same result can be achieved with a lateral column alias.

Code Sample (Spark 3.4.1)

```python
# Using PySpark version 3.4 and above
# Using a lateral column alias: col2 references col1 from the same SELECT list
query = "SELECT 100 as col1, 100 * col1 as col2"
df = spark.sql(query)
df.display()
```
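Lateral column aliases can also be chained, so each derived column can build on the previous one within a single SELECT. A minimal sketch (the column names col1 through col3 are illustrative):

```sql
-- col2 builds on col1, and col3 builds on col2, all in one SELECT
SELECT 100 as col1,
       100 * col1 as col2,
       col2 + 1 as col3
```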