Posts

GROUP BY ALL - Databricks

Recent posts

Free Practice Assessments for Microsoft Certifications

Free Practice Assessments for Microsoft Certifications Microsoft recently launched free practice assessments for Microsoft Certifications. This is really great, as it not only helps candidates test their knowledge but also helps them identify the knowledge gaps they need to work on. Above all, it will also help them save money. These free assessments give candidates a feel for the style, wording and difficulty of the questions asked in the exam itself, which increases their chances of passing a certification exam. More information about the Free Practice Assessments for Microsoft Certifications can be found here. Following is a sample of the list of practice assessments currently available:

Other Helpful Links:
• Register and schedule an exam
• Prepare for an exam
• Exam duration and question types
• Exam scoring and score reports
• Request exam accommodations
• Available exam accommodations and associated do…

Lateral column alias support with Spark 3.4 and above

Lateral column alias support with Spark 3.4 and above CTEs get used extensively to implement logic. Before Spark 3.4, one had to write multiple select statements when a computed column had to be used to drive another column.

Code sample (Spark 3.3) — chaining select statements through a CTE:

    # Using PySpark version 3.3 and before
    # chaining multiple select statements
    query = "WITH t AS (SELECT 100 as col1) \
        SELECT 100 * col1 as col2 FROM t"
    df = spark.sql(query)
    df.display()

Code sample (Spark 3.3) — attempting to use a lateral column alias, which is not supported on this version:

    # Using PySpark version 3.3 and before
    # Using lateral column alias
    query = "Select 100 as col1, 100 * col1 as col2"
    df = spark.sql(query)
    df.display()

Starting with Spark 3.4 and above, we don't need to write multiple select statements; the same thing can be accomplished with a lateral column alias.

Code sample (Spark 3.4.1) — using a lateral column alias:

    # Using PySpark version 3.4 and above
    # Using lateral column alias
    query = "Select 100 as col1, 100 * col1 as col2"
    df = spark.sql(query)
    df.display()
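As an additional illustration, the following is a minimal sketch (assuming a Spark 3.4+ session is available as spark, as in the samples above) showing that a lateral column alias can itself be referenced by a later alias in the same SELECT:

    # Using PySpark version 3.4 and above
    # col2 references the alias col1, and col3 references col2, all in one SELECT
    query = "SELECT 100 AS col1, col1 * 2 AS col2, col2 + 1 AS col3"
    df = spark.sql(query)
    df.display()  # expected result: col1 = 100, col2 = 200, col3 = 201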

Using widgets in SQL notebooks in Azure Databricks

Using widgets in SQL notebooks in Azure Databricks In this article we will see how data engineers and data analysts using SQL notebooks in Azure Databricks can use widgets to parameterize their notebooks. Adding parameters to queries makes them dynamic, helping users reuse the same query to drive results as per their needs. Parameter support in Azure Databricks is offered via widgets. Input widgets allow users to add parameters to their notebooks and dashboards. The widget API consists of calls to create various types of input widgets, remove them, and get bound values. Widgets are best for:

• Building a notebook or dashboard that is re-executed with different parameters
• Quickly exploring results of a single query with different parameters

Currently there are 4 types of widgets:

• text: accepts string characters
• dropdown: provides a list of values
• combobox: provides a combination of string characters and dropdown. Basically, you have the option to either choose from…
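To make the flow concrete, here is a minimal sketch of the widget API, assuming a Databricks notebook where dbutils and spark are available; the table name sales and the widget names are hypothetical and only for illustration:

    # create a text widget and a dropdown widget
    dbutils.widgets.text("country", "US", "Country code")
    dbutils.widgets.dropdown("year", "2023", ["2021", "2022", "2023"], "Year")

    # read the bound values and use them to parameterize a query
    country = dbutils.widgets.get("country")
    year = dbutils.widgets.get("year")
    df = spark.sql(f"SELECT * FROM sales WHERE country = '{country}' AND year = {year}")
    df.display()

    # remove the widgets once they are no longer needed
    dbutils.widgets.removeAll()

Changing the widget values at the top of the notebook re-drives the same query with the new parameters, which is exactly the re-execution pattern described above.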

PySpark col should be Column Error

PySpark col should be Column Error While coding transformations as part of the data engineering process, it is common practice to create new columns based on existing columns in a data frame. One of the most commonly used commands is withColumn:

    df = df.select("col1","col2","col3").withColumn("col4", "col1")

In the above code df is an existing dataframe from which we are selecting 3 columns (col1, col2, col3) while at the same time creating a new column col4 from the existing column col1. On executing the command we get the following error:

    col should be Column

The reason we get this error is that df.select returns a DataFrame, and .withColumn requires its second argument to be a Column, not a string. The updated code to resolve the error:

    df = df.select("col1","col2","col3").withColumn("col4", df['col1'])
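For reference, here is a small, self-contained sketch that reproduces the error and shows two equivalent fixes; the sample data and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("col-should-be-column").getOrCreate()
    df = spark.createDataFrame([(1, "a", True), (2, "b", False)], ["col1", "col2", "col3"])

    # Fails with "col should be Column" because "col1" here is a plain string, not a Column
    # df.select("col1", "col2", "col3").withColumn("col4", "col1")

    # Either of these works, since both pass a Column object
    fixed = df.select("col1", "col2", "col3").withColumn("col4", df["col1"])
    fixed = df.select("col1", "col2", "col3").withColumn("col4", F.col("col1"))
    fixed.show()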

Using SQL to analyze data in Azure Data Lake (ADLS Gen 2)

Using SQL to analyze data in Azure Data Lake (ADLS Gen 2) As more and more data is ingested and made available in data lakes, there is a growing demand from data analysts to be able to quickly access the data and drive insights. The biggest reason fueling this demand is the ability to use existing skills like "SQL" to analyze the available data. In Azure, the following are a few of the options available, in a Big Data world, to make the data available to data analysts:

    1. Mounting Azure Data Lake (ADLS Gen 2) to Databricks.
    2. Implementing the Databricks Lakehouse pattern, built on top of Delta tables which are available as tables in Databricks.
    3. Loading data into an Azure Synapse workspace.

One good thing regarding the above options is that all of them enable the data analyst to use their existing "SQL" skills to analyze the data and give them the flexibility to build solutions and products as quickly as possible, which in turn enables "citiz…
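As an illustration of where the first option leads, here is a minimal sketch of querying ADLS Gen 2 data with SQL from a Databricks notebook, assuming spark and dbutils are available; the storage account, container, secret scope and table names below are hypothetical, and the account-key configuration is only one of several possible authentication approaches:

    # point Spark at the storage account using a key kept in a secret scope
    spark.conf.set(
        "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
        dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
    )

    # read the raw files from the lake
    path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/sales/"
    df = spark.read.format("parquet").load(path)

    # expose them as a temporary view so analysts can work with plain SQL
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region").display()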