Query JSON Files with Azure Synapse Analytics Notebooks

17 . 07 . 202019 . 01 . 2021

In this blog post, I’ll show you how to easily query JSON files with Notebooks by converting them to temporal tables in Apache Spark and using Spark SQL.

With the appearance of Data Lakes and other file formats in the data analytics space, people are curious about how to consume these new dataset formats. While a highly skilled technical resource can do it, you can also use Azure Synapse Analytics to get insights with a no-code experience or by writing only a few lines of code.

Table of Contents

Querying JSON Files Tutorial

In this tutorial, we are going to be querying some files from the Adventure Works LT database. You can download the files and get a copy of the notebook.

%%pyspark 

## Customers 

customers_data_path = spark.read.load([ 

    'abfss://[email protected]/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_Customer_20200712.json' 

    ], format='json') 

customers_data_path.createOrReplaceTempView("Customer") 

 

## Customers Address 

customers_data_path = spark.read.load([ 

    'abfss://[email protected]/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_CustomerAddress_20200712.json' 

    ], format='json') 

customers_data_path.createOrReplaceTempView("CustomerAddress") 

 

## Address 

customers_data_path = spark.read.load([ 

    'abfss://[email protected]/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_Address_20200712.json' 

    ], format='json') 

customers_data_path.createOrReplaceTempView("Address")

Next, after loading the files into temporal views, query them using Spark SQL.

%%sql    

select CountryRegion, count(*) as NumberOfCustomers  

from Customer CUS 

join CustomerAddress CAD 

    on CUS.CustomerID = CAD.CustomerID 

join Address ADR 

    on ADR.AddressID = CAD.AddressID 

group by CountryRegion  

order by count(*) desc

Is that it? Yes, that’s all! Now you are ready to start consuming your JSON files.

What if you have multiple files?

Query multiple files at the same time by using the wildcard such as *. For example:

Summary

To sum up, querying JSON files is not a challenging on this platform thanks to Notebooks. This enables data engineers, data scientists and data analysts to easily consume information using different file formats. In my opinion, it’s certainly one of the best features of Azure Synapse Analytics.

What’s next?

Looking forward, I’ll continue introducing new features that are already available in Azure Synapse Analytics Workspaces. Please leave me a comment if you have any questions.

My previous post: Azure Synapse Analytics Notebooks.

Another post on Azure Synapse Analytics: Create Parquet Files

comment [ 0 ]

No tags 0

David Alzamendi

As a Data Architect, I help organisations to adopt Azure data analytics technologies that mitigate some of their business challenges. I’ve been working in the data analytics space since 2011, mainly in the data warehousing area and I’m specialized in the design and implementation of data analytics solutions with Microsoft technologies. I am responsible for providing end-to-end technical guidance and expertise across multiple data analytics projects.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Query JSON Files with Azure Synapse Analytics Notebooks

Querying JSON Files Tutorial

What if you have multiple files?

Summary

What’s next?

David Alzamendi

Azure Synapse Analytics Notebooks

Documenting Azure Data Factory

Do you want to leave a comment? Cancel reply

Recent Posts

Categories

David Alzamendi

Calendar

Archives