Today, we’ll have a look at one of the recent releases in Azure Synapse Analytics workspaces to query JSON files.
In a previous blog post, I highlighted how to query JSON files using notebooks and Apache Spark.
Table of Contents
Download the sample data file
Today, let’s take a look at how to do the same with SQL and the serverless offering.
You can download some JSON sample files from this link.
Query JSON file with Azure Synapse Analytics Serverless
Let’s begin! Go to your Data Lake and selecting the top 100 rows from your JSON file.
Then, a new window with the required script will be populated for you.
- First, select the key elements that you want to query.
- In my case, I had to delete the rowterminator to be able to query the JSON files correctly.
The final script after a few minor modifications looks like this:
SELECT
JSON_VALUE (jsonContent, '$.OrderDate') AS OrderDate
, JSON_VALUE (jsonContent, '$.TotalDue') AS TotalDue
FROM
OPENROWSET(
BULK 'https://dlsstorageaccount.dfs.core.windows.net/dlsfs/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_SalesOrderHeader_20200712.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b'
)
WITH (
jsonContent varchar(8000)
) AS [result]
Additionally, you can also start summarizing the data by converting the data type.
SELECT
JSON_VALUE (jsonContent, '$.OrderDate') AS OrderDate
, sum(cast(JSON_VALUE (jsonContent, '$.TotalDue') as float)) AS TotalDue
FROM
OPENROWSET(
BULK 'https://dlsstorageaccount.dfs.core.windows.net/dlsfs/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_SalesOrderHeader_20200712.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b'
)
WITH (
jsonContent varchar(8000)
) AS [result]
group by JSON_VALUE (jsonContent, '$.OrderDate')
Or, use the wildcards to query multiple files.
SELECT
JSON_VALUE (jsonContent, '$.OrderDate') AS OrderDate
, sum(cast(JSON_VALUE (jsonContent, '$.TotalDue') as float)) AS TotalDue
FROM
OPENROWSET(
BULK 'https://dlsstorageaccount.dfs.core.windows.net/dlsfs/Demos/AdventureWorks/json/2020/202007/202007*/SalesLT_SalesOrderHeader_202007*.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b'
)
WITH (
jsonContent varchar(8000)
) AS [result]
group by JSON_VALUE (jsonContent, '$.OrderDate')
Summary
In summary, we saw how to easily query JSON files using the serverless offering within Azure Synapse Analytics.
Final Thoughts
When I first started using Azure Synapse Analytics, I did not expect this feature to be released. I thought this capability would always be available with parquet files. It’s great to see new no-code capabilities are being brought to the service.
What’s next?
In upcoming blog posts, we’ll continue to explore some of the features within Azure Services.
Please follow Tech Talk Corner on Twitter for blog updates, virtual presentations, and more!
As always, please leave any comments or questions below.
1 Response
ankit kumar
03 . 06 . 2022How do we write a Query here where we say that my key does not have this value and time is within a range…like here OrderDate is between 3rdJune to 5th June and product field is not equal to XYZ.