Actionable Data Analytics
Join Our Email List for Data News Sent to Your Inbox

No-code Experience for Querying JSON Files in Azure Synapse Analytics Serverless

Today, we’ll have a look at one of the recent releases in Azure Synapse Analytics workspaces to query JSON files. 

In a previous blog post, I highlighted how to query JSON files using notebooks and Apache Spark.

Download the sample data file

Today, let’s take a look at how to do the same with SQL and the serverless offering. 

You can download some JSON sample files from this link

Query JSON file with Azure Synapse Analytics Serverless

Let’s begin! Go to your Data Lake and selecting the top 100 rows from your JSON file. 

Top 100 rows of JSON file

Then, a new window with the required script will be populated for you. 

New window with required script
  1. First, select the key elements that you want to query. 
  1. In my case, I had to delete the rowterminator to be able to query the JSON files correctly. 

The final script after a few minor modifications looks like this: 

SELECT 

    JSON_VALUE (jsonContent, '$.OrderDate') AS OrderDate 

    , JSON_VALUE (jsonContent, '$.TotalDue') AS TotalDue 

FROM 

    OPENROWSET( 

        BULK 'https://dlsstorageaccount.dfs.core.windows.net/dlsfs/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_SalesOrderHeader_20200712.json', 

        FORMAT = 'CSV', 

        FIELDQUOTE = '0x0b', 

        FIELDTERMINATOR ='0x0b' 

    ) 

    WITH ( 

        jsonContent varchar(8000) 

    ) AS [result] 

 
Code block

Additionally, you can also start summarizing the data by converting the data type. 

SELECT 

    JSON_VALUE (jsonContent, '$.OrderDate') AS OrderDate 

    , sum(cast(JSON_VALUE (jsonContent, '$.TotalDue') as float)) AS TotalDue 

FROM 

    OPENROWSET( 

        BULK 'https://dlsstorageaccount.dfs.core.windows.net/dlsfs/Demos/AdventureWorks/json/2020/202007/20200712/SalesLT_SalesOrderHeader_20200712.json', 

        FORMAT = 'CSV', 

        FIELDQUOTE = '0x0b', 

        FIELDTERMINATOR ='0x0b' 

    ) 

    WITH ( 

        jsonContent varchar(8000) 

    ) AS [result] 

group by  JSON_VALUE (jsonContent, '$.OrderDate') 
Convert data type

Or, use the wildcards to query multiple files. 

SELECT 

    JSON_VALUE (jsonContent, '$.OrderDate') AS OrderDate 

    , sum(cast(JSON_VALUE (jsonContent, '$.TotalDue') as float)) AS TotalDue 

FROM 

    OPENROWSET( 

        BULK 'https://dlsstorageaccount.dfs.core.windows.net/dlsfs/Demos/AdventureWorks/json/2020/202007/202007*/SalesLT_SalesOrderHeader_202007*.json', 

        FORMAT = 'CSV', 

        FIELDQUOTE = '0x0b', 

        FIELDTERMINATOR ='0x0b' 

    ) 

    WITH ( 

        jsonContent varchar(8000) 

    ) AS [result] 

group by  JSON_VALUE (jsonContent, '$.OrderDate') 
Use wildcards to query multiple files

Summary 

In summary, we saw how to easily query JSON files using the serverless offering within Azure Synapse Analytics. 

Final Thoughts 

When I first started using Azure Synapse Analytics, I did not expect this feature to be released. I thought this capability would always be available with parquet files. It’s great to see new no-code capabilities are being brought to the service. 

What’s next? 

In upcoming blog posts, we’ll continue to explore some of the features within Azure Services.    

Please follow Tech Talk Corner on Twitter for blog updates, virtual presentations, and more!  

As always, please leave any comments or questions below.  

Check out my other posts

comment [ 1 ]
share
No tags 0
1 Response
  • ankit kumar
    03 . 06 . 2022

    How do we write a Query here where we say that my key does not have this value and time is within a range…like here OrderDate is between 3rdJune to 5th June and product field is not equal to XYZ.

Do you want to leave a comment?

Your email address will not be published. Required fields are marked *

For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

I agree to these terms.