As Azure Data Factory matures year after year, I think that it’s fair to compare Azure Data Factory and SQL Server Integration Services (SSIS). This blog post considers people, processes and technical characteristics when looking at Azure Data Factory vs SSIS.
Table of Contents
What is Azure Data Factory?
To being, Azure Data Factory (ADF) helps you design your data movement solutions for on-premises or any cloud provider. It is a serverless offering that allows you to perform enterprise data movements and transformations. You can do this while governing Azure Data Factory using the Azure ecosystem.
If you want to know more, check out this Azure Data Factory Introduction.
What is SQL Server Integration Services?
SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and data transformations solutions. It comes as part of the SQL Server licensing offering. You can host it on-premises, in a VM that you manage (IaaS) or deployed as part of Azure Data Factory in a VM that Microsoft manages for you (PaaS).
Azure Data Factory vs SSIS
Both Azure Data Factory and ISS have similar descriptions. This is because the two tools were created with the same purpose. To understand the differences, let’s compare ADF and SSIS.
The following table compares Azure Data Factory and SSIS, taking into account people, processes and technology characteristics.
People / Processes / Technology | Azure Data Factory | SSIS (On-Premises or IaaS) |
Training available | Free training available fom Microsoft and community resources growing every day | A lot of free and private training available |
Likelihood you’ll enjoy working with it | High | Low |
Likelihood of finding developers in the market | High | High |
Easy to learn | Yes | Yes |
Easy to master | Yes, but it involves making sure you are an expert in the pricing model | No |
Salary ranges as a developer | Increasing | Decreasing due to decrease of offering |
Governance | Fully integrated with Azure Purview and Azure logs | Integrated with SQL Server logs and SSISDB |
Release cycle | New features every month | Every 2 years (with each version of SQL Server) |
Microsoft Support | It depends on your Azure support level, but the support is extremely responsive | Legacy process, and it might require Software Assurance |
Development tool | Azure Portal or APIs | Visual Studio |
Integrated with Azure DevOps for CI/CD | Yes | Yes |
Code version control | Yes | Yes |
Future proof | Yes | No |
Volume | From megabytes to terabytes | From megabytes to gigabytes |
Velocity | Any speed can be achieved by integrating with other services such as Azure Databricks | Mainly batch processing |
Variety | From CSV to Json, parquet or new file formats | It does not integrate with new file formats like Json or parquet |
Release date | 2015 | 2005 |
Platform fully managed by Microsoft | Yes | No |
Pricing model | Pay as you go, it might be difficult to understand and optimize. | It comes as part of an SQL Server license |
Development environment pricing | You will be billed as part of the pay as you go offering | Yes (SQL Server Developer Edition) |
Number of connectors | More than 85 | Around 7, it can be extended by getting third-party provider extensions (additional costs) |
Primary design pattern | ELT (pipelines) and ETL (data flows) | ETL |
Drag and drop development available | Yes | Yes |
Development options | Pipelines, mapping data flows, Power Query (Power Query) and you can execute SSIS workloads in ADF | Only SSIS |
Parallelism by default | Yes | No |
Alerts | Built in with Azure (Email / SMS / Calls), it requires additional resources if you want to send detailed messages | Built in component to send email alerts. It allows you to customize the message |
Definition language | Json | XML |
Available to use Azure Synapse Analytics Studio | Yes | No |
Integrated with Azure Databricks | Yes | No |
Integrated with Big Data and ML workloads | Yes | No |
Maturity state | Still maturing, more features and capabilities to be released | After 16 years on the market, fully mature, cannot expect many more capabilities to be released |
Data movement customization | There are features like defining how many Data Integration Units to use, batch size or how many activities to run in parallel but it is challenging to troubleshoot. | You have detailed control of the number of rows in each batch or the size in KB of each row which provides a great level of control on the data movement |
Summary
By looking at the comparison above, I suggest exploring Azure Data Factory as your first option for a data movement orchestration tool.
If you are happy with your current workloads in SQL Server Integration Services, don’t forget that you can still execute them using Azure Data Factory.
What’s Next?
In upcoming blog posts, we’ll continue to explore some of the features within Azure Data Services.
Please follow Tech Talk Corner on Twitter for blog updates, virtual presentations, and more!
As always, please leave any comments or questions below.