Have you ever created a data movement design that had to verify the integrity of a file or table using checksums? Or needed to make sure that all rows were loaded when working with tables? These are common requirements when designing data movement frameworks. Azure Data Factory makes this easier because it has a built-in Data Consistency Verification feature that you can use.
In this post, you will learn about Data Consistency Verification and how to enable and monitor it.
What is Data Consistency Verification in Azure Data Factory?
To begin, data consistency verification performs the following validations for you:
- When copying tables, it validates that the number of rows in the source system matches the number of rows in the destination system
- When copying files, it validates the file size, last modified date, and MD5 checksum
However, please take into account the current limitations, which you can check here. Also, be aware that it can affect the performance of the copy activity, because a few additional checks need to be performed.
The verification can report one of three states:
- Verified: the verification ran and the source and destination systems are consistent. If the verification detects differences and you have not enabled fault tolerance, the copy activity will fail.
- NotVerified: the copy activity was not checked for consistency because validateDataConsistency was not enabled in the copy activity.
- Unsupported: the verification cannot be performed because the current configuration (source or destination system) does not support it.
Enable Azure Data Factory Data Consistency Verification
First, enable the feature within the copy activity, in the Settings section.
Is that all? Yes, there is nothing else to configure. You can also enable it through the JSON definition of the copy activity.
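As a sketch, the relevant property in the copy activity's JSON definition is validateDataConsistency under typeProperties (the activity name and the source/sink types below are placeholders; your actual source and sink configuration will differ):

```json
{
    "name": "CopyWithConsistencyCheck",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "BinarySource" },
        "sink": { "type": "BinarySink" },
        "validateDataConsistency": true
    }
}
```

Setting "validateDataConsistency": true in JSON is equivalent to ticking the checkbox in the Settings section of the copy activity.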
Monitor Azure Data Factory Data Consistency Verification
Finally, after triggering the pipeline, you can see additional information in your pipeline output logs. You’ll also get error information if you enable Copy Activity Logs. To know more about Copy Activity logs, check this blog post on Azure Data Factory Copy Activity Logs.
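For completeness, here is a minimal sketch of what enabling copy activity logging can look like in the same activity's JSON, assuming the property names from the Azure Data Factory documentation; the linked service name "MyBlobStorage" is a placeholder:

```json
"typeProperties": {
    "logSettings": {
        "enableCopyActivityLog": true,
        "copyActivityLogSettings": {
            "logLevel": "Warning",
            "enableReliableLogging": false
        },
        "logLocationSettings": {
            "linkedServiceName": {
                "referenceName": "MyBlobStorage",
                "type": "LinkedServiceReference"
            }
        }
    }
}
```

With logging enabled, details about inconsistent files or rows are written to the configured storage account, which is useful when fault tolerance lets the copy continue past a mismatch.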
The output in the monitoring summary.
The output in JSON format.
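As a reference, a sketch of the consistency section you can expect in the copy activity's output JSON; the VerificationResult value takes one of the three states described above, and "Verified" here is only illustrative:

```json
"dataConsistencyVerification": {
    "VerificationResult": "Verified"
}
```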
In summary, today you’ve learned how to enable data consistency verification within the copy activity with only one click and without writing a single line of code. If you have any questions, please leave a comment below.
In upcoming blog posts, we’ll continue to explore Azure Data Service features.
Please follow Tech Talk Corner on Twitter for blog updates, virtual presentations, and more!
Check out these other blog posts
Mike, 05.08.2022
“Copy Data” activity, Consistency Verification
I want to see the results of a failed verification, but I couldn't get it to fail:
- I selected a source SQL table that would take about a minute to copy.
- I truncated the sink table.
- I started the pipeline.
- While the copy was in progress, I used a query window to delete some rows; the query window's results showed that it succeeded in deleting rows.
- When the pipeline finished, the copy activity showed success and that verification was good.
I suspect that the copying was done in a transaction and that the final row count check was still inside that transaction.
* How do I force a failure for testing?
* If the copying completes, but verification fails, does execution continue on the SUCCESS path or switch to the FAIL path?
I’d expect it to switch to the FAIL path, but I can’t find that explicitly in the documentation and I can’t find a way to test it