One day I was strucked with a bug when I am practicing Aggregate Transformation. After a day of frustration I searched for solution which rescue me. I think that every SSIS developer (and maybe every developer) can learn a thing or two from their mistakes.
TASK:-- The data flow takes one table with duplicate rows and copies it to another table and makes sure that every row will appear only once. In the way, the data flow also adds some irrelevant fields. Among them is the Create_User and Create_Date fields which tells by who and when the package last ran.
HOW I DID IT:-- Again, it's a very simple flow. I only added Derived Column transformation to add the new fields and then added an Aggregate transformation to make every row appear only once.
It's a sample package I did on my machine to show it here.
The Bug:-- When I am first trying this, it seemed to me very simple flow and I asked myself how this can be happening:
TASK:-- The data flow takes one table with duplicate rows and copies it to another table and makes sure that every row will appear only once. In the way, the data flow also adds some irrelevant fields. Among them is the Create_User and Create_Date fields which tells by who and when the package last ran.
HOW I DID IT:-- Again, it's a very simple flow. I only added Derived Column transformation to add the new fields and then added an Aggregate transformation to make every row appear only once.
It's a sample package I did on my machine to show it here.
The Bug:-- When I am first trying this, it seemed to me very simple flow and I asked myself how this can be happening:
As you can see, it seems that the Aggregate transformation is not deterministic. Sometimes it outputs 99 rows, sometimes 198 and in some other times I get other results as well.
Case Study: I wanted to see what's the difference between the table that I got in the first time (99 rows) and the table I got in the second time (198 rows) so I changed the destination table and compared the two tables. In this step I really started to think that maybe the Aggregate transformation has something wrong inside... Instead of blaming that, I decided to concentrate on that. I needed to see what can make the flow non-deterministic. Then, it hit me...............
DERIVED COLUMN TRANSFORMATION EDITOR |
AGGREGATE TRANSFORMATION EDITOR |
Conclusions:
- Pay attention to non-deterministic elements in what you do, whether it's code or ETL process.
- When you do dummy stuff like checking all the checkboxes in a list - think what are the outcomes.
No comments:
Post a Comment