I find that I regularly need to evaluate all the rows in "data groups" within the pipeline and execute script based transform logic for each grouping (aggregate transform does not provide script access to the data).
I do this in script by sorting the pipelined data on some key columns to define the groupings, caching the key values as the rows pass through the ProcessInputRow procedure, and running logic that compares the current row's key values as each row passes through the transform with the previous row's key values. When they differ I know I have a new group (special casing for the very first row in the pipleine), so I perform my transforms with cached data from the previous group .
This works but I am thinking this must be a common "pattern". Also, I have experience with a competing ETL tool which has "built in" support for script processing of user defined data groupings. That product provides an "OnChange" event, where the user defines the key in the GUI (can be composite), and the event provides a container to hold script (with access to the row data values) to run when the event "fires"(any values change in the key vs. the previous). Of course the data stream again has to be sorted for this to work.
Anyone else regularly have a need to do this type of processing? If so perhaps it is a good candidate for a custom component? Or a suggestion to Microsoft to enhance the aggregate component for the next SSIS version to provide a script "hook" in the aggregate transform? Anyone else regularly have this type of processing requirement?
Ken
Yeah, this would be a useful feature, but it looks like someone will have to write a custom component!
No comments:
Post a Comment