Aggregator Methods in DataStage (Courtesy: Ray Wurlod)

Here’s another favourite from DSXchange, by Ray Wurlod.

Hash aggregation builds a hash table of results in memory, with one row for each combination of grouping column values. Because the whole table is held in memory, this method is useful only when there are not too many distinct combinations of grouping column values.

Sort aggregation relies on the fact that the data are sorted (and partitioned) by the grouping key columns. When a different value arrives, it is known that the previous value will never be seen again (this being the nature of sorted data), so the results for the previous combination can be transferred to the output and the memory they occupied freed. In Sort mode, therefore, the Aggregator holds only one combination of values per node at a time rather than blocking on the entire data stream, which is why Sort mode can be used with any volume of data.
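
To make the difference concrete, here is a rough Python sketch of the two approaches. It is only an illustration of the idea, not DataStage code: the function names, the (key, value) row layout and the choice of a sum as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Hash-style: keep one in-memory entry per distinct key.

    Nothing can be emitted until all input has been read, and memory
    grows with the number of distinct keys. (Hypothetical helper.)
    """
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def sort_aggregate(sorted_rows):
    """Sort-style: input must already be sorted by key.

    groupby() closes a group as soon as the key changes, so only the
    current group's running total is held at any time. (Hypothetical helper.)
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Sample rows as (grouping key, value) pairs, invented for illustration.
rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

hash_result = hash_aggregate(rows)

# The sort-style version needs its input ordered by the grouping key first.
sort_result = list(sort_aggregate(sorted(rows, key=itemgetter(0))))
```

Both versions produce the same totals. The difference is in what they hold while running: the hash version keeps every key until the end, while the sort version releases each key's result as soon as the next key appears, at the cost of needing sorted input.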

Posted in Technology by Jerome. Tags: datastage, ETL
