Jerome Rajan

0 %
Jerome Rajan
Staff Solutions Consultant at Google
Data & Analytics
  • Residence:
    India
  • City:
    Mumbai
SQL
Dataproc, EMR
Hadoop
BigQuery
AWS Glue
PySpark, Python
Data Pipeline Design
Tableau, Redshift, Snowflake
IBM DataStage
  • AWS Lambda, S3, EMR, SQS, DynamoDB, Step Functions, Cloud Functions
  • Unix Shell Scripting, Python
  • Oracle, DB2, Redis
  • Alteryx, VBA, Blueprism, UiPath
English
Tamil
Hindi
Malayalam
Marathi

Business Alert & Exception Data Management within an ETL Ecosystem

February 25, 2018

Introduction

Data transparency and trace-ability are critical to the success of any ETL flow. The business or the technical support team needs to stay updated about potential data outliers. Outliers could be in the form of:

  1. Data semantic errors
  2. Data not meeting functional constraints
  3. Unexpected data.
  4. New data not matching internal master data

Besides data transparency, the stakeholders also need to be on top of the process executions on a day to day basis. For e.g., if a process aborts due to whatever reason, the support team needs to be notified immediately about the situation through an email.

Many ETL flows incorporate such email alerts in an ad-hoc manner such that the process would send an email as soon as it encounters an issue. But these alerts are in no way regulated or archived for future reference. A data issue could have been alerted via email to a certain Mr. X. But tomorrow, if Mr. X were replaced with Mr. Y, Mr. Y would have no way to know if the data issue is present or not.

Hundreds of alerts can go out within a small time-frame and it is essential that such alerts be treated with the same care as the actual data.

I’ve conceptualized and implemented a framework that ensures complete management of email metadata by allowing the recipient list, email subject and body to be driven through tables. This reusable asset aims to centralize alert management and embed trace-ability by ensuring that alerts are logged, audited and transparent.

A typical ETL flow

What this framework aims to do

Business Case

The supply chain division of a global organization has adopted the strategy of Vendor Managed Inventory to ensure that all its customers are well stocked and that its products are able to meet the seasonal ups and downs of demand and supply. To achieve optimum supply chain balance, the company’s customers send point of sale data about their inventory on a daily basis. This information consists of new customers, new products, legacy products, threshold values of critical quantity metrics, etc.

The business typically uses all these metrics through BI reports that tell them about certain outliers in regular patterns. It gives them information on stock levels, order levels, how a product is performing, seasonal peaks and slides, etc.

The source data comes from more than 1000 customers who could be retailers, wholesalers or distribution centers. These customers are spread across different geographies. The supply chain division needs to keep a tab on the activities of each of these customers at a product level. The business wanted to be kept informed about

  1. New products that have entered the market – This could be new products or a new way in which a certain product has been packaged.
  2. New customers
  3. Legacy products – If a customer is sending inventory data for a product that is considered legacy/inactive
  4. Invalid data – Data that doesn’t sync up with the organization’s internal master
  5. Missing information – The data sent by the customers to this company is part of an agreement that involves a cost. The organization needs to ensure that its customers are sending all the information as agreed upon though the agreement.
  6. Duplicate information
  7. Stale data

The process supporting the needs of the business was near real time and executed in intervals of an hour. The business wanted these alerts every one hour so that appropriate action could be taken. The ability to view all the required information directly in their emails without having to open attachments and dig through CSV data was a big time saver and value-add for the client.

Impact

  1. The ability to view the required data directly in an email through a structured HTML table was a big value-add to the business which until then was dealing with csv attachments.
  2. The business started to quote the alert emails verbatim for all discussions amongst the senior management
  3. The accelerator saved 100 man hours of effort by because of its reusable nature. It ensured that re-work and re-design to the ETL flow did not cause any impact to the way alert notifications were being generated.

If you would be interested in hearing more about the framework and wish to implement this framework, feel free to connect with me and I’ll be happy to help.

Posted in TechnologyTags:
Write a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Be Original
Would the boy you were be proud of the man you are?