Tag: ETL
August 22, 2023
/ Technology
Understanding Driver Pools in Dataproc
Let’s learn about driver pools in Dataproc – an important concept to understand when using multi-tenant Dataproc clusters.
April 21, 2023
/ Technology
Hadoop — Understanding Splits, Blocks & Everything In Between
Originally published on my Medium blog. Understanding Hadoop is like trying to unravel a tangled ball of yarn while wearing…
October 7, 2020
/ Technology
The All New AWS Glue Studio
Up until now, AWS provided a visual representation of your code but never really allowed you to build using a…
October 6, 2020
/ Technology
Real Time Data Streaming Into Kinesis & Ingestion Into Postgres Using AWS Glue – Part 2 (Configure Glue Catalog Tables)
Before we start building Glue jobs, we need to understand that one of the unique features of Glue is its…
October 5, 2020
/ Technology
Real Time Data Streaming Into Kinesis & Ingestion Into Postgres Using AWS Glue – Part 1 (Setup)
This April, Amazon announced support for serverless streaming ETL using AWS Glue. For the uninformed – AWS Glue is built…
October 3, 2020
/ Technology
AWS – Develop ETL jobs using AWS Glue Endpoints
AWS Glue scripts can have startup times as long as 12 minutes, especially if you are…