Tag: Spark
Spark Config Calculator
The Spark Configuration Tool is a Streamlit-based application designed to assist users in optimizing Apache Spark configurations. It allows users…
Spark Architecture – Notes & FAQ
I’ve spent the past couple of weeks trying to master Spark architecture, and this post is a running summary of my notes and questions gathered from across the internet and Stack Overflow. If you feel something is incorrect, I’ll be happy to discuss. Hope you find it useful!
The All New AWS Glue Studio
Up until now, AWS provided a visual representation of your code but never really allowed you to build using a…
Real Time Data Streaming Into Kinesis & Ingestion Into Postgres Using AWS Glue – Part 2 (Configure Glue Catalog Tables)
Before we start building Glue jobs, we need to understand that one of the unique features of Glue is its…
Real Time Data Streaming Into Kinesis & Ingestion Into Postgres Using AWS Glue – Part 1 (Setup)
This April, Amazon announced support for serverless streaming ETL using AWS Glue. For the uninformed – AWS Glue is built…
AWS – Develop ETL jobs using AWS Glue Endpoints
AWS Glue scripts can have startup times as long as 12 minutes, especially if you are…