Building a data warehouse using BigQuery [Part 3]. ETL pipeline with load monitoring and error handling.
Building an ETL pipeline with load monitoring and error handling.
Welcome to Part 3 of the tutorial series "Build a Data Warehouse in the Cloud using BigQuery". This tutorial demonstrates how to build a real-time (or close to real-time) analytics ETL pipeline using Cloud Functions.
In Part 1 we created a streaming Lambda function to transfer our data files from AWS S3 to Google Cloud Storage. In Part 2, building on that hybrid solution, we wrote a Cloud Function to load different types of data into BigQuery. We will use that function as the foundation for our ETL pipeline, which will be responsible for loading, load monitoring, and error handling.
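To make the starting point concrete, here is a minimal sketch of the kind of Cloud Function we built in Part 2: it fires when a file lands in a Cloud Storage bucket and loads that file into BigQuery. The destination table name and the CSV source format are illustrative assumptions, not the exact code from Part 2.

```python
# Minimal sketch of a GCS-triggered Cloud Function that loads the new file
# into BigQuery. Table name and source format are assumptions for illustration.
from google.cloud import bigquery

def load_to_bigquery(event, context):
    """Background Cloud Function fired on a Cloud Storage 'finalize' event."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"

    # Hypothetical destination table -- substitute your own project/dataset/table.
    table_id = "my_project.my_dataset.my_table"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # assuming CSV input files
        skip_leading_rows=1,
        autodetect=True,  # or pass an explicit schema, as we do in Part 2
    )

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes; raises on errors
    print(f"Loaded {uri} into {table_id}")
```

In this part we will wrap that load step with monitoring and error handling, so failed loads are detected and surfaced instead of silently dropped.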
Before working through this article, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries.
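If you want to confirm the client library and credentials are wired up before continuing, a quick sanity check might look like this (assuming you have installed google-cloud-bigquery and pointed GOOGLE_APPLICATION_CREDENTIALS at your service-account key, as the quickstart describes):

```python
# Sanity check: list the datasets the authenticated account can see.
from google.cloud import bigquery

client = bigquery.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```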
Project layout:
Part 1: Sync only new/changed files from AWS S3 to GCP Cloud Storage.
Part 2: Loading data into BigQuery. Creating tables using schemas defined in a YAML config file.
You are here >>> Part 3: Create…