
Building a data warehouse using BigQuery [Part 3]. ETL pipeline with load monitoring and error handling.

💡Mike Shakhomirov
10 min read · Oct 27, 2019


Building an ETL pipeline with load monitoring and error handling.

Welcome to Part 3 of the tutorial series “Build a Data warehouse in the Cloud using BigQuery”. This tutorial demonstrates how to build a real-time (or close to real-time) analytics ETL pipeline using Cloud Functions. In Part 2 we wrote a Cloud function to load data from Google Storage into BigQuery.

Photo by Suzanne D. Williams on Unsplash

Our Cloud function was built on top of the hybrid solution that we completed in Part 1.

In Part 1 we created a streaming Lambda to transfer our data files from AWS S3 to Google Storage. In Part 2 we wrote a Cloud function to load different types of data into BigQuery. We will use it as the foundation for our ETL pipeline, which will be responsible for loading, monitoring, and error handling.
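As a refresher, a minimal sketch of that Part 2-style load function might look like the snippet below (Python, using the google-cloud-bigquery client). The project, dataset, and table-naming convention are illustrative assumptions, not the exact code from Part 2:

```python
# A minimal sketch of the Part 2-style load function.
# Project/dataset names and the table-naming convention are assumptions.
from google.cloud import bigquery

client = bigquery.Client()

def load_file_to_bigquery(event, context):
    """Background Cloud Function triggered when a file lands in Cloud Storage."""
    uri = f"gs://{event['bucket']}/{event['name']}"

    # Hypothetical convention: the target table is named after the file stem.
    file_stem = event["name"].split("/")[-1].split(".")[0]
    table_id = f"my-project.my_dataset.{file_stem}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the job finishes; raises on failure
    print(f"Loaded {load_job.output_rows} rows into {table_id}")
```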

Before trying the examples in this article, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries.
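If you want to verify that the client library is installed and your credentials work before deploying anything, a quick check like this (assuming a service-account key exported via GOOGLE_APPLICATION_CREDENTIALS) should print your project ID:

```python
# Sanity check: pip install google-cloud-bigquery, then point
# GOOGLE_APPLICATION_CREDENTIALS at a service-account key file.
from google.cloud import bigquery

client = bigquery.Client()
print(f"Authenticated against project: {client.project}")
```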

Project layout:

Part 1: Sync only new/changed files from AWS S3 to GCP cloud storage.

Part 2: Loading data into BigQuery. Create tables using schemas. We’ll use YAML to create a config file.

You are here >>> Part 3: Create…
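To give a sense of the monitoring and error handling this part builds, here is a hedged sketch: a wrapper that records every load job’s outcome in a monitoring table. The monitoring table name and its row schema are hypothetical illustrations, not the exact design used later in the series:

```python
# Sketch: error handling around a BigQuery load job.
# The monitoring table and its schema are hypothetical illustrations.
from datetime import datetime, timezone

from google.api_core.exceptions import GoogleAPIError
from google.cloud import bigquery

client = bigquery.Client()
MONITOR_TABLE = "my-project.monitoring.load_results"  # hypothetical

def load_with_monitoring(uri, table_id, job_config):
    row = {
        "uri": uri,
        "table_id": table_id,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
    try:
        job = client.load_table_from_uri(uri, table_id, job_config=job_config)
        job.result()  # raises on failure
        row.update(status="SUCCESS", rows_loaded=job.output_rows, error=None)
    except GoogleAPIError as exc:
        row.update(status="ERROR", rows_loaded=0, error=str(exc))
    # Record the outcome so dashboards and alerts can pick it up.
    errors = client.insert_rows_json(MONITOR_TABLE, [row])
    if errors:
        print(f"Failed to write monitoring row: {errors}")
    return row
```

With something like this in place, failed loads are visible in one table, which makes alerting (for example, a scheduled query over the monitoring table) straightforward.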
