
Building a Data Warehouse in the Cloud using BigQuery [Part 1]. Sync your AWS S3 data with Google Cloud Storage in 20 minutes.

💡Mike Shakhomirov
5 min read · Oct 12, 2019

Photo by Daniel Cheung on Unsplash


Have you heard about “Big Data” analytics, ETL and Data Warehouses, and been confused about how to use them?

Have you ever wondered how you might keep your AWS S3 files synchronized with Google Cloud Storage in the background?

In this 20-minute tutorial, we’ll walk through building a Node.js Lambda function in AWS that transfers data between the two clouds. The Lambda function listens for new files added to the source S3 bucket and copies them to Google Cloud Storage.
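To make that concrete, below is a minimal sketch of what such a handler could look like. It assumes the @google-cloud/storage package and a GCP service-account key file are bundled into the deployment package (the aws-sdk is already available in the Lambda runtime); the bucket names and file paths are placeholders, not the exact values we’ll use later.

```javascript
// index.js -- a minimal sketch; bucket names and the key file path are placeholders.
const AWS = require('aws-sdk'); // provided by the Lambda Node.js runtime
const { Storage } = require('@google-cloud/storage'); // bundled with the function

const s3 = new AWS.S3();
// Assumes a GCP service-account key file is shipped inside the deployment package.
const storage = new Storage({ keyFilename: './gcp-service-account.json' });
const gcsBucket = storage.bucket('my-gcs-bucket'); // hypothetical target bucket

exports.handler = async (event) => {
  // The S3 trigger delivers one record per newly created object.
  for (const record of event.Records) {
    const srcBucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Read the new object from S3...
    const { Body } = await s3.getObject({ Bucket: srcBucket, Key: key }).promise();

    // ...and write it to Google Cloud Storage under the same key.
    await gcsBucket.file(key).save(Body);
    console.log(`Copied s3://${srcBucket}/${key} to gs://my-gcs-bucket/${key}`);
  }
  return { copied: event.Records.length };
};
```

In practice the handler is wired to the source bucket’s ObjectCreated event notification, so it fires once for every new or changed file.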

Part 2 of this series provides a tutorial on building a Data Warehouse in the Cloud using Google BigQuery: a step-by-step guide to building a Big Data Warehouse solution, including load monitoring and error handling.

Project layout:

You are here >>> Part 1: Sync only new/changed files from AWS S3 to GCP Cloud Storage.

Part 2: Loading data into BigQuery. Create tables using schemas; we’ll use a YAML config file (see the sketch after this list).

Part 3: Create a streaming ETL pipeline with load monitoring and error handling.
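As a quick preview of Part 2, here is a hedged sketch of how a YAML-defined schema might be turned into a BigQuery table with the @google-cloud/bigquery client. The config layout, dataset and table names below are illustrative assumptions, not the exact format used in the next part.

```javascript
// createTable.js -- illustrative only; the YAML layout and names are assumptions.
const fs = require('fs');
const yaml = require('js-yaml');
const { BigQuery } = require('@google-cloud/bigquery');

// A hypothetical config.yaml:
//   dataset: my_dataset
//   table: my_table
//   schema:
//     - { name: id, type: INTEGER }
//     - { name: payload, type: STRING }
//     - { name: created_at, type: TIMESTAMP }
const config = yaml.load(fs.readFileSync('./config.yaml', 'utf8'));

async function createTable() {
  const bigquery = new BigQuery();
  // Create the table in the configured dataset using the schema from the YAML file.
  const [table] = await bigquery
    .dataset(config.dataset)
    .createTable(config.table, { schema: { fields: config.schema } });
  console.log(`Created table ${table.id}`);
}

createTable().catch(console.error);
```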
