💡Mike Shakhomirov

1.8K Followers

Published in Towards Data Science

·Pinned

Advanced SQL techniques for beginners

On a scale from 1 to 10, how good are your data warehousing skills? Want to go above 7/10? This article is for you. Want to get ready for a data analyst job interview asap? This blog post explains some intricate data warehouse SQL techniques in detail. …

Sql

7 min read

Published in Towards Data Science

·Pinned

Unit Tests for SQL Scripts with Dependencies in Dataform

and data warehouse Gitflow pipelines to run it automatically. Do you unit test your data warehouse scripts? I am going to talk about unit tests for complex SQL queries which might consist of multiple operations (actions). I tried it before using BigQuery scripting; see "SQL Unit Testing in BigQuery? Here is a tutorial. Complete guide for scripting and UDF testing." (towardsdatascience.com). However, here is another (and probably better) way to do it.

Sql

6 min read

Published in Towards Data Science

·Pinned

Training your ML model using Google AI Platform and Custom Environment containers

Complete guide using Tensorflow, Airflow scheduler and Docker — Google AI Platform allows advanced model training using various environments. So it is really easy to train your model with just one command like so: gcloud ai-platform jobs submit training ${JOB_NAME} \ --region $REGION \ --scale-tier=CUSTOM \ --job-dir ${BUCKET}/jobs/${JOB_NAME} \ --module-name trainer.task \ --package-path trainer \ --config trainer/config/config_train.json …

AI

7 min read

Published in Towards Data Science

·Pinned

Retention and Daily Active Users Explained.

Complete Data Studio guide and BigQuery tutorial for Firebase users, Machine Learning enthusiasts and Marketers. All you wanted to know. Data Studio template included. — Have you ever wondered how to reduce user churn and save money spent on user acquisition? This article is about how to count those users who stay in your App in order to understand what makes them stay. Who is this article for? Marketers who have been tasked to create custom user retention dashboards.

Firebase

11 min read

Published in Towards Data Science

·Just now

Create MySQL and Postgres instances using AWS Cloudformation

Infrastructure as Code for database practitioners. With these two simple Cloudformation templates, we will learn how to deploy Postgres and MySQL Aurora database instances with just one CLI command. I have optimised the template files, reducing the number of properties to a minimum so they are easier to comprehend. I know that Infrastructure as Code might…

Data Engineering

7 min read

Published in Towards Data Science

·Feb 28

Big Data File Formats, Explained

Parquet vs ORC vs AVRO vs JSON. Which one to choose and how to use them? — I’m a big fan of data warehouse (DWH) solutions with ELT-designed (Extract-Load-Transform) data pipelines. However, at some point, I faced the requirement to process raw event data in Cloud Storage and had to choose the file format for data files. This is a typical scenario when machine learning engineers are…

Big Data

9 min read

Published in Towards Data Science

·Feb 22

Test Data Pipelines the Fun and Easy Way

Beginner's guide: why unit and integration tests are so important for your data platform. This story is for those who would like to learn how to code and run tests, automate CI/CD checks and run them in any environment, including locally. Unit testing is an essential skill for machine learning engineers these days. …

Big Data

10 min read

Published in Towards Data Science

·Feb 20

Data Platform Architecture Types

How well does it answer your business needs? Dilemma of a choice. — It is easy to get lost with the abundance of data tools available in the market right now. The Internet is filled with opinionated stories (often speculative) about which data tools to use and how to make our Data Stack modern in this particular year. Which data tools are the…

Big Data

9 min read

Published in Towards AI

·Feb 7

Supercharge Your Data Engineering Skills with This Machine Learning Pipeline

Data modeling, Python, DAGs, Big Data file formats, costs… It covers everything — This is a real-life scenario when I was tasked to create a highly scalable machine learning pipeline with raw event data sent from the mobile application. The story offers a set of advanced techniques that might be useful for interview preparation. Learn how to work with raw data, transform it…

Big Data

12 min read

Published in Level Up Coding

·Feb 7

How to Deal with Wildcard Tables in BigQuery

A couple of tricks to speed up your data warehousing. Wildcard tables in BigQuery are considered somewhat legacy but are still in use by many services and third-party applications. I will use a small public dataset to show a few tricks. In BigQuery scripting, table names can’t be used as parameters, at least for now. It makes it extremely difficult to…

Data Warehouse

4 min read

BigData Engineer | Full stack dev | I write about ML/AI in Digital marketing. | linktr.ee/mshakhomirov | @MShakhomirov
