
💡Mike Shakhomirov

2.4K Followers


Published in Towards Data Science · Pinned

Advanced SQL techniques for beginners

On a scale from 1 to 10, how good are your data warehousing skills? Want to go above 7/10? This article is for you. Want to get ready for a data analyst job interview ASAP? This blog post explains some intricate data warehouse SQL techniques in detail. …

SQL

7 min read



Published in Towards Data Science · Pinned

Unit Tests for SQL Scripts with Dependencies in Dataform

And data warehouse Gitflow pipelines to run them automatically. Do you unit test your data warehouse scripts? I am going to talk about unit tests for complex SQL queries that might consist of multiple operations (actions). I tried this before using BigQuery scripting (see "SQL Unit Testing in BigQuery? Here is a tutorial.", a complete guide to scripting and UDF testing on towardsdatascience.com). However, here is another (and probably better) way to do it.

SQL

6 min read



Published in Towards Data Science · Pinned

Training your ML model using Google AI Platform and Custom Environment containers

Complete guide using TensorFlow, Airflow scheduler and Docker. Google AI Platform allows advanced model training using various environments, so it is really easy to train your model with just one command, like so:

    gcloud ai-platform jobs submit training ${JOB_NAME} \
        --region $REGION \
        --scale-tier=CUSTOM \
        --job-dir ${BUCKET}/jobs/${JOB_NAME} \
        --module-name trainer.task \
        --package-path trainer \
        --config trainer/config/config_train.json …

AI

7 min read



Published in Towards Data Science · Pinned

Retention and Daily Active Users Explained.

Complete Data Studio guide and BigQuery tutorial for Firebase users, machine learning enthusiasts and marketers. Everything you wanted to know. Data Studio template included. Have you ever wondered how to reduce user churn and save money spent on user acquisition? This article is about how to count the users who stay in your app in order to understand what makes them stay. Who is this article for? Marketers who have been tasked with creating custom user retention dashboards.

Firebase

11 min read



Published in Towards Data Science · Apr 14

Continuous Integration and Deployment for Data Platforms

CI/CD for data engineers and MLOps. What is a data environment? Data engineers split infrastructure resources into live and staging to create isolated areas (environments) where they can test ETL services and data pipelines before promoting them to production. A data environment refers to a set of applications and related physical infrastructure resources that enable data storage…

Big Data

9 min read



Published in Geek Culture · Apr 13

Mastering AWS CLI

Magic Tricks to Work with Logging, AWS Lambda and Much More. In this story, I will write about a few AWS CLI (Command Line Interface) tricks that saved me a lot of time and made my work more efficient. Anything we can do on the AWS website is also possible via the command line. It helps with documentation while writing `readme`…

AWS

8 min read



Published in Towards Data Science · Apr 10

Introduction to Apache Iceberg Tables

A Few Compelling Reasons to Choose Apache Iceberg for Data Lakes. Apache Iceberg: what is it? Is it a new data lake file format? A table format? Why is it so good? Time travel for data lakes? I will try to answer all these questions in this story. Transactionally consistent data lake tables with point-in-time snapshot isolation is…

Data Engineering

8 min read



Published in Towards Data Science · Apr 3

Data Pipeline Orchestration

Data pipeline management done right simplifies deployment and increases the availability and accessibility of data for analytics. What is data orchestration? DataOps teams use data pipeline orchestration to centralize the administration and oversight of end-to-end data pipelines. The process of automating data pipelines is known as data orchestration. It is important to manage data pipelines correctly, as this affects almost everything, i.e. …

Data Engineering

7 min read



Published in Towards Data Science · Mar 20

Create MySQL and Postgres instances using AWS Cloudformation

Infrastructure as Code for database practitioners. With these two simple CloudFormation templates, we will learn how to deploy Postgres and MySQL Aurora database instances with just one CLI command. I have optimised the template files, reducing the number of properties to a minimum so they are easier to comprehend. I know that Infrastructure as Code might…

Data Engineering

7 min read



Published in Towards Data Science · Feb 28

Big Data File Formats, Explained

Parquet vs ORC vs Avro vs JSON: which one to choose and how to use them? I’m a big fan of data warehouse (DWH) solutions with ELT-designed (Extract-Load-Transform) data pipelines. However, at some point, I faced the requirement to process raw event data in Cloud Storage and had to choose a file format for the data files. This is a typical scenario when machine learning engineers are…

Big Data

9 min read



Big Data Engineer | Full-stack dev | I write about ML/AI in digital marketing | linktr.ee/mshakhomirov | @MShakhomirov

