Introduction

A few days ago I wrote the first part of this blog, where I talked about how we can design serverless data ingestion from API data as either a streaming or batch pipeline. In this blog, we will see the code and configuration for the entire pipeline.

Let's get started.

Design

In the previous blog, we discussed pattern 1, where we ingest data from an API and insert it into BigQuery in real time. We will use this pattern to walk through the code and configuration.
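As a quick refresher on the pattern, here is a minimal Python sketch of the API-to-BigQuery flow. The endpoint, table ID, and field names are all illustrative, not the actual pipeline code; the transform step is purely local, while the insert relies on the `google-cloud-bigquery` client and real credentials.

```python
# Minimal sketch of pattern 1: pull records from an API and stream
# them into BigQuery. Endpoint, table, and field names are made up.
import json
import urllib.request


def transform(articles):
    """Map raw API records to flat BigQuery rows (purely local step)."""
    return [
        {"title": a.get("title"), "source": a.get("source", {}).get("name")}
        for a in articles
    ]


def ingest(api_url, table_id):
    """Fetch from the API and stream rows into a BigQuery table."""
    with urllib.request.urlopen(api_url) as resp:
        payload = json.load(resp)
    rows = transform(payload.get("articles", []))

    # Streaming insert; requires the google-cloud-bigquery package
    # and application credentials, so it won't run locally as-is.
    from google.cloud import bigquery
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(errors)
```

The real pipeline wires these same two steps together with Cloud Workflows instead of a hand-rolled script, as shown in the configuration below.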

Cloud Workflow Yaml Configuration

  • Cloud Workflows is a serverless offering from GCP that allows us to design workflows that can execute…
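The teaser above is cut off, but as a hedged illustration of what such a configuration looks like (URL, project, dataset, and table names here are made up, not the article's actual config), a minimal Cloud Workflows definition for this pattern could be:

```yaml
# Illustrative Cloud Workflows sketch: call an API, then insert the
# response into BigQuery via the BigQuery connector.
main:
  steps:
    - fetch:
        call: http.get
        args:
          url: https://example.com/api/headlines
        result: api_response
    - insert:
        call: googleapis.bigquery.v2.tabledata.insertAll
        args:
          projectId: my-project
          datasetId: news
          tableId: headlines
          body:
            rows:
              - json: ${api_response.body}
        result: insert_result
```

Each step's `result` is available to later steps, which is how the API response flows into the BigQuery insert without any server to manage.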

JavaScript UDFs are cool, and using them with NPM libraries is a whole new world to explore!

Background

One of the main reasons to build an ETL pipeline was to transform data before loading it into the data warehouse. We did that only because data warehouses were not capable of handling these transformations, for reasons such as performance and flexibility.

In the era of modern data warehouses like Google BigQuery or Snowflake, things have changed. These warehouses can process terabytes and even petabytes of data within seconds or minutes. Given this much improvement, performing data transformation within the data warehouse now makes more sense, and common transformation logic can be packaged as UDFs.
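To make the idea concrete, here is an illustrative BigQuery statement that defines a JavaScript UDF for a common transformation. The function, project, dataset, and column names are placeholders, not the article's actual code; it is shown as a Python string the way you would submit it through the BigQuery client.

```python
# Illustrative BigQuery SQL: a temporary JavaScript UDF that holds
# common transformation logic. All names below are made up.
UDF_SQL = """
CREATE TEMP FUNCTION normalize_title(title STRING)
RETURNS STRING
LANGUAGE js AS '''
  return title === null ? null : title.trim().toLowerCase();
''';

SELECT normalize_title(title) AS title
FROM `my_project.my_dataset.headlines`;
"""

# With the google-cloud-bigquery package this would run as:
#   from google.cloud import bigquery
#   bigquery.Client().query(UDF_SQL).result()
```

Because the transformation now lives next to the data, the same logic can be reused by every query instead of being duplicated in an external ETL job.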


Restrict SQL query results by defining a row-level policy & return subsets of data 🚀

Introduction

BigQuery released the row-level security feature to provide granular access controls. In this blog, we will use this feature to protect our query results based on certain conditions. Let's get started!

BigQuery Data Source

Our BigQuery dataset consists of top-headlines information that I collected from the News API.
We will use BigQuery's new row-level security feature, which allows analysts to query data based on their assigned news source. For example, analyst A will only get news from “CNN”, while analyst B will only get news from “CNBC”.


BigQuery ML is democratizing Machine Learning for Data analysts 🚀

Background

Google BigQuery supports running ML models using SQL queries, which bridges the gap between data analysts and data scientists. As a data analyst, you don't have to learn Python, R, or yet another popular ML framework or library.

A basic understanding of the ML discipline is enough, and with the help of SQL, data analysts can enter the complex-looking, fancy world of Machine Learning.

BigQuery ML supports various types of ML models, such as:

  • Linear Regression
  • Binary Logistic Regression
  • Multiclass Logistic Regression
  • K-means Clustering, and many more.

In this blog, we will build a binary classification model using…
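For a sense of how little SQL this takes, here is an illustrative `CREATE MODEL` statement for a binary classifier. The project, dataset, table, and label column are placeholders, not the dataset the article actually uses; `logistic_reg` is BigQuery ML's model type for binary logistic regression.

```python
# Illustrative BigQuery ML statement: train a binary classifier
# directly in SQL. All names below are made up.
MODEL_SQL = """
CREATE OR REPLACE MODEL `my_project.my_dataset.clicks_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['clicked']
) AS
SELECT *
FROM `my_project.my_dataset.training_data`;
"""

# After training, predictions are just another query:
#   SELECT * FROM ML.PREDICT(MODEL `my_project.my_dataset.clicks_model`,
#                            (SELECT * FROM new_data));
```

That one statement handles training, evaluation splits, and model storage, which is exactly the gap-bridging for analysts described above.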


Ingesting API Data in Google BigQuery the Serverless way!

API To Google BigQuery

In the era of cloud computing, Serverless has become a buzzword that we keep hearing about, and eventually we get convinced that serverless is the way to go for companies of all sizes because of its various advantages. The basic advantages of the serverless approach are:

  • No Server Management
  • Scalability
  • Pay as you go

In this article, we will also explore how we can use the serverless approach to build our data ingestion pipeline in Google Cloud.

Serverless Offerings In GCP

GCP offers plenty of serverless services in various areas, as mentioned below.

  • Computing: Cloud Run, Cloud Functions, App Engine
  • Data warehouse: Google…


“Building a streaming pipeline on GCP is very simple with Google’s technology”

Table Of Contents

  • Step 1: Building Client
  • Step 2: Data Ingestion
  • Step 3: Data Processing
  • Step 4: Data Sink

Here is the video if you prefer one; otherwise, continue with the blog.

Step 1: Building Client

  • A client for Pub/Sub can be any data producer, like mobile app events, user behavior touchpoints, database changes (change data capture), sensor data, etc.
  • Cloud Pub/Sub provides many client libraries, such as Java, C#, C++, Go, Python, Ruby, and PHP, as well as a REST API.
  • One can easily integrate the client code into a data producer to publish data into Cloud Pub/Sub.

Here is an example of the Java client code:

  • This…


Text Block, Sealed Classes, ZGC & Shenandoah Garbage Collector, and many more!

Table of Contents

  • New Features
  • Preview Features
  • JVM Improvements
  • Deprecations and Removals

Following its six-month release cycle, Java 15 becomes the second release of 2020. An LTS version is released every three years; the next LTS will be Java 17 in 2021. As you can see, the release was consistent with the planned schedule.

New Features

1: Text Blocks

  • A text block is a multi-line string literal that avoids the need for most escape sequences, automatically formats the string in a predictable way, and gives the developer control over the format when…


I passed the Google Cloud Professional Data Engineer exam on my first attempt on 27th September. Let me share my preparation with you all for your reference.

Please subscribe to my YouTube channel for tech-related videos.

Table of Contents

  • My Previous Experience with GCP
  • Current work with GCP
  • My Preparation For Exam
  • Practice Test
  • One important Tip

If you prefer video, I have made one for this blog:

My Previous Experience with GCP

I started my GCP journey in 2018 when I was working on Kafka and Kubernetes. I was so happy to see my first distributed software components installed and communicating with each other. …


If you have petabyte-scale data and want millisecond read/write performance, then Bigtable is the option on GCP. In this blog, we will explore Bigtable in a nutshell.


Table of Contents

  • Introduction
  • Advantages over open source HBase
  • Storage Model
  • Architecture
  • Schema Design
  • How to Connect
  • Use Cases
  • When not to use Bigtable

Let's get started.

Introduction

  • NoSQL database (hence different from Cloud SQL, Cloud Spanner, and BigQuery).
  • Used for OLAP applications (like BigQuery), which makes it different from OLTP databases (Cloud SQL and Cloud Spanner).
  • For very large & sparse datasets.
  • Very…

Suraj Mishra

Backend Engineer by profession. Google Cloud Certified Professional Data Engineer. I share my tech experience on YouTube and Medium.
