Hey there,

I’m Hari and I love playing with my favorite toy, Data.

I’m a Data Engineer with around four years of experience in data science, and I am absolutely fascinated by the potential benefit it offers to the world.



I've always viewed data as LEGO blocks. If I were to explain data science using LEGO as an example, I would say that data science is all about taking a pile of blocks (the data) and turning them into something useful (a finished LEGO structure).

Let me explain: in any data-related project, I would first need to collect a bunch of data (the LEGO blocks). Next, I would need to analyze the data to find patterns and relationships. Finally, I would use that information to build something useful (the LEGO structure).

Data science is all about turning data into something useful. Just like you can use LEGO blocks to build all sorts of different structures, you can use data to build all sorts of different things.


A bit more about myself: I’m someone who loves to take something and improve it, because, why not? It’s fun, and it’s something I try to apply to all aspects of my life. I’m a huge health nut who is constantly trying new diets and gym routines. Whenever I listen to a new podcast, I try to find one thing I can immediately apply to my life, and I usually notice that a single change carries over into other areas. I bring this same cross-learning philosophy to my professional life.

Skills

I love architecting and improving data systems. That is why I try to learn something new whenever I get the opportunity, be it the basics of a new programming language, a leadership seminar, or a new orchestration tool that hit the market. I believe that these little things add up and shape the way I think over time.

Here is an overview of my professional skills.

  • SQL
  • Azure cloud services
  • Python
  • Amazon Web Services
  • Scala
  • Google Cloud Platform
  • NoSQL (MongoDB, CouchDB)
  • Airflow, Kafka, Hive

My Personal Projects

Here are a few of my personal projects that I've been working on.

Azure - Twitter Sentiment Analysis with Stream Processing

- This is an end-to-end project where we pull live tweets from Twitter using Spark clusters on Azure Databricks and perform sentiment analysis on them with Azure Language Services.

- We then store the final data in an Azure SQL DB, with a Parquet backup in an ADLS Gen2 data lake; the pipeline is orchestrated with Azure Data Factory.

- Finally, we connect Power BI to the final Azure SQL DB. Note: Azure Synapse can be used instead.

- For this project I retrieved tweets about Marvel, since Ms. Marvel had just been released; the final Power BI report shows how people on Twitter feel about it.

- Click on the image to go to my GitHub
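The heavy lifting (Databricks, Language Services, Data Factory) runs in Azure, but the core aggregation step can be sketched locally. A minimal sketch, assuming tweets have already been labeled by Azure Language Services — the sample records below are made up for illustration:

```python
from collections import Counter

# Hypothetical batch of tweets already labeled by Azure Language Services
# (made-up sample data for illustration).
labeled_tweets = [
    {"text": "Loved the new episode!", "sentiment": "positive"},
    {"text": "Not my favourite so far", "sentiment": "negative"},
    {"text": "It was okay, I guess", "sentiment": "neutral"},
    {"text": "Absolutely amazing cast", "sentiment": "positive"},
]

def sentiment_breakdown(tweets):
    """Count tweets per sentiment label and return percentage shares."""
    counts = Counter(t["sentiment"] for t in tweets)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

print(sentiment_breakdown(labeled_tweets))
# {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```

This is the shape of the table Power BI ultimately visualizes: one row per sentiment label with its share of the stream.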

AWS - YouTube Analytics

- This is an end-to-end project where we perform basic transformation and processing of YouTube analytics data from the YouTube API.

- The data is stored in an initial landing bucket and is then transformed with the help of AWS Glue and AWS Lambda.

- Finally, a Glue job stores the data in an analytics database that we query with Amazon Athena.

- This database is connected to Tableau, where we explore the data and identify key metrics through visualizations.

- Click on the image to go to my GitHub
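The basic transformation step can be sketched as follows. This is a local sketch, not the actual Lambda/Glue code; the nested field names (`snippet`, `statistics`, `viewCount`, etc.) follow the shape of the YouTube Data API response, and the sample record is made up:

```python
import json

# Made-up response in the nested shape returned by the YouTube Data API.
raw = json.loads("""
{"items": [
  {"id": "abc123",
   "snippet": {"title": "Demo video", "categoryId": "10"},
   "statistics": {"viewCount": "1500", "likeCount": "120"}}
]}
""")

def flatten_items(payload):
    """Flatten nested video records into flat rows ready for an analytics table."""
    rows = []
    for item in payload.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append({
            "video_id": item.get("id"),
            "title": snippet.get("title"),
            "category_id": snippet.get("categoryId"),
            "views": int(stats.get("viewCount", 0)),   # API returns counts as strings
            "likes": int(stats.get("likeCount", 0)),
        })
    return rows

print(flatten_items(raw))
```

Flattening and type-casting up front is what makes the downstream Athena queries and Tableau joins straightforward.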

GCP - COVID Prediction

- A prediction model built on data from Toronto Public Health.

- We use GCP Dataproc to create a cluster where we load the data.

- The project aims to create a model that predicts the number of COVID patients who will be admitted to the ICU.

- The predictive model is developed with Apache Spark, using a random forest classifier.

- The final results are written to Google Cloud Storage and then loaded into a Tableau workbook for further exploration.
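The model itself trains on the Dataproc cluster, but the feature-assembly step ahead of it can be sketched locally. A minimal sketch, with the caveat that the field names (`age_group`, `hospitalized`, `icu_admitted`, etc.) are illustrative stand-ins, not the real Toronto Public Health schema:

```python
# Encode categorical case fields into numeric features: the step that would
# normally feed Spark's VectorAssembler before the random forest classifier.
AGE_GROUPS = {"19 and younger": 0, "20 to 29": 1, "30 to 39": 2,
              "40 to 49": 3, "50 to 59": 4, "60 and older": 5}

def to_feature_row(record):
    """Turn one raw case record into a (features, label) pair."""
    features = [
        AGE_GROUPS.get(record["age_group"], -1),   # -1 marks an unknown bucket
        1 if record["hospitalized"] else 0,
        1 if record["outbreak_associated"] else 0,
    ]
    label = 1 if record["icu_admitted"] else 0     # ICU admission is the target
    return features, label

sample = {"age_group": "60 and older", "hospitalized": True,
          "outbreak_associated": False, "icu_admitted": True}
print(to_feature_row(sample))  # ([5, 1, 0], 1)
```

In the actual pipeline this mapping runs as a Spark transformation across the whole dataset rather than one record at a time.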

AQI Dashboard - Tableau

- A dashboard built on AQI data for the USA, sourced from Kaggle.

- We use a Jupyter notebook to preprocess the data.

- The data is then imported to Tableau.

- The Tableau Dashboard allows the user to understand several metrics and their trends over the years.

- The dashboard also lets the user view each state's trends over the years, as well as the overall AQI performance of the USA.
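The notebook preprocessing behind the dashboard can be sketched as below, assuming pandas and an illustrative subset of the Kaggle columns (`state`, `year`, and `aqi` are stand-ins for the real column names):

```python
import pandas as pd

# Illustrative subset of the Kaggle AQI data.
df = pd.DataFrame({
    "state": ["California", "California", "Texas", "Texas"],
    "year":  [2019, 2020, 2019, 2020],
    "aqi":   [85, 78, 60, 65],
})

# Average AQI per state per year: the per-state trend behind the dashboard.
yearly = df.groupby(["state", "year"], as_index=False)["aqi"].mean()

# National average per year, for the USA-wide AQI view.
national = df.groupby("year", as_index=False)["aqi"].mean()

print(yearly)
print(national)
```

These two tidy tables are exactly the shape Tableau likes: one row per state-year for the state view, one row per year for the national view.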

Store sales - Tableau

- A dashboard built on sample store sales data, sourced from Kaggle.

- We use a Jupyter notebook to preprocess the data.

- The data is then imported to Tableau.

- The Tableau dashboard lets the user understand several sales metrics of the store over three months.

- The dashboard also lets the user view branch-wise and month-wise trends over the three months, such as customer rating, best-selling category, and average foot traffic throughout the day.
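The branch-by-month view can be sketched with a pandas pivot, again assuming illustrative column names (`branch`, `month`, `total`) rather than the real Kaggle schema:

```python
import pandas as pd

# Illustrative slice of the sample store sales data.
sales = pd.DataFrame({
    "branch": ["A", "A", "B", "B", "A", "B"],
    "month":  ["Jan", "Feb", "Jan", "Feb", "Mar", "Mar"],
    "total":  [120.0, 150.0, 90.0, 110.0, 130.0, 95.0],
})

# Branch x month sales matrix: the month-wise trend per branch.
pivot = sales.pivot_table(index="branch", columns="month",
                          values="total", aggfunc="sum")
print(pivot)
```

The same pivot pattern works for the other metrics (customer rating, foot traffic) by swapping the `values` column and aggregation function.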

Los Angeles Government Payroll - Python

- We use a Jupyter notebook to preprocess the data.

- Initial data cleansing is done with supplementary research about Los Angeles wage laws.

- The Jupyter notebook lets the user explore several gender- and race-based pay insights for Los Angeles over nine years.
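The core pay-gap calculation in the notebook can be sketched as follows, assuming pandas and hypothetical column names (`year`, `gender`, `total_pay` are illustrative, not the actual payroll schema):

```python
import pandas as pd

# Illustrative slice of the payroll data (column names are assumptions).
payroll = pd.DataFrame({
    "year":      [2013, 2013, 2013, 2014, 2014, 2014],
    "gender":    ["F", "M", "F", "F", "M", "M"],
    "total_pay": [62000, 70000, 58000, 64000, 72000, 69000],
})

# Median pay by gender per year: the core figure behind the pay-gap insight.
median_pay = payroll.groupby(["year", "gender"])["total_pay"].median().unstack()

# Gap expressed as a percentage of the male median for that year.
median_pay["gap_pct"] = 100 * (1 - median_pay["F"] / median_pay["M"])
print(median_pay.round(1))
```

Medians rather than means keep the figure robust to a handful of very highly paid outliers, which payroll data tends to have.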