
Gayaz Ahmed Shaik


Data Engineer/Analyst

[email protected] | +1 (508) 962-3242
Resume

Hello, I am

Gayaz Ahmed Shaik.

1 Excel Project

1 Power BI Project

1 Python Project

About Me

Data Engineer / Analyst with hands-on experience turning messy data into production ML models, stable ELT pipelines, and clear business insights. Skilled in Python, SQL, and PySpark; Snowflake and Databricks for scalable compute; AWS (S3, Lambda, SageMaker) for deployment; and Airflow + dbt for orchestration. Experienced in feature engineering, causal analysis, model tracking with MLflow/Optuna, drift monitoring (PSI), automated retraining, and data-quality gating. Proficient in Tableau and Power BI for storytelling and in using SQL for deep root-cause analysis. Completed a Master’s in Data Analytics, combining statistical rigor with engineering discipline to ship reliable, high-impact data products.
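The drift monitoring mentioned above refers to the Population Stability Index. As an illustration only (example buckets and thresholds are made up, not from any specific project), PSI compares a baseline distribution with a current one, bucket by bucket:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two bucketed distributions.

    `expected` and `actual` are lists of bucket proportions that each
    sum to 1. A common rule of thumb: PSI < 0.1 means little drift,
    0.1-0.25 moderate drift, > 0.25 significant drift.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against log(0) / division by zero
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time distribution
current  = [0.10, 0.20, 0.30, 0.40]   # serving-time distribution
print(round(psi(baseline, current), 4))  # ~0.2282: significant drift
```

A PSI above the chosen threshold would trigger the automated retraining described in the summary.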

Key Skills

Power BI

Tableau

SQL

Python

AWS

Fast Data Masker

MongoDB

Scala

Excel

Power Query

Azure DevOps

Tricentis Tosca

My Projects

Sales Pulse: Excel-Powered Customer & Market Insights

Domain/Function: Sales & Marketing Analytics

Enterprise-Wide Business Intelligence Dashboard (Business Insights 360)

Domain/Function: Business Intelligence & Analytics

AdventureWorks 360° Insights Dashboard (Tableau)

Domain/Function: Sales, Customer & Product Performance Insights

CardioRisk Analyzer – Heart-Disease Prediction & Clinical Insights (Python/ML)

Domain/Function: Heart-Disease Risk Prediction & Clinical Decision Support

My Experience

Data Engineer/Analyst – JPMorgan Chase | MA                                                                                        May 2024 – Present

       Developed and automated 10+ interactive dashboards using Microsoft Power BI and Tableau, increasing executive reporting efficiency by 40% and accelerating data-driven decision-making.

       Migrated 380 TB from a 250-node Hadoop cluster to an AWS S3 Lakehouse on Apache Iceberg; added Standard-IA/Glacier tiering and FinOps tags, cutting storage and compute costs 35% and shrinking daily batch windows from 8 hours to 90 minutes.

       Built a real-time ingestion pipeline with Kafka 3.6, Apache Flink 1.18, and Kinesis Firehose that streams 1.2 billion events a day (≈ 80K msg/s), reducing data latency from next-day to under 5 minutes and enabling intraday risk updates.

       Created data contracts in JSON-Schema and Protobuf and enforced row- and column-level security with AWS Lake Formation and Apache Ranger, achieving 100% compliance with SEC 17a-4, SOX, and GDPR across 12 business units.

       Added Monte Carlo Data Observability and OpenLineage monitoring to Airflow pipelines, cutting data-downtime incidents by 70% (52 → 16 per quarter) and reducing mean time to recover from 4 hours to 45 minutes.

       Deployed Snowflake Secure Data Sharing with Terraform and GitHub Actions, onboarding 320 analysts and eliminating more than 250 manual CSV requests each quarter.

       Tuned Spark 3.5 jobs on EMR (broadcast joins, AQE, Z-ORDER, Bloom filters, Iceberg MERGE) to cut portfolio risk run times 8× (95 s → 12 s) and lower cluster costs 28% with autoscaling and spot instances.
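The data-contract checks above can be sketched in plain Python. This is an illustrative stand-in only: the field names and contract are hypothetical, and a production pipeline would use the `jsonschema` library or generated Protobuf classes rather than hand-rolled type checks.

```python
# Hypothetical data contract for a trade event -- illustrative only;
# real pipelines would validate with `jsonschema` or Protobuf.
CONTRACT = {
    "trade_id": str,
    "notional": float,
    "currency": str,
}

def violations(record: dict, contract: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(record[field]).__name__}")
    return problems

good = {"trade_id": "T-1", "notional": 1e6, "currency": "USD"}
bad  = {"trade_id": "T-2", "notional": "1000000"}  # wrong type, no currency
print(violations(good, CONTRACT))  # []
print(violations(bad, CONTRACT))
```

Rejecting non-conforming records at the contract boundary is what lets downstream consumers rely on row- and column-level guarantees.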

 

Data Engineer – Accenture | Hyderabad, India                                                                               July 2021 – August 2023

       Built automated pipelines that moved CRM, ERP, and third-party API data into a SQL Server staging area, transformed the data in Azure Data Factory, and stored clean Parquet files in the Lakehouse, eliminating 90% of manual file drops and unlocking near-real-time analytics.

       Refactored T-SQL jobs that handle 50M rows per week by rewriting joins, adding window functions, and indexing key columns; cut dashboard load times from 3+ minutes to under 20 seconds.

       Implemented nightly PII masking for 10 production clones with CA Test Data Manager and Fast Data Masker, shortening data-provisioning SLAs from 48 hours to same day and accelerating regression testing.

       Embedded Great Expectations checks in every SQL script, raising first-pass QA approval to 99% and reducing post-release defects by 30%.

       Engineered Azure Databricks PySpark frameworks and reusable Python utilities that ingest 60M events/day, auto-handle schema drift, and materialize Delta Lake tables, cutting ML feature-set prep time 65% and adding Python/PySpark scalability to the team's stack.

       Launched a CI/CD pipeline in Azure DevOps that unit-tests SQL objects and dbt models before one-click promotion to QA, UAT, and Prod, decreasing deployment errors by 30%.

       Created Excel dashboards with Power Query and VBA automations that save 8 analyst hours each week and improve forecast accuracy by 12%.

       Collaborated with cross-functional stakeholders and mentored junior analysts to accelerate data pipeline adoption across Finance, Marketing, and DevOps teams.

       Partnered with product owners in bi-weekly agile ceremonies to prioritize backlog items, cutting rework cycles by 20%.
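The window-function rewrites above follow a common dedup pattern. Here is a self-contained sketch using SQLite (the table and data are hypothetical; the production jobs ran against SQL Server, but the ROW_NUMBER() idiom is the same in T-SQL): keep the latest revision per key instead of self-joining against MAX(updated_at).

```python
import sqlite3

# Hypothetical orders table with one revised row; illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INT, customer TEXT, updated_at TEXT);
    INSERT INTO orders VALUES
        (1, 'acme',   '2023-01-01'),
        (1, 'acme',   '2023-02-01'),
        (2, 'globex', '2023-01-15');
""")
# A window function replaces the classic self-join against MAX(updated_at):
# rank each order's revisions newest-first, then keep rank 1.
rows = con.execute("""
    SELECT order_id, customer, updated_at FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY order_id
                      ORDER BY updated_at DESC) AS rn
        FROM orders
    ) WHERE rn = 1
    ORDER BY order_id
""").fetchall()
print(rows)  # [(1, 'acme', '2023-02-01'), (2, 'globex', '2023-01-15')]
```

One pass over the table instead of a join also lets the optimizer use a single sort, which is where most of the speedup in such rewrites comes from.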

Associate Data Analyst – Accenture | Bangalore, India                                                                           July 2020 – June 2021

       Converted ambiguous business asks into ML/statistical problem specs; defined targets, leakage checks, and evaluation metrics (AUC, RMSE, silhouette) in versioned design docs.

       Built model‑ready feature tables with SQL CTE chains and window functions; cut prep time from ~3 hours to 25 minutes and eliminated duplicate transformations via shared Git repos.

       Embedded data‑quality gates (schema drift, null thresholds) using Great Expectations + pytest inside Airflow DAGs, blocking bad loads before training.

       Produced Tableau/Matplotlib packets—calibration plots, SHAP ranks, cohort funnels—accelerating stakeholder decisions on campaign tweaks from days to hours.
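The data-quality gates described above can be sketched in plain Python. This is a stand-in for the actual Great Expectations suites, with made-up column names and thresholds:

```python
def null_fraction(rows, column):
    """Fraction of rows where `column` is None or missing."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def gate(rows, max_null):
    """Raise before model training if any null threshold is exceeded.

    `max_null` maps column name -> maximum tolerated null fraction;
    the columns and limits here are hypothetical.
    """
    for column, threshold in max_null.items():
        frac = null_fraction(rows, column)
        if frac > threshold:
            raise ValueError(f"{column}: {frac:.1%} nulls exceeds {threshold:.1%}")
    return True

batch = [{"user_id": 1, "age": 34},
         {"user_id": 2, "age": None},   # 1 of 3 ages is null (~33%)
         {"user_id": 3, "age": 29}]
print(gate(batch, {"user_id": 0.0, "age": 0.5}))  # True: 33% < 50%
```

Raising inside the Airflow task fails the DAG run, which is what blocks a bad load before it ever reaches training.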


Awards & Certificates

Google Professional Data Analytics

AWS Certified Data Engineer - Associate

PL-300 - Microsoft Power BI Data Analyst

Tableau Desktop Specialist

Let's Connect

Feel free to get in touch with me. I am always open to discussing new projects, creative ideas, or opportunities to be part of your vision.