
Gayaz Ahmed Shaik


Data Engineer/Analyst

[email protected] | +1 (508) 962-3242
Resume

Hello, I am

Gayaz Ahmed Shaik.

1 Excel Project

1 Power BI Project

1 Python Project

About Me

Data Engineer / Analyst with hands-on experience turning messy data into production ML models, stable ELT pipelines, and clear business insights. Skilled in Python, SQL, and PySpark; Snowflake and Databricks for scalable compute; AWS (S3, Lambda, SageMaker) for deployment; and Airflow + dbt for orchestration. Experienced in feature engineering, causal analysis, model tracking with MLflow/Optuna, drift monitoring (PSI), automated retraining, and data-quality gating. Proficient in Tableau and Power BI for storytelling and in using SQL for deep root-cause analysis. Completed a Master’s in Data Analytics, combining statistical rigor with engineering discipline to ship reliable, high-impact data products.
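The drift monitoring mentioned above refers to the Population Stability Index. As an illustration only (example buckets and thresholds are made up, not from any specific project), PSI compares a baseline distribution with a current one, bucket by bucket:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two bucketed distributions.

    `expected` and `actual` are lists of bucket proportions that each
    sum to 1. A common rule of thumb: PSI < 0.1 means little drift,
    0.1-0.25 moderate drift, > 0.25 significant drift.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against log(0) / division by zero
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time distribution
current  = [0.10, 0.20, 0.30, 0.40]   # serving-time distribution
print(round(psi(baseline, current), 4))  # ~0.2282: significant drift
```

A PSI above the chosen threshold would trigger the automated retraining described in the summary.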

Key Skills

Power BI

Tableau

SQL

Python

AWS

Fast Data Masker

MongoDB

Scala

Excel

Power Query

Azure DevOps

Tricentis Tosca

My Projects

Sales Pulse: Excel-Powered Customer & Market Insights

Domain/Function: Sales & Marketing Analytics

Enterprise-Wide Business Intelligence Dashboard (Business Insights 360)

Domain/Function: Business Intelligence & Analytics

AdventureWorks 360° Insights Dashboard (Tableau)

Domain/Function: Sales, Customer & Product Performance Insights

CardioRisk Analyzer – Heart-Disease Prediction & Clinical Insights (Python/ML)

Domain/Function: Heart-Disease Risk Prediction & Clinical Decision Support

My Experience

Data Engineer/Analyst – JPMorgan Chase | MA                                                                                        May 2024 – Present

       Developed and automated 10+ interactive dashboards using Microsoft Power BI and Tableau, increasing executive reporting efficiency by 40% and accelerating data-driven decision-making.

       Migrated 380 TB from a 250-node Hadoop cluster to an AWS S3 Lakehouse on Apache Iceberg; added Standard-IA/Glacier tiering and FinOps tags, cutting storage and compute costs 35% and shrinking daily batch windows from 8 hours to 90 minutes.

       Built a real-time ingestion pipeline with Kafka 3.6, Apache Flink 1.18, and Kinesis Firehose that streams 1.2 billion events a day (≈ 80K msg/s), reducing data latency from next-day to under 5 minutes and enabling intraday risk updates.

       Created data contracts in JSON-Schema and Protobuf and enforced row- and column-level security with AWS Lake Formation and Apache Ranger, achieving 100% compliance with SEC 17a-4, SOX, and GDPR across 12 business units.

       Added Monte Carlo Data Observability and OpenLineage monitoring to Airflow pipelines, cutting data-downtime incidents by 70% (52 → 16 per quarter) and reducing mean time to recover from 4 hours to 45 minutes.

       Deployed Snowflake Secure Data Sharing with Terraform and GitHub Actions, onboarding 320 analysts and eliminating more than 250 manual CSV requests each quarter.

       Tuned Spark 3.5 jobs on EMR (broadcast joins, AQE, Z-ORDER, Bloom filters, Iceberg MERGE) to cut portfolio risk run times 8× (95 s → 12 s) and lower cluster costs 28% with autoscaling and spot instances.
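The data-contract checks above can be sketched in plain Python. This is an illustrative stand-in only: the field names and contract are hypothetical, and a production pipeline would use the `jsonschema` library or generated Protobuf classes rather than hand-rolled type checks.

```python
# Hypothetical data contract for a trade event -- illustrative only;
# real pipelines would validate with `jsonschema` or Protobuf.
CONTRACT = {
    "trade_id": str,
    "notional": float,
    "currency": str,
}

def violations(record: dict, contract: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(record[field]).__name__}")
    return problems

good = {"trade_id": "T-1", "notional": 1e6, "currency": "USD"}
bad  = {"trade_id": "T-2", "notional": "1000000"}  # wrong type, no currency
print(violations(good, CONTRACT))  # []
print(violations(bad, CONTRACT))
```

Rejecting non-conforming records at the contract boundary is what lets downstream consumers rely on row- and column-level guarantees.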

 

Data Engineer – Accenture | Hyderabad, India                                                                               July 2021 – August 2023

       Built automated pipelines that moved CRM, ERP, and third-party API data into a SQL Server staging area, transformed the data in Azure Data Factory, and stored clean Parquet files in the Lakehouse, eliminating 90% of manual file drops and unlocking near-real-time analytics.

       Refactored T-SQL jobs that handle 50M rows per week by rewriting joins, adding window functions, and indexing key columns; cut dashboard load times from 3+ minutes to under 20 seconds.

       Implemented nightly PII masking for 10 production clones with CA Test Data Manager and Fast Data Masker, shortening data-provisioning SLAs from 48 hours to same day and accelerating regression testing.

       Embedded Great Expectations checks in every SQL script, raising first-pass QA approval to 99% and reducing post-release defects by 30%.

       Engineered Azure Databricks PySpark frameworks and reusable Python utilities that ingest 60M events/day, auto-handle schema drift, and materialize Delta Lake tables, cutting ML feature-set prep time 65% and adding Python/PySpark scalability to the team's stack.

       Launched a CI/CD pipeline in Azure DevOps that unit-tests SQL objects and dbt models before one-click promotion to QA, UAT, and Prod, decreasing deployment errors by 30%.

       Created Excel dashboards with Power Query and VBA automations that save 8 analyst hours each week and improve forecast accuracy by 12%.

       Collaborated with cross-functional stakeholders and mentored junior analysts to accelerate data pipeline adoption across Finance, Marketing, and DevOps teams.

       Partnered with product owners in bi-weekly agile ceremonies to prioritize backlog items, cutting rework cycles by 20%.
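The window-function rewrites above follow a common dedup pattern. Here is a self-contained sketch using SQLite (the table and data are hypothetical; the production jobs ran against SQL Server, but the ROW_NUMBER() idiom is the same in T-SQL): keep the latest revision per key instead of self-joining against MAX(updated_at).

```python
import sqlite3

# Hypothetical orders table with one revised row; illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INT, customer TEXT, updated_at TEXT);
    INSERT INTO orders VALUES
        (1, 'acme',   '2023-01-01'),
        (1, 'acme',   '2023-02-01'),
        (2, 'globex', '2023-01-15');
""")
# A window function replaces the classic self-join against MAX(updated_at):
# rank each order's revisions newest-first, then keep rank 1.
rows = con.execute("""
    SELECT order_id, customer, updated_at FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY order_id
                      ORDER BY updated_at DESC) AS rn
        FROM orders
    ) WHERE rn = 1
    ORDER BY order_id
""").fetchall()
print(rows)  # [(1, 'acme', '2023-02-01'), (2, 'globex', '2023-01-15')]
```

One pass over the table instead of a join also lets the optimizer use a single sort, which is where most of the speedup in such rewrites comes from.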

Associate Data Analyst – Accenture | Bangalore, India                                                                           July 2020 – June 2021

       Converted ambiguous business asks into ML/statistical problem specs; defined targets, leakage checks, and evaluation metrics (AUC, RMSE, silhouette) in versioned design docs.

       Built model‑ready feature tables with SQL CTE chains and window functions; cut prep time from ~3 hours to 25 minutes and eliminated duplicate transformations via shared Git repos.

       Embedded data‑quality gates (schema drift, null thresholds) using Great Expectations + pytest inside Airflow DAGs, blocking bad loads before training.

       Produced Tableau/Matplotlib packets—calibration plots, SHAP ranks, cohort funnels—accelerating stakeholder decisions on campaign tweaks from days to hours.
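The data-quality gates described above can be sketched in plain Python. This is a stand-in for the actual Great Expectations suites, with made-up column names and thresholds:

```python
def null_fraction(rows, column):
    """Fraction of rows where `column` is None or missing."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def gate(rows, max_null):
    """Raise before model training if any null threshold is exceeded.

    `max_null` maps column name -> maximum tolerated null fraction;
    the columns and limits here are hypothetical.
    """
    for column, threshold in max_null.items():
        frac = null_fraction(rows, column)
        if frac > threshold:
            raise ValueError(f"{column}: {frac:.1%} nulls exceeds {threshold:.1%}")
    return True

batch = [{"user_id": 1, "age": 34},
         {"user_id": 2, "age": None},   # 1 of 3 ages is null (~33%)
         {"user_id": 3, "age": 29}]
print(gate(batch, {"user_id": 0.0, "age": 0.5}))  # True: 33% < 50%
```

Raising inside the Airflow task fails the DAG run, which is what blocks a bad load before it ever reaches training.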


Awards & Certificates

Google Professional Data Analytics

AWS Certified Data Engineer - Associate

PL-300 - Microsoft Power BI Data Analyst

Tableau Desktop Specialist

Let's Connect

Feel free to get in touch with me. I am always open to discussing new projects, creative ideas, or opportunities to be part of your vision.