Maya Patel
Data Engineer
Austin, TX • dataeng@gmail.com • +1 5555-7777
Profile Summary
- Data Engineer with 6 years of experience designing and operating high-throughput data platforms across SaaS analytics, fintech transaction systems, and product-event pipelines, specializing in dimensional modeling, low-latency streaming, and end-to-end data quality.
- Solid technical background across batch processing (Spark, dbt), streaming (Kafka, Flink), warehouses (Snowflake, BigQuery), orchestration (Airflow, Dagster), and cloud ecosystems (AWS, GCP) with strong scripting fundamentals in Python and SQL.
- Deep expertise in dimensional modeling, lakehouse architecture, data contracts, and CDC ingestion patterns, leveraging methodologies such as Kimball star schemas and medallion architecture to drive trustworthy, queryable, and auditable datasets.
- Engaged collaborator working cross-functionally with Analytics, ML, and Product teams in Agile environments, contributing to data-modeling reviews, SLA negotiations, and stakeholder workshops with a pragmatic, outcome-oriented mindset.
- Emerging leader who champions technical excellence and fosters a culture of data quality and cost discipline through code reviews and runbooks, leading data-platform working groups and authoring widely adopted data-contract templates.
Technical Skills
- Languages & Scripting: Python, SQL, Bash, Scala (basic)
- Warehouses & Lakehouses: Snowflake, BigQuery, Redshift, Databricks
- Processing & Transformation: Spark, dbt, Flink, Kafka, Kinesis
- Orchestration: Airflow, Dagster, Prefect, dbt Cloud
- Storage Formats & Lakes: S3, GCS, Iceberg, Delta Lake, Parquet, Avro
- Data Quality & Lineage: dbt tests, Great Expectations, Soda, OpenLineage
- Cloud Platforms: AWS (S3, Glue, EMR, Lambda, IAM), GCP (BigQuery, Dataflow, Pub/Sub)
- DevOps & Tooling: Terraform, GitHub Actions, Docker, Kubernetes, Datadog, Git
Education
Work Experience
- Owned the analytics data platform supporting hundreds of internal dashboards and machine-learning training pipelines, leading end-to-end design and operation across pipeline reliability, data modeling, and cost efficiency within a modern cloud-native data stack.
- Built and maintained a fleet of 120+ ELT pipelines using dbt and Airflow, moving transactional and event data from Kafka, Postgres, and internal APIs into Snowflake, with parameterized DAG templates that cut new-pipeline onboarding time from 3 days to 4 hours.
- Designed a star-schema data warehouse in Snowflake using dbt with SCD Type 2 dimensions, accumulating-snapshot fact tables, and contract-tested staging layers, enabling self-serve analytics for 200+ internal users while keeping query latency under 3 seconds on critical dashboards.
- Optimized Snowflake storage and compute costs through clustering keys, micro-partition pruning, and warehouse auto-suspend policies, reducing monthly compute spend by 38% while improving p95 query latency by 44% across reporting workloads.
- Migrated 80+ scheduled jobs from cron + Lambda to Airflow using the TaskFlow API, custom sensors for data-availability checks and SLA-miss alerting, and DAG-level retry policies, eliminating silent failures and improving on-time delivery rate from 82% to 99.4%.
- Stood up a real-time event ingestion pipeline using Kafka, Flink, and Iceberg with exactly-once semantics, watermark-based windowing, and stateful aggregations, delivering metrics fresh within 60 seconds to product teams across 12+ event topics.
- Implemented data quality at every layer using dbt tests, Great Expectations suites, and Soda anomaly detection on 180+ critical tables, raising test coverage from 22% to 91% and catching 15-20 data-quality regressions per quarter before they reached production dashboards.
- Ingested data from 60+ source systems including APIs, OLTP databases, S3 dumps, and SaaS connectors using Fivetran, custom Python connectors, and CDC via Debezium, unifying transaction, customer, and event data into a single normalized warehouse layer with 99.7% freshness SLA.
- Managed Postgres, Redshift, and S3-based data lake tiers across development, staging, and production environments, implementing partitioning, vacuum scheduling, and storage tier transitions that reduced average query cost per TB by 42%.
- Provisioned AWS data infrastructure using Terraform modules, set up CI/CD for dbt and Airflow code via GitHub Actions, and containerized Python ETL jobs with Docker, cutting infrastructure provisioning time from days to under 30 minutes and reducing deployment failures by 55%.
- Implemented column-level access controls, PII tagging, data lineage via OpenLineage, GDPR-compliant deletion workflows, and pipeline SLA dashboards using Datadog, surfacing 99.5% uptime against published SLAs across critical financial reporting datasets.