The second round of the screen happens in this section, the last gate before an interview is on
the table. Recruiters slow down and read closely here, and even so, your current role still
drives roughly 95% of the outcome.
That makes sense: nothing shows what you can run in production today better than the role you
hold right now. To earn a "yes", the section must cover every part of the
Data Engineer role profile, with one bullet per area listed under Domain Expertise. Each
bullet also has to describe something you genuinely owned in production, not a ticket that
crossed your queue.
1. Pipeline Development & ETL/ELT
This is the daily reality of the role, and the first box a recruiter ticks. Spell out the pipeline
you delivered, the volume it carries every day, and what changed downstream because of it. Name
the pipeline and its consumer, not "wrote ETL".
Techniques: Batch ETL / ELT, Incremental loads, Idempotent design, Schema evolution
Tools: Spark, dbt, pandas, Apache Beam
Metrics: Pipelines in production, Volume per day, Runtime cut
2. Data Modeling & Warehousing
Where raw inputs become tables an analyst can actually trust. Lay out the model you built, the
warehouse it sits inside, and the query patterns it serves. A well-shaped model that analysts pull
from without asking questions reads as senior; "designed schemas" on its own does not.
Techniques: Dimensional / Kimball, Star & snowflake schema, SCDs, Iceberg / Delta tables
Tools: Snowflake, BigQuery, Databricks, Redshift
Metrics: Tables modeled, Query latency cut, Storage saved
3. Orchestration & Scheduling
The control plane that keeps the platform alive. Show the DAGs you built, the retry and backfill
logic you wired in, and the on-call burden it took off the team. Name the orchestrator and the
cadence you held, not "managed Airflow".
Techniques: DAG design, Retries & backfills, Sensors & triggers, Cross-DAG dependencies
Tools: Airflow, Dagster, Prefect
Metrics: DAGs in production, Failure rate down, Backfill time cut
4. Streaming & Real-Time Data
The other half of the data world, and a signal recruiters look for more often in 2026. Show the
stream you set up, the lag you held it under, and the downstream system it fed. Name the topic and
the consumer, not "used Kafka".
Techniques: CDC, Event ingestion, Windowed aggregations, Exactly-once semantics
Tools: Kafka, Kinesis, Flink, Spark Streaming, Debezium
Metrics: Events per second, End-to-end lag, Throughput scaled
5. Performance & Cost Optimization
Slow or expensive pipelines land back on your team fast. Show the bottleneck you tracked down, the
partition or cluster change you made, and what the bill or runtime looked like afterward. Numbers do
the heavy lifting here: query latency, cost, throughput.
Techniques: Partition & cluster tuning, Cost attribution, Query plan analysis, Caching & materialization
Tools: Spark UI, Snowflake Query Profile, dbt threads
Metrics: Cost cut ($), Runtime cut (h), $/TB processed
6. Data Quality & Observability
The reason data engineering exists as a discipline: data nobody can trust is worse than no data at
all. Show the test suite you put in place, the upstream issue you caught early, and the alert that
stopped a bad ingest. Name the rule and the catch, not "set up monitoring".
Techniques: Schema tests, Row-count assertions, Freshness SLAs, Anomaly detection
Tools: Great Expectations, dbt tests, Monte Carlo, Soda
Metrics: SLA hits %, Incidents caught upstream, MTTR
7. Cloud Infrastructure & DevOps
The piece that separates a hobby project from a platform. Show the cloud you ran on, the IaC you
wrote, and the CI/CD that ships pipeline code without a manual step. Name the platform and the
deploy story, not "used AWS".
Techniques: Infrastructure as code, CI/CD for data, Containerization, Secrets management
Tools: Terraform, AWS, GCP, Azure, Docker, GitHub Actions
Metrics: Deploy frequency, Environments managed, Manual steps removed
8. Cross-Functional Collaboration
Data engineers ship nothing on their own. Describe how you worked with Analytics, Data Science, and
Platform on schema contracts, on incident response, and on roadmap calls. Call out the cross-team
work itself, and what it enabled downstream.
Techniques: Schema contracts, Incident response, Stakeholder reviews, Roadmap planning
Tools: Jira, Linear, Slack, Notion docs
Metrics: Teams served, Contracts signed, On-call load shifted