The second round of the screen plays out in this section, the last gate before any interview gets scheduled. A recruiter genuinely slows down here, and even so, your current role drives around 95% of the result.
That tracks: nothing demonstrates what you can ship in production today better than the seat you hold this quarter. To earn a "yes", the section has to cover every entry on the AI Engineer role profile, one bullet per area listed under Domain Expertise. And every bullet has to come from work you genuinely owned in production, never a Jira card that drifted past your queue.
1
LLM Application Development
The headline work of the role, and the first checkbox a recruiter ticks. Spell out the LLM
feature you shipped, the use case behind it, plus the product surface it lives on. Name the
feature and what it unlocked for users, not "built with LLMs".
Techniques
Chat & copilot UX
Structured outputs
Streaming responses
Tool / function calling
Tools
OpenAI, Anthropic, Bedrock
LangChain, LlamaIndex
FastAPI, Next.js
Metrics
Features in production
Active users on the feature
Adoption / retention lift
2
RAG & Retrieval Systems
Where LLMs meet your actual data. Lay out the retrieval pipeline you built, the corpus it
covers, plus the answer quality it lifted versus the baseline. A retrieval system that
grounded answers and cut hallucinations reads as senior; "wired up RAG" on its own
does not.
Techniques
Chunking & embedding
Hybrid & semantic search
Reranking
Context window management
Tools
Pinecone, pgvector, Weaviate
text-embedding-3, Cohere Rerank
LlamaIndex, Haystack
Metrics
Retrieval recall / nDCG
Answer groundedness
Hallucination rate cut
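If you quote retrieval recall or nDCG, be ready to say how you computed it. The sketch below is one common way to do it, assuming each query has a list of retrieved document ids (unique, in rank order) and a hand-labelled set of relevant ids. The function names and the sample ids are illustrative and do not come from any particular library.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc ids that show up in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevance, k):
    """nDCG@k with graded relevance.

    `relevance` maps doc id -> gain; ids missing from the map count as 0.
    Assumes `retrieved` has no duplicate ids.
    """
    dcg = sum(
        relevance.get(doc, 0) / math.log2(rank + 2)
        for rank, doc in enumerate(retrieved[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical query: two relevant docs, only one retrieved, at rank 2.
retrieved = ["d3", "d1", "d7"]
print(recall_at_k(retrieved, {"d1", "d2"}, k=3))
print(ndcg_at_k(retrieved, {"d1": 1, "d2": 1}, k=3))
```

Averaging these per-query scores before and after your change gives you the "versus the baseline" number the bullet needs.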
3
Agents & Tool Use
Where the LLM stops answering and starts doing. Describe the agent you built, the tools it
calls, plus the tasks it now completes end to end. Name the workflow and the success rate,
not "built an agent".
Techniques
Function calling
Multi-step planning
ReAct & tool routing
Memory & state management
Tools
LangGraph, CrewAI
OpenAI Assistants
Anthropic Tool Use
Metrics
Task success rate
Steps to completion
Manual handoffs cut
4
Prompt Engineering & Evals
The discipline that separates LLM features that hold up in production from demos that
break. Cover the prompt iterations you ran, the eval suite you built, plus the regression
you caught before users did. Cite the eval pass rate and what the framework now catches,
not "optimized prompts".
Techniques
Few-shot & chain-of-thought
LLM-as-judge
Golden datasets
Regression eval suites
Tools
LangSmith, Braintrust
Phoenix, Helicone
Promptfoo, Ragas
Metrics
Eval pass rate
Regression catches before release
Prompt iterations tracked
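An eval pass rate and a "regression caught before release" claim are easier to defend when you can describe the check behind them. Here is a minimal sketch, assuming each golden-dataset case has an id and a pass/fail result for both the baseline prompt and the candidate prompt. The 0.9 threshold and all names are hypothetical, not taken from any of the tools listed above.

```python
def pass_rate(results):
    """Share of golden-dataset cases that passed; `results` is a list of bools."""
    return sum(results) / len(results) if results else 0.0

def regressions(baseline, candidate):
    """Case ids that passed on the baseline but fail (or are missing) on the candidate."""
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )

def release_gate(baseline, candidate, min_pass_rate=0.9):
    """Block the release if the pass rate is too low or any case regressed."""
    rate = pass_rate(list(candidate.values()))
    regressed = regressions(baseline, candidate)
    return {
        "pass_rate": rate,
        "regressions": regressed,
        "ok": rate >= min_pass_rate and not regressed,
    }

# Hypothetical results keyed by case id.
baseline = {"refund": True, "greeting": True, "escalation": False}
candidate = {"refund": True, "greeting": False, "escalation": True}
print(release_gate(baseline, candidate))
```

The pass rate alone would hide the broken "greeting" case here, which is why the gate also tracks per-case regressions.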
5
LLM Deployment & LLMOps
A prompt in a notebook is a demo; a streaming endpoint behind a token budget is a product.
Show what you moved from prototype into production: the serving setup, the cost controls,
plus the observability you wired up. Numbers do the heavy lifting here: first-token latency,
tokens per request, cost per call, error rate.
Techniques
Streaming & SSE
Caching & semantic cache
Rate limiting & fallback
Cost & token telemetry
Tools
AWS Bedrock, Azure OpenAI
LangSmith, Helicone
FastAPI, vLLM
Metrics
Time-to-first-token
Cost per request
Error rate against the error budget
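Numbers such as cost per request and time-to-first-token usually come from per-request traces. The sketch below shows one way to derive them, assuming you log token counts and two timestamps per request. The `RequestTrace` fields and the per-million-token prices are placeholders, so substitute your provider's actual rates.

```python
from dataclasses import dataclass
from statistics import mean, median

@dataclass
class RequestTrace:
    input_tokens: int
    output_tokens: int
    sent_at: float          # seconds, when the request was sent
    first_token_at: float   # seconds, when the first streamed token arrived

def cost_usd(trace, input_price_per_m, output_price_per_m):
    """Cost of one request, given prices in USD per 1M tokens (assumed values)."""
    total = (trace.input_tokens * input_price_per_m
             + trace.output_tokens * output_price_per_m)
    return total / 1_000_000

def time_to_first_token(trace):
    """Seconds between sending the request and receiving the first token."""
    return trace.first_token_at - trace.sent_at

def summarize(traces, input_price_per_m, output_price_per_m):
    """Mean cost per request and median time-to-first-token over a batch."""
    return {
        "mean_cost_usd": mean(
            cost_usd(t, input_price_per_m, output_price_per_m) for t in traces
        ),
        "median_ttft_s": median(time_to_first_token(t) for t in traces),
    }

# Hypothetical traces and hypothetical prices (3 USD in, 15 USD out per 1M tokens).
traces = [
    RequestTrace(1000, 500, sent_at=10.0, first_token_at=10.4),
    RequestTrace(2000, 300, sent_at=20.0, first_token_at=20.6),
]
print(summarize(traces, input_price_per_m=3.0, output_price_per_m=15.0))
```

A median is used for latency because a handful of slow requests can distort a mean; report a high percentile as well if your traffic has a long tail.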
6
Safety, Guardrails & Compliance
The reason an LLM feature gets to ship to real users. Show the guardrails you wired in, the
jailbreak attempts you blocked, plus the PII or compliance gate the feature passes. Name
the safety control and what it blocks, not "added safety".
Techniques
Input / output moderation
Prompt injection defense
PII detection & redaction
Toxicity & bias filtering
Tools
NeMo Guardrails, Guardrails AI
OpenAI Moderation, Llama Guard
Presidio, Lakera
Metrics
Unsafe outputs blocked
Jailbreak attempts caught
PII leak rate
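To make "PII detection & redaction" and "PII leak rate" concrete, here is a deliberately simplified sketch using two regular expressions. It is a stand-in for illustration only: it assumes emails and US-style phone numbers are the only PII that matter, which is not true of real traffic, and it does not reproduce how Presidio or any other listed tool works.

```python
import re

# Simplified patterns; production detectors cover far more entity types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact(text):
    """Replace detected emails and phone numbers with placeholder tags."""
    text = EMAIL.sub("<EMAIL>", text)
    text = PHONE.sub("<PHONE>", text)
    return text

def pii_leak_rate(outputs):
    """Share of model outputs that still contain a detectable email or phone number."""
    if not outputs:
        return 0.0
    leaked = sum(1 for o in outputs if EMAIL.search(o) or PHONE.search(o))
    return leaked / len(outputs)

print(redact("Mail jane.doe@example.com or call 555-123-4567."))
print(pii_leak_rate(["All good.", "Reach me at a@b.io"]))
```

On a resume, pair the number with the scope of the detector, since a leak rate only means something relative to the entity types you actually check for.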
7
Cross-Functional Collaboration
AI Engineers ship nothing alone. Describe how you partnered with Product on the use case,
Design on the chat or copilot UX, plus Backend on the integration. Spell out the joint
outcome and the eval bar you set together, not just the teams in the room.
Techniques
Use-case scoping with Product
Prompt review with Design
API contracts with Backend
Joint launch criteria
Tools
Notion, Figma, Linear
Slack, GitHub
Loom, Miro
Metrics
Features shipped jointly
Squads supported
Time from idea to launch
8
Tooling & Workflow
The daily setup that lets you ship LLM features without yak-shaving. Cover the prompt
versioning you keep, the eval runs you trigger in CI, plus the review pattern that catches
a regression before it reaches production. Spell out what you actually use, not "a
modern AI stack".
Techniques
Prompt version control
CI evals on every PR
Trace logging
Reproducible LLM envs
Tools
Git, GitHub Actions
LangSmith, Braintrust, Helicone
Docker, Poetry, uv
Metrics
Prompts under version control
Evals running in CI
Onboarding ramp time cut