Nikhil Rao
Machine Learning Engineer
Mountain View, CA • ml@gmail.com • +1 6505-2222
Profile Summary
- Machine Learning Engineer with 6 years of experience designing and operating production ML systems across LLM safety, content recommendation, and ranking, specializing in model training, low-latency serving, and MLOps.
- Solid technical background across frameworks (PyTorch, TensorFlow, Hugging Face), languages (Python, SQL), serving infrastructure (Triton, KServe), MLOps (MLflow, Weights & Biases, Feast), and cloud (AWS, GCP) with strong fundamentals in distributed training and GPU optimization.
- Deep expertise in end-to-end ML system design, LLM fine-tuning, real-time model serving, and responsible AI evaluation, using continuous training pipelines and shadow deployments to build reliable, observable, and cost-aware ML platforms.
- Engaged collaborator working cross-functionally with Research, Product, and Eng teams in Agile environments, contributing to model-launch reviews, evaluation design, and post-launch retrospectives with a pragmatic, ownership-first mindset.
- Emerging leader who models technical excellence and fosters rigor in evaluation and reproducibility through PR reviews and runbooks, leading ML guild sessions and authoring widely adopted training-pipeline templates.
Technical Skills
- ML Frameworks & Libraries:
- PyTorch, TensorFlow, JAX, scikit-learn, Hugging Face, vLLM
- Languages & Scripting:
- Python, SQL, Go, C++, Bash
- Data & ETL for ML:
- Spark, Ray, Apache Beam, Pandas, dbt, Airflow
- Feature Stores:
- Feast, Tecton, Vertex AI Feature Store
- Model Serving:
- TorchServe, Triton, KServe, SageMaker, Vertex AI
- MLOps & Experiment Tracking:
- MLflow, Weights & Biases, Kubeflow, Metaflow, DVC
- Cloud & Compute:
- AWS (SageMaker, S3, EKS, Lambda), GCP (Vertex AI, GKE), GPU/TPU clusters
- Evaluation & Responsible AI:
- Offline/online evals, A/B testing, fairness audits, robustness checks
Education
Work Experience
- Owned end-to-end ML system architecture for the Claude evaluation platform processing 20M+ evaluation runs/month, leading design across training pipelines, serving infrastructure, and feedback loops spanning 8 model families in a polyglot Python/Go/Rust environment.
- Trained and fine-tuned a safety classifier for constitutional AI rejections using PyTorch and Hugging Face, applying LoRA fine-tuning, DPO post-training, and gradient checkpointing, lifting refusal precision from 84% to 96% on the internal red-team benchmark.
- Deployed models as real-time inference APIs on Triton and KServe with dynamic batching, model parallelism, and token-level streaming, serving 35k QPS at 320ms p95 latency and 99.95% uptime across multiple regions.
- Built the team's model training and release pipeline in MLflow with dataset versioning via DVC, experiment tracking, and automated eval gates, cutting model lead time from commit to prod from 4 weeks to 3 days.
- Stood up production monitoring for 8 serving models, tracking input drift via the population stability index (PSI), output-distribution shifts, and business KPIs, surfacing 14 silent regressions in the first six months and triggering 4 emergency retrains.
- Optimized inference cost through INT8 quantization, knowledge distillation, and utilization-aware GPU batching, lifting throughput by 3.2x (from 11k QPS to 35k QPS) and cutting per-token serving cost by 62% during a major scale-up.
- Designed the team's offline + online evaluation framework including A/B-tested capability evals, shadow-deployed safety probes, and bias-and-fairness audits, running 60+ structured evals that gated 9 model launches without a customer-visible regression.
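The PSI-based drift check referenced above can be sketched in a few lines. This is an illustrative sketch, not the production implementation; the `psi` function name, binning scheme (reference-quantile bins), and smoothing constant are assumptions.

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample.

    Bins are derived from quantiles of the reference (expected) distribution;
    a common rule of thumb flags PSI > 0.2 as significant drift.
    """
    edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))
    # Clip live values into the reference range so outliers land in edge bins
    actual = np.clip(actual, edges[0], edges[-1])

    exp_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    act_frac = np.histogram(actual, bins=edges)[0] / len(actual)

    # Smooth empty bins so the log term stays finite
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))
```

Deriving bins from the reference distribution keeps the check stable as live traffic shifts: an unchanged distribution yields PSI near zero, while a mean shift concentrates mass in the edge bins and drives PSI past the alert threshold.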
- Built 180+ production features for the Reels ranking model, managed through a Feast feature store with point-in-time correctness, freshness monitoring, and shared training/serving paths, powering 6 ranking models and lifting top-line engagement by 8%.
- Owned training data pipelines in Spark on EMR and Apache Beam, processing 50TB/day of interaction logs with schema enforcement, dedup and quality checks, and lineage tracking, hitting a 2-hour freshness SLA across batch and streaming inference paths.
- Implemented a two-tower retrieval model for content recommendations in TensorFlow, training on 2B+ user interactions across 4xA100 GPUs, lifting NDCG@10 by 14.5% vs the previous Wide&Deep baseline.
- Worked closely with Product, Eng, and Trust & Safety teams across 3 product surfaces to negotiate evaluation criteria, metric definitions, and launch gates, authoring 7 ML RFCs that shaped the org's responsible-AI guardrails and onboarding 10 new MLEs.
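The point-in-time correctness mentioned for the feature store can be illustrated with a backward as-of join: each training label is paired with the latest feature value known *before* the label's event time, never a future one, which prevents leakage. This is a minimal pandas sketch, not Feast's actual API; the column names and data are hypothetical.

```python
import pandas as pd

# Training labels: one row per (user, event_time, label)
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-10"]),
    "label": [0, 1, 1],
})

# Feature snapshots: the value as of feature_time (hypothetical feature)
features = pd.DataFrame({
    "user_id": [1, 2, 1],
    "feature_time": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
    "avg_watch_time": [3.2, 1.1, 4.8],
})

# Point-in-time join: for each label row, take the most recent feature value
# with feature_time <= event_time for that user (direction="backward").
train = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
    direction="backward",
)
print(train[["user_id", "event_time", "avg_watch_time"]])
```

Joining on raw timestamps with an ordinary merge would happily attach feature values computed after the label event; the as-of join makes the training path match what the serving path could actually have known at request time.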