Lior Mizrahi
MLOps Engineer
Mountain View, CA • lior.mizrahi@gmail.com • +1 650-555-0117
Profile Summary
- MLOps Engineer with 7 years of experience operating ML platforms and inference systems across data/AI tooling, rideshare, and financial services, specializing in Kubeflow pipeline orchestration, KServe online serving, and feature-store operations.
- Solid technical background across ML platforms (Databricks), pipeline orchestration (Kubeflow Pipelines, Airflow), model serving (KServe, Triton Inference Server), experiment tracking (MLflow), feature stores (Feast), model monitoring (Arize), and languages (Python, Go) with strong fundamentals in reproducible ML workflows, lineage-aware deployments, and GPU-cost discipline.
- Deep expertise in reproducible ML pipelines, low-latency online serving, model and data lineage governance, and GPU and inference cost optimization, leveraging methodologies such as CI/CD for machine learning and shadow and canary model rollouts to drive safe, observable, and cost-efficient production ML.
- Engaged collaborator working cross-functionally with ML Research, Data Engineering, and Security teams in ML-platform-as-product environments, contributing to platform RFCs, model-review boards, and post-incident retrospectives with a user-first, ownership-first mindset.
- Emerging leader who models technical excellence and fosters a culture of reliability obsession and lineage discipline through RFC reviews and model-platform office hours, while leading MLOps guild sessions and authoring widely adopted training-pipeline and serving-runtime templates.
Technical Skills
- ML Platform & Compute:
- Databricks, SageMaker, Vertex AI, Azure ML, Kubernetes (EKS, GKE), GPU/TPU orchestration, Ray, Horovod, DeepSpeed
- Pipelines & Orchestration:
- Kubeflow Pipelines, Airflow, Metaflow, Argo Workflows, SageMaker Pipelines, Vertex AI Pipelines
- Model Serving:
- KServe, Seldon, Triton Inference Server, BentoML, Ray Serve, TorchServe, gRPC + REST endpoints
- Tracking, Registry & CI/CD:
- MLflow (tracking + registry), Weights & Biases, Neptune, Comet, GitHub Actions for ML, ArgoCD, model-promotion gates
- Feature Stores & Data for ML:
- Feast, Tecton, Vertex Feature Store, Databricks Feature Store, Delta Lake, online + offline parity
- Monitoring & Observability:
- Arize, WhyLabs, Evidently, Fiddler, drift / latency / KPI dashboards, OpenTelemetry, Prometheus + Grafana
- Reproducibility, Versioning & Governance:
- DVC, LakeFS, Delta Lake, Docker, lineage metadata, model approval workflows, EU AI Act / GDPR / HIPAA awareness
- Languages & SDKs:
- Python, Go, Bash, SQL, Kubernetes Operator SDK, Terraform, basic Scala for Spark
Education
Work Experience
- Owned the internal ML platform powering the Lakehouse AI engineering org supporting 380+ ML engineers and scientists, leading end-to-end design across training infrastructure, pipeline orchestration, and inference platforms for 240+ production models running on Databricks.
- Built end-to-end ML pipelines on Kubeflow Pipelines and Airflow, covering data ingestion and preprocessing, distributed training and evaluation, and model packaging and registry promotion, sustaining 1,400+ pipeline runs per week and cutting time-to-deploy from 9 days to 6 hours (Kubeflow sketch below).
- Designed online model serving on KServe and Triton Inference Server with multi-tenant KServe deployments, GPU-batched Triton ensembles, and autoscaling and request batching, hosting 160+ online models and cutting p99 inference latency from 430 ms to 120 ms (KServe sketch below).
- Implemented CI/CD pipelines for ML with offline validation suites and bias checks, shadow and canary rollouts, and automated rollback on drift or KPI miss, lifting automated model-promotion rate to 92% of eligible candidates across the org (promotion-gate sketch below).
- Stood up the centralized feature store on Feast with online + offline parity, point-in-time correct training datasets, and feature-level lineage and access control, curating a catalog of 380 features with cross-team reuse where 74% of new models reused at least 3 shared features (Feast sketch below).
- Built the model monitoring service on Arize covering data and concept drift alerting, feature-level distribution checks, and business-KPI dashboards per model, cutting median drift-to-detection from 11 days to 14 hours (drift-check sketch below).
- Drove GPU and inference cost optimization via Ray-backed elastic training pools, request batching and INT8 quantization, and spot-instance and autoscaling policies, lifting GPU utilization from 32% to 71% and cutting inference spend by 48% across the GPU fleet (quantization sketch below).
- Operationalized centralized experiment tracking and model registry on MLflow, providing auto-logged runs with code, params, and metrics, model versioning with stage transitions, and lineage from training data to deployed artifact, covering 320 models across 14 ML teams (MLflow sketch below).
- Owned the reproducibility and data-versioning program, pairing DVC-tracked datasets with Dockerized training environments and config-as-code via Hydra, achieving 100% of production retrain runs reproducible from a single commit (Hydra sketch below).
- Embedded model-governance workflows including approval-gated model promotions, bias and fairness checks per release, and audit logs for data and model access, clearing 2 SOC 2 audits and a GDPR review with zero high-severity findings.
- Worked closely with ML Research, Data Engineering, and Security partners to coordinate quarterly model-platform RFCs, feature-pipeline reviews, and on-call playbook design, authoring 12 ML-incident runbooks that shaped the team's standard playbook and mentoring 4 junior MLOps and ML engineers through their first on-call rotations.
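Implementation Sketches
The sketches below illustrate, in simplified form, the systems described in the experience bullets; all names, thresholds, and URIs are placeholders rather than production values.

A minimal Kubeflow Pipelines (v2 SDK) sketch of the ingest-train-register pipeline shape; the component bodies, base image, and names are hypothetical placeholders, not the production code.

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def preprocess(raw_uri: str) -> str:
    # Placeholder: read raw data, write cleaned features, return their URI.
    return raw_uri + "/features"

@dsl.component(base_image="python:3.11")
def train_and_register(features_uri: str) -> str:
    # Placeholder: train, evaluate, and promote a registry version.
    return "models:/example/1"

@dsl.pipeline(name="train-and-register")
def training_pipeline(raw_uri: str):
    features = preprocess(raw_uri=raw_uri)
    train_and_register(features_uri=features.output)

if __name__ == "__main__":
    # Compile to a pipeline spec that Airflow or a KFP backend can run.
    compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
```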
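The KServe deployments were declared through the kserve SDK; a minimal autoscaled InferenceService might look like the following, with the name, namespace, and storage URI invented for illustration.

```python
from kubernetes.client import V1ObjectMeta
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

# Autoscaled predictor pulling a model artifact from object storage.
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=V1ObjectMeta(name="example-model", namespace="serving"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            min_replicas=1,
            max_replicas=4,  # autoscaling bounds
            sklearn=V1beta1SKLearnSpec(storage_uri="s3://models/example/1"),
        )
    ),
)

KServeClient().create(isvc)
```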
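The shadow/canary promotion logic reduces to a metrics gate; a simplified standalone sketch, with thresholds and metric names chosen for illustration.

```python
def should_promote(prod: dict, cand: dict,
                   max_latency_regression: float = 0.10,
                   min_quality_delta: float = -0.002) -> bool:
    """Gate a canary: block promotion on a p99 latency regression
    beyond 10% or a quality (AUC) drop beyond 0.002."""
    latency_ok = cand["p99_ms"] <= prod["p99_ms"] * (1 + max_latency_regression)
    quality_ok = (cand["auc"] - prod["auc"]) >= min_quality_delta
    return latency_ok and quality_ok

# e.g. should_promote({"p99_ms": 120, "auc": 0.910},
#                     {"p99_ms": 128, "auc": 0.912})  -> True
```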
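Online/offline parity in Feast comes from serving one set of feature definitions through both paths; a sketch using a hypothetical driver_stats feature view.

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # points at the feature repo

# Online path: millisecond lookups at inference time.
online = store.get_online_features(
    features=["driver_stats:trips_7d", "driver_stats:rating_avg"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

# Offline path: point-in-time-correct training frames built from the
# same definitions, so training and serving see identical features.
# training_df = store.get_historical_features(
#     entity_df=entity_df,  # pandas frame: entity keys + event_timestamp
#     features=["driver_stats:trips_7d", "driver_stats:rating_avg"],
# ).to_df()
```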
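Arize performs the drift detection in the monitoring service itself; the per-feature distributional check it automates can be illustrated with a Population Stability Index computation (the 0.2 alert line is a common rule of thumb, not the production threshold).

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference (training) and a
    live feature distribution; >= 0.2 is a common drift-alert line."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid divide-by-zero / log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
shifted = rng.normal(0.5, 1.0, 10_000)  # simulated drifted feature
print(psi(baseline, shifted))           # lands above the 0.2 alert line
```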
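Among the cost levers, INT8 quantization is the most self-contained to sketch; PyTorch dynamic quantization converts Linear weights to int8 with no retraining (toy model for illustration).

```python
import torch
import torch.nn as nn

# Toy ranking head standing in for a real serving model.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1))

# Dynamic INT8 quantization: weights stored as int8, activations
# quantized on the fly; cuts memory and speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(32, 512)   # a batched request
print(quantized(x).shape)  # torch.Size([32, 1])
```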
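A minimal MLflow run-to-registry flow of the kind described above, assuming a registry-backed tracking server; the experiment, metric, and model names are placeholders.

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

mlflow.set_experiment("example-training")

with mlflow.start_run() as run:
    model = LogisticRegression(C=1.0).fit([[0.0], [1.0]], [0, 1])
    mlflow.log_params({"C": 1.0})
    mlflow.log_metric("val_auc", 0.91)
    mlflow.sklearn.log_model(model, artifact_path="model")
    # Registering links the run (code, params, data ref) to a versioned
    # artifact that serving resolves by stage or alias.
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "example-model")
```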
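The config-as-code half of the reproducibility program, as a Hydra entry point; the config path and keys are hypothetical, and dataset versions are pinned separately by DVC.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="train")
def main(cfg: DictConfig) -> None:
    # The committed config (plus the DVC-pinned dataset hash) fully
    # describes the run, so one commit reproduces one retrain.
    print(OmegaConf.to_yaml(cfg))
    # train(cfg.model, cfg.data) ...

if __name__ == "__main__":
    main()
```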