Lior Mizrahi
MLOps Engineer
Mountain View, CA • lior.mizrahi@gmail.com • +1 650-555-0117
Profile Summary
- MLOps Engineer with 7 years of experience operating ML platforms and inference systems across data and AI tooling, rideshare, and financial-services ML, specializing in Kubeflow pipeline orchestration, KServe online serving, and feature-store operations.
- Solid technical background across ML platforms (Databricks), pipeline orchestration (Kubeflow Pipelines, Airflow), model serving (KServe, Triton Inference Server), experiment tracking (MLflow), feature stores (Feast), model monitoring (Arize), and languages (Python, Go), with strong fundamentals in reproducible ML workflows, lineage-aware deployments, and GPU-cost discipline.
- Deep expertise in reproducible ML pipelines, low-latency online serving, model and data lineage governance, and GPU and inference cost optimization, applying CI/CD for machine learning and shadow and canary model rollouts to keep production ML safe, observable, and cost-efficient.
- Engaged collaborator working cross-functionally with ML Research, Data Engineering, and Security teams in ML-platform-as-product environments, contributing to platform RFCs, model-review boards, and post-incident retrospectives with a user-first, ownership-first mindset.
- Emerging leader who promotes reliability and lineage discipline through RFC reviews and model-platform office hours, leads MLOps guild sessions, and authors widely adopted training-pipeline and serving-runtime templates.
Technical Skills
- ML Platform & Compute: Databricks, SageMaker, Vertex AI, Azure ML, Kubernetes (EKS, GKE), GPU/TPU orchestration, Ray, Horovod, DeepSpeed
- Pipelines & Orchestration: Kubeflow Pipelines, Airflow, Metaflow, Argo Workflows, SageMaker Pipelines, Vertex AI Pipelines
- Model Serving: KServe, Seldon, Triton Inference Server, BentoML, Ray Serve, TorchServe, gRPC + REST endpoints
- Tracking, Registry & CI/CD: MLflow (tracking + registry), Weights & Biases, Neptune, Comet, GitHub Actions for ML, ArgoCD, model-promotion gates
- Feature Stores & Data for ML: Feast, Tecton, Vertex Feature Store, Databricks Feature Store, Delta Lake, online + offline parity
- Monitoring & Observability: Arize, WhyLabs, Evidently, Fiddler, drift / latency / KPI dashboards, OpenTelemetry, Prometheus + Grafana
- Reproducibility, Versioning & Governance: DVC, LakeFS, Delta Lake, Docker, lineage metadata, model approval workflows, EU AI Act / GDPR / HIPAA awareness
- Languages & SDKs: Python, Go, Bash, SQL, Kubernetes Operator SDK, Terraform, basic Scala for Spark
Education
Work Experience
- Owned the internal ML platform powering the Lakehouse AI org and supporting 380+ ML engineers and scientists, leading end-to-end design across training infrastructure, pipeline orchestration, and inference platforms for 240+ production models running on Databricks.
- Built end-to-end ML pipelines on Kubeflow Pipelines and Airflow covering data ingestion, distributed training, and registry promotion, sustaining 1,400+ pipeline runs per week and cutting time-to-deploy from 9 days to 6 hours.
- Designed online model serving on KServe and Triton with multi-tenant deployments, GPU-batched ensembles, and autoscaling, hosting 160+ online models and cutting p99 inference latency from 430 ms to 120 ms.
- Implemented CI/CD pipelines for ML with offline validation suites, shadow and canary rollouts, and automated rollback on drift or KPI miss, lifting the automated model-promotion rate to 92% of eligible candidates and cutting PR-to-canary time to under 40 minutes.
- Stood up the centralized feature store on Feast with online + offline parity, point-in-time correct training datasets, and feature-level lineage, curating 380 shared features with cross-team reuse of 74% on new model launches.
- Built the model monitoring service on Arize covering drift alerting, feature-level distribution checks, and per-model business-KPI dashboards, cutting median time from drift onset to detection from 11 days to 14 hours.
- Drove GPU and inference cost optimization via Ray-backed elastic training pools, request batching with INT8 quantization, and spot-instance autoscaling policies, lifting GPU utilization from 32% to 71% and cutting inference spend by 48% across the GPU fleet.
- Operationalized centralized experiment tracking and model registry on MLflow with auto-logged runs, model versioning with stage transitions, and lineage from training data to deployed artifact, covering 320 models across 14 ML teams.
- Owned the reproducibility and data-versioning program, combining DVC-tracked datasets, Dockerized training environments, and config-as-code via Hydra, making 100% of production retrain runs reproducible from a single commit.
- Embedded model-governance workflows including approval-gated promotions, bias and fairness checks per release, and audit logs for data and model access, clearing 2 SOC 2 audits and a GDPR review with zero high-severity findings.
- Worked closely with ML Research, Data Engineering, and Security partners to coordinate quarterly RFCs, feature-pipeline reviews, and on-call playbook design, authoring 12 ML-incident runbooks and mentoring 4 junior MLOps engineers through their first on-call rotations.