Aria Chen
Data Scientist
New York, NY • datascience@gmail.com • +1 4444-9999
Profile Summary
- Data Scientist with 6 years of experience building production ML and statistical systems across e-commerce search, two-sided marketplaces, and travel personalization, specializing in causal inference, production-grade modeling, and trustworthy experimentation.
- Solid technical background across modeling (PyTorch, XGBoost, scikit-learn), experimentation (A/B Testing, CUPED, Bayesian methods), data tooling (SQL, Pandas, DuckDB), production stacks (Feast, MLflow, Airflow), and cloud (AWS, GCP) with strong scripting fundamentals in Python and SQL.
- Deep expertise in causal inference, uplift modeling, calibration analysis, and fairness auditing, leveraging methodologies such as synthetic control, propensity-score matching, and stratified A/B design to deliver defensible, reproducible, and decision-grade insights.
- Engaged collaborator working cross-functionally with Product, ML Engineering, and Business teams in Agile environments, contributing to roadmap planning, experiment review, and modeling-decision retrospectives with a pragmatic, outcome-oriented mindset.
- Emerging leader who models technical excellence and fosters a culture of statistical rigor and reproducibility through code reviews and shared notebooks, while leading applied-science guild sessions and authoring widely adopted experimentation templates.
Technical Skills
- Languages & Scripting:
- Python, SQL, R, Bash
- Modeling Frameworks:
- PyTorch, TensorFlow, XGBoost, LightGBM, scikit-learn
- Experimentation:
- A/B Testing, CUPED, Bayesian methods, Sequential testing
- Data Tooling:
- Pandas, DuckDB, NumPy, dbt, Spark
- MLOps & Production:
- Feast, MLflow, Airflow, Kubeflow, BentoML
- Visualization & Storytelling:
- Streamlit, Plotly, Tableau, Looker, Jupyter
- Cloud Platforms:
- AWS (S3, SageMaker, EMR), GCP (BigQuery, Vertex AI)
- Statistical & Causal Methods:
- Hypothesis testing, regression, propensity scoring, causal inference
Education
Work Experience
- Owned the recommendation modeling stack supporting millions of nightly bookings and 300+ concurrent experiments, carrying end-to-end responsibility for model design, causal validation, and production reliability within a modern ML platform.
- Designed and shipped a two-tower retrieval model in PyTorch that improved booking conversion by 12% over the previous gradient-boosted ranking baseline, training on 2B+ session events with distributed PyTorch DDP on GPU clusters via Ray and Iceberg-backed feature snapshots.
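A minimal sketch of the two-tower retrieval pattern described above (toy dimensions and synthetic IDs, not the production model; the in-batch softmax loss is one common training choice for this architecture):

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    """Toy two-tower retrieval model: separate user and item encoders
    scored by dot product, trained with an in-batch softmax loss."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user = nn.Sequential(nn.Embedding(n_users, dim), nn.Linear(dim, dim))
        self.item = nn.Sequential(nn.Embedding(n_items, dim), nn.Linear(dim, dim))

    def forward(self, users, items):
        # (batch, batch) matrix of user-item similarity scores
        return self.user(users) @ self.item(items).T

model = TwoTower(n_users=1_000, n_items=5_000)
users = torch.randint(0, 1_000, (64,))
items = torch.randint(0, 5_000, (64,))
scores = model(users, items)
# diagonal entries are the observed (user, item) pairs; off-diagonal
# entries serve as in-batch negatives for the softmax loss
loss = nn.functional.cross_entropy(scores, torch.arange(64))
loss.backward()
```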
- Designed and analyzed 40+ A/B tests across search ranking, pricing, and onboarding flows using stratified sampling, CUPED variance reduction, and Bayesian sequential testing, catching 3 confounded results before launch and tightening the detection horizon by 35%.
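The CUPED variance reduction used in those analyses can be sketched in a few lines (synthetic data; `theta` is the covariance-based adjustment coefficient):

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: subtract the part of metric y explained by a pre-experiment
    covariate x, reducing variance without moving the mean."""
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)             # pre-experiment metric
y = 0.8 * x + rng.normal(size=10_000)   # in-experiment metric, correlated with x
y_adj = cuped_adjust(y, x)              # lower variance, identical mean
```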
- Built a causal-inference framework using propensity-score matching, difference-in-differences, and synthetic-control modeling on observational booking data, reattributing $8M of incremental revenue and informing the marketing-budget reallocation that lifted CAC efficiency by 18%.
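One building block of such a framework, 1-nearest-neighbor propensity-score matching for the ATT, sketched on synthetic data with a known treatment effect of 2.0 (all names and values illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_att(X, treated, outcome):
    """Estimate the ATT by matching each treated unit to the control
    unit with the closest estimated propensity score (with replacement)."""
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    nearest = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]
    return float((outcome[t_idx] - outcome[nearest]).mean())

rng = np.random.default_rng(42)
X = rng.normal(size=(2_000, 3))
treated = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # selection on X[:, 0]
outcome = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * treated + rng.normal(size=2_000)
att = psm_att(X, treated, outcome)  # lands near the true effect of 2.0
```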
- Partnered with ML Engineering to ship the recommendation model to real-time serving, owning feature-store integration via Feast, shadow-traffic validation, Prometheus drift dashboards, and weekly retraining DAGs in Airflow, sustaining 99.95% inference availability and surfacing 11 silent distribution shifts over six quarters.
- Led the exploratory analysis for pricing fairness across host segments using Pandas, DuckDB, and Plotly, surfacing 3 systematic biases against new hosts that reframed the modeling roadmap and unlocked a $5M revenue opportunity the team had previously missed.
- Authored internal research notes, executive readouts, and interactive Streamlit dashboards presenting findings to Product, Marketing, and Exec leadership, with 4 papers picked up internally as the canonical reference for the search-ranking surface.
- Built and maintained 200+ features in the shared feature store feeding hotel ranking, user-personalization, and fraud-detection models, handling skewed distributions, missing-data imputation, target encoding for high-cardinality categoricals, and time-leak prevention, cutting feature-engineering time per project by 55%.
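The time-leak prevention mentioned above is essentially a point-in-time join; a tiny pandas sketch with toy data and hypothetical column names:

```python
import pandas as pd

# feature snapshots, keyed by user and the time they were computed
features = pd.DataFrame({
    "user_id": [1, 2, 1],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10"]),
    "bookings_30d": [3, 1, 5],
}).sort_values("ts")

# training labels with their own event timestamps
labels = pd.DataFrame({
    "user_id": [2, 1],
    "ts": pd.to_datetime(["2024-01-04", "2024-01-07"]),
    "converted": [0, 1],
}).sort_values("ts")

# merge_asof attaches, per user, the most recent feature value at or
# before each label's timestamp -- never a value from the future
train = pd.merge_asof(labels, features, on="ts", by="user_id")
```

Here user 1's label at 2024-01-07 picks up the 2024-01-01 snapshot (value 3), not the later leaked value, and user 2's label gets no feature at all because its only snapshot postdates the label.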
- Established the model-validation rubric for ranking and classification models including stratified cross-validation, bias and fairness audits across demographic slices, calibration analysis via reliability curves, and adversarial robustness checks, catching overfitting in 2 production-ready candidates before launch.
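The calibration analysis via reliability curves named in that rubric, sketched with scikit-learn on synthetic classification data:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probs = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# reliability curve: per probability bin, compare the observed positive
# rate against the mean predicted probability
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)
max_gap = float(np.abs(frac_pos - mean_pred).max())  # 0 == perfectly calibrated
```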
- Designed a logistic regression + GBM ensemble for destination-search relevance, applying SHAP interpretability, time-based cross-validation, and Optuna hyperparameter optimization, lifting search precision@5 by 9% over the prior heuristic baseline.
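The time-based cross-validation referenced above can be sketched with scikit-learn's TimeSeriesSplit (synthetic data; the production pipeline differs):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=600) > 0).astype(int)

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # every fold trains strictly on the past and scores on the future,
    # so a look-ahead leak cannot inflate the validation metric
    assert train_idx.max() < test_idx.min()
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
mean_acc = float(np.mean(scores))
```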
- Worked closely with Product, Marketing, and Legal teams across 4 markets and 3 regulatory regimes to translate model outputs into GDPR-compliant decisions, building shared SQL playbooks, Looker dashboards, and stakeholder onboarding docs adopted by 12+ analysts.