Aria Chen
Data Scientist
New York, NY • datascience@gmail.com • +1 4444-9999
Profile Summary
- Data Scientist with 6 years of experience building production ML and statistical systems across e-commerce search, two-sided marketplaces, and travel personalization, specializing in causal inference, production-grade modeling, and trustworthy experimentation.
- Solid technical background across modeling (PyTorch, XGBoost, scikit-learn), experimentation (A/B Testing, CUPED, Bayesian methods), data tooling (SQL, Pandas, DuckDB), production stacks (Feast, MLflow, Airflow), and cloud (AWS, GCP) with strong scripting fundamentals in Python and SQL.
- Deep expertise in causal inference, uplift modeling, calibration analysis, and fairness auditing, leveraging methodologies such as synthetic control, propensity-score matching, and stratified A/B design to deliver defensible, reproducible, and decision-grade insights.
- Engaged collaborator working cross-functionally with Product, ML Engineering, and Business teams in Agile environments, contributing to roadmap planning, experiment review, and modeling-decision retrospectives with a pragmatic, outcome-oriented mindset.
- Emerging leader who fosters a culture of statistical rigor and reproducibility through code reviews and shared notebooks, leading applied-science guild sessions and authoring widely adopted experimentation templates.
Technical Skills
- Languages & Scripting: Python, SQL, R, Bash
- Modeling Frameworks: PyTorch, TensorFlow, XGBoost, LightGBM, scikit-learn
- Experimentation: A/B Testing, CUPED, Bayesian methods, Sequential testing
- Data Tooling: Pandas, DuckDB, NumPy, dbt, Spark
- MLOps & Production: Feast, MLflow, Airflow, Kubeflow, BentoML
- Visualization & Storytelling: Streamlit, Plotly, Tableau, Looker, Jupyter
- Cloud Platforms: AWS (S3, SageMaker, EMR), GCP (BigQuery, Vertex AI)
- Statistical & Causal Methods: Hypothesis testing, regression, propensity scoring, causal inference
Education
Work Experience
- Own the recommendation modeling stack across 2 product squads, supporting millions of nightly bookings and 300+ concurrent experiments, with end-to-end responsibility for model design, causal validation, and production reliability within a modern ML platform.
- Designed and shipped a two-tower retrieval model in PyTorch that lifted booking conversion by 12% over the prior gradient-boosted ranking baseline; trained on 2B+ session events on distributed GPU clusters orchestrated by Ray, with feature snapshots backed by Iceberg.
- Designed and analyzed 40+ A/B tests across search ranking, pricing, and onboarding flows using stratified sampling, CUPED variance reduction, and Bayesian sequential testing; caught 3 confounded results before launch and tightened the detection horizon by 35%.
- Built a causal-inference framework on observational booking data using propensity-score matching, difference-in-differences, and synthetic control; reattributed $8M of incremental revenue and informed a marketing-budget reallocation that lifted CAC efficiency by 18%.
- Partnered with ML Engineering to ship the recommendation model to real-time serving with Feast feature integration, shadow traffic validation, Prometheus drift dashboards, and weekly retraining DAGs on Airflow; sustained 99.95% inference availability and surfaced 11 silent distribution shifts over six quarters.
- Led the exploratory analysis on pricing fairness across host segments using Pandas, DuckDB, and Plotly; surfaced 3 systematic biases against new hosts, reframing the modeling roadmap and unlocking a $5M revenue opportunity the team had previously missed.
- Authored research notes, exec readouts, and Streamlit dashboards that translated model behavior for Product, Marketing, and Exec leadership, with 4 papers picked up as the internal reference for the search-ranking surface.
- Built and maintained 200+ production features in the shared feature store feeding hotel ranking, personalization, and fraud detection models, handling skewed distributions, missing data, target encoding, and time-leakage controls in CV folds; cut feature-engineering time per project by 55%.
- Established the model-validation rubric for ranking and classification, layering stratified CV, fairness audits across demographic slices, and reliability curves for calibration; caught overfitting in 2 production-ready candidates before launch.
- Designed a GBM ensemble for destination-search relevance, applying SHAP interpretability, time-based cross-validation, and Optuna hyperparameter tuning; lifted precision@5 by 9% over the prior heuristic baseline.
- Worked closely with Product, Marketing, and Legal across four markets and three regulatory regimes to translate model outputs into GDPR-compliant decisions, shipping shared SQL playbooks and Looker dashboards adopted by 12+ analysts.