Avery Brooks • Infrastructure Engineer
New York, NY • infra@gmail.com • +1 6465-3333
Profile Summary
- Infrastructure Engineer with 7 years of experience designing and operating internal developer platforms for consumer-internet, content-delivery, and developer-tooling organizations, specializing in Kubernetes platforms, infrastructure as code, and reliability engineering.
- Solid technical background across cloud (AWS, GCP), IaC tools (Terraform, Pulumi), container orchestration (Kubernetes, Helm, Istio), CI/CD (GitHub Actions, ArgoCD), and languages (Go, Python), with strong fundamentals in Linux internals, networking, and Bash automation.
- Deep expertise in internal developer platforms, Kubernetes-native architecture, GitOps workflows, and zero-downtime deployments, leveraging methodologies such as golden paths and paved roads to drive scalable, secure, and cost-efficient infrastructure platforms.
- Engaged collaborator working cross-functionally with Application, Security, and Finance teams in Agile environments, contributing to architecture reviews, platform-roadmap discussions, and post-incident retrospectives with a pragmatic, ownership-first mindset.
- Emerging leader who shares technical expertise and fosters a culture of platform thinking and developer-experience focus through PR reviews, runbooks, and infrastructure guild sessions, and who authored widely adopted Terraform module templates.
Technical Skills
- Cloud Platforms: AWS (EKS, EC2, RDS, S3, IAM), GCP (GKE, Cloud SQL), Azure (AKS)
- Infrastructure as Code: Terraform, Pulumi, CloudFormation, Ansible, Helm
- Containers & Orchestration: Kubernetes, Docker, Helm, Istio, Argo Rollouts
- CI/CD & Delivery: GitHub Actions, ArgoCD, GitLab CI, Jenkins, Flux
- Networking & Connectivity: VPC, Transit Gateway, Route 53, CloudFront, VPN, BGP
- Security & IAM: IAM, Vault, KMS, Trivy, SOC 2, ISO 27001
- Observability & Monitoring: Prometheus, Grafana, OpenTelemetry, Datadog, ELK, Loki
- Languages & Scripting: Go, Python, Bash, SQL, YAML
Education
Work Experience
- Owned the end-to-end infrastructure platform for content-serving services at 15M+ DAU, leading design of the cloud architecture, container orchestration, and CI/CD platform for 80+ microservices in a polyglot AWS + GCP environment.
- Built a Terraform + Pulumi module library of reusable cluster, network, and database provisioning modules, with policy-as-code via Sentinel and drift-detection automation, cutting per-team onboarding lead time from 2 weeks to 1 day across 40+ engineering teams.
- Built the CI/CD platform on GitHub Actions and ArgoCD with GitOps deployment patterns, progressive delivery via Flagger, and automated rollback on SLO breach, operating 220+ pipelines and cutting commit-to-prod lead time from 4 hours to 18 minutes.
- Operated Kubernetes clusters across 3 regions with Istio service mesh, Helm chart standardization, and horizontal pod autoscaling, supporting 80+ services with 99.95% control-plane availability.
- Designed a multi-cloud VPC architecture using Transit Gateway peering, HashiCorp Consul service discovery, and WAF and ingress hardening, connecting 40+ accounts at 62 ms p95 cross-cloud latency.
- Implemented a multi-cloud IAM strategy including least-privilege role hierarchies, HashiCorp Vault for secrets, container image scanning via Trivy, and encryption-in-transit policies, meeting SOC 2 Type II and ISO 27001 controls and reducing audit findings from 28 to 0 across two consecutive cycles.
- Established the SLI/SLO program for 20 tier-1 services and defined EKS autoscaling policies, k6 load tests, and capacity-headroom reviews that absorbed a 3.5x holiday traffic surge with zero degradation.
- Built a unified observability stack on Prometheus, Grafana, and OpenTelemetry, defining SLO dashboards, distributed tracing, and structured logging via Loki, covering 45 services and reducing mean time to detect from 22 minutes to 2 minutes.
- Served as on-call rotation lead for the platform team, coordinating 35 SEV1/SEV2 incidents and running the team's blameless postmortem program with action-item tracking and a weekly incident review board, lifting the action-item close rate from 48% to 91% within three quarters.
- Drove the FinOps program through org-wide tagging policy, a Reserved Instance and Savings Plan portfolio, and rightsizing automation via Cloud Custodian, cutting annual cloud spend by 24% (~$5.2M) without service-level impact.
- Worked closely with Application, Security, and Finance teams across 6 product domains to negotiate platform standards, compliance controls, and cost-allocation policies, authoring 9 platform RFCs that shaped the org's golden-path rollout and onboarding 13 new infrastructure engineers.