About
Sameer Chaturvedi
Full-stack software engineer. Basketball lifer. I ship ML, mobile, and web systems end-to-end -- turning game film into numbers coaches, scouts, and front offices can actually use.

Purpose
Using engineering to solve problems in the two things I love most: basketball and physics.
Basketball
Give back to the game that has given so much.
Basketball has been the main constant in my life -- the driving motivator behind my mentality, perspectives, and success. I build AI for basketball, from the models themselves to the iOS and web apps that put them in a coach's hands, to bridge the gap between the eye test and counting stats for coaching, scouting, and training. I play every day. The love of the art drives the work.
Physics
Engineering with purpose -- build the tools that extend what we can perceive and what we can power.
I studied astrophysics as an undergrad, and today I write software for plasma-fusion research at MIT: two ends of the same obsession with how the universe actually works and how we harness it. Observational instruments stretched our senses for centuries. ML and software are the next stretch -- letting us see further, model deeper, and extract energy from the phenomena that built us.
Experience
Where I've shipped.
Recommendation systems at Amazon. Scientific software at MIT. Live-basketball analytics at Cerebro Sports. Petabyte-scale security ML at Microsoft. On-device computer vision at AI.Reverie. Upstream Kubernetes runtimes at Red Hat. The range is the point.
- Machine Learning Engineer II at Amazon, 2025 -- current
Prime Video -- Browse and Discovery
- Mixture-of-Retrieval ML models that proportionally promote content by genre and propensity.
- Built a custom logistic regression with stratified sampling and softmax tuning to fix flat predictions caused by 90%+ channel over-representation in the training data, improving Share-of-Voice across genre carousels.
- Designed the end-to-end pipeline for FreshPicks, a carousel for new third-party content, driving +134K annualised subscriptions.
- Architected a cross-service content-filtering pipeline for subscription-bundle carousels with realtime entitlement parsing, token decoding, and multi-filter orchestration.
- Lead Software Engineer at MIT, 2025 -- current
Plasma Science and Fusion Center
- Architecting DisruptionPy: a framework for analysing plasma-fusion data across five tokamak reactors to predict and avoid disruptions using anomaly models.
- Revamping the backend to decouple data access from specific implementations; pluggable backends (MDSplus, Xarray/Zarr) through abstract interfaces for testability and multi-machine scalability.
- Lead Machine Learning Engineer at Cerebro Sports, 2025
Live Gameplay Team
- Full-stack build: frontend, API, LLM layer, and data pipeline all shipped by a small team.
- App lets coaches, scouts, and analysts interrogate live basketball-tournament stats via LLM chat, realtime graphs, and dashboards.
- Designed the pipeline that converts semantic text to SQL, queries the stats DB, processes data, and handles edge cases.
- Software Engineer II at Microsoft, 2021 -- 2025
M365 Security Detections
- Petabyte-scale big-data platform using ML to detect cyber-attacks in near real time across all of M365.
- Lead engineer for Microsoft's first mail-anomaly detection, built in response to the nation-state incident in which hackers accessed executive and federal emails.
- Lead engineer on a GPT-based security-detection assistant that analyses machine and identity activity for attack patterns.
- Designed the production workspace that enables data-science and LLM-backed detections in an agile PySpark environment with CI/CD and realtime job monitoring.
- Expedited detection programming by over 75% on distributed systems; implemented an auto-failover, geo-replicated BCDR (business continuity and disaster recovery) architecture.
- Machine Learning Engineer at AI.Reverie, 2020
Computer Vision Team
- Lead engineer on a deep-learning + iOS app for real-time object detection and instance segmentation.
- Improved accuracy with dropout, batch normalisation, and training on custom synthetic data.
- Converted models to run on-device via LibTorch; wrote the Objective-C++ bridge, hand-rolled NMS, and the full AVFoundation capture pipeline. CoreMotion integration plotted phone-path trajectory.
- Productionised the library and ETL pipeline with custom PyTorch DataLoaders and dataset mappers.
- Software Engineer at Red Hat, 2019
OpenShift -- Container Runtimes
- Upstream contributions to CRI-O (an implementation of the Kubernetes Container Runtime Interface) and Libpod (container and pod manager).
- Unblocked engineers on image-volume write issues inside containers, fixed container-hook monitoring and execution, and added CLI env-var flag passthrough.
Education
Boston University
- Master's
- AI + Deep Learning
- Bachelor's
- Computer Science + Astrophysics
- Honours
- Presidential Scholar -- GPA 3.75 / 4
Skills
What I've shipped with.
Everything below is from production work or research, tagged to where it was used.
Recommendation + ranking
- Mixture-of-Retrieval ML models (Amazon)
- Logistic regression, stratified sampling, softmax tuning (Amazon)
- Cross-service content-filtering pipelines (Amazon)
Security + anomaly ML
- Mail-anomaly detection for M365 (Microsoft)
- GPT-based security-detection assistant (Microsoft)
- Random-forest entity detections (Microsoft)
- Petabyte-scale event processing (Microsoft)
Distributed systems
- Spark, PySpark, Azure Functions (Microsoft)
- CI/CD, real-time job monitoring (Microsoft)
- Auto-failover geo-replicated BCDR (Microsoft)
- CRI-O, Libpod, Kubernetes container runtimes (Red Hat)
Scientific software
- DisruptionPy: anomaly models on tokamak fusion data (MIT)
- Pluggable backends via abstract interfaces (MIT)
- MDSplus, Xarray/Zarr for multi-machine scalability (MIT)
Live sports + LLM apps
- Semantic-text to SQL pipelines (Cerebro Sports)
- Basketball stats DB + edge-case handling (Cerebro Sports)
- LLM chat + realtime graphs + dashboards (Cerebro Sports)
Classical + deep computer vision
- Kalman filters, Savitzky-Golay smoothing, Circle Hough Transform
- Motion energy, contour detection, frame differencing, template matching
- Single-shot object detection models
- Pose estimation for shot-mechanics analysis
Deep learning
- Dropout, batch normalisation, hyperparameter tuning (AI.Reverie)
- Training on custom synthetic data (AI.Reverie)
- Custom PyTorch DataLoaders and dataset mappers (AI.Reverie)
- TorchScript model conversion for mobile (AI.Reverie)
On-device mobile ML
- iOS object detection + instance segmentation (AI.Reverie)
- Swift + AVFoundation capture pipeline
- LibTorch via Objective-C++ bridge
- Device motion logging for trajectory analysis
Full-stack product
- End-to-end app: UI, API, data pipeline, LLM (Cerebro Sports)
- Realtime graphs + dashboards over a stats DB (Cerebro Sports)
- Agile PySpark workspace with CI/CD + job monitoring (Microsoft)
- Next.js, React, TypeScript, Tailwind (this site)
Small-team engineering
- Leading features from spec to ship -- at Amazon, MIT, and Microsoft
- Working with scientists, analysts, and non-technical stakeholders
- Production release hygiene: CI/CD, BCDR, observability
- Framework choice as a design decision, not a reflex