About

Sameer Chaturvedi

Full-stack software engineer. Basketball lifer. I ship ML, mobile, and web systems end-to-end -- turning game film into numbers coaches, scouts, and front offices can actually use.

sam.chaturvedi24@gmail.com | Resume (PDF) | LinkedIn | GitHub / samc24

Purpose

Using engineering to solve problems in the two things I love most: basketball and physics.

Basketball

Give back to the game that has given so much.

Basketball has been the main constant in my life -- the driving motivator behind my mentality, perspectives, and success. I build AI for basketball, from the models themselves to the iOS and web apps that put them in a coach's hands, to bridge the gap between the eye test and counting stats in coaching, scouting, and training. I play every day. The love of the art drives the work.

Physics

Engineering with purpose -- build the tools that extend what we can perceive and what we can power.

I studied astrophysics as an undergrad, and today I write software for plasma-fusion research at MIT: two ends of the same obsession with how the universe actually works and how we harness it. Observational instruments stretched our senses for centuries. ML and software are the next stretch -- letting us see further, model deeper, and extract energy from the phenomena that built us.

Experience

Where I've shipped.

Recommendation systems at Amazon. Scientific software at MIT. Live-basketball analytics at Cerebro Sports. Petabyte-scale security ML at Microsoft. On-device computer vision at AI.Reverie. Upstream Kubernetes runtimes at Red Hat. The range is the point.

Machine Learning Engineer II at Amazon
2025 -- current
Prime Video -- Browse and Discovery
- Mixture-of-Retrieval ML models that proportionally promote content by genre and propensity.
- Built a custom logistic regression with stratified sampling and softmax tuning to fix flat predictions caused by 90%+ channel over-representation in the training data, improving Share-of-Voice across genre carousels.
- Designed the end-to-end pipeline for FreshPicks, a carousel for new third-party content, driving +134K annualised subscriptions.
- Architected a cross-service content-filtering pipeline for subscription-bundle carousels with realtime entitlement parsing, token decoding, and multi-filter orchestration.
Lead Software Engineer at MIT
2025 -- current
Plasma Science and Fusion Center
- Architecting DisruptionPy: a framework for analysing plasma-fusion data across five tokamak reactors to predict and avoid disruptions using anomaly models.
- Revamping the backend to decouple data access from specific implementations: pluggable backends (MDSplus, Xarray/Zarr) sit behind abstract interfaces, for testability and multi-machine scalability.
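
The decoupling described in the last bullet might look roughly like the sketch below. It is an illustration of the general pattern only, not DisruptionPy's actual code: `SignalBackend`, `InMemoryBackend`, `mean_signal`, and the signal name `ip` are hypothetical, and an in-memory dict stands in for a real MDSplus or Xarray/Zarr backend.

```python
from abc import ABC, abstractmethod


class SignalBackend(ABC):
    """Hypothetical interface: the only way analysis code asks for shot data."""

    @abstractmethod
    def get_signal(self, shot_id: int, name: str) -> list[float]:
        """Return the samples of one named signal for one shot."""


class InMemoryBackend(SignalBackend):
    """Stand-in backend for tests; a real one might wrap MDSplus or Zarr."""

    def __init__(self, data: dict[tuple[int, str], list[float]]) -> None:
        self._data = data

    def get_signal(self, shot_id: int, name: str) -> list[float]:
        return self._data[(shot_id, name)]


def mean_signal(backend: SignalBackend, shot_id: int, name: str) -> float:
    """Analysis code depends on the interface, never on a concrete backend."""
    samples = backend.get_signal(shot_id, name)
    return sum(samples) / len(samples)


# Swapping the backend requires no change to mean_signal.
backend = InMemoryBackend({(1, "ip"): [1.0, 2.0, 3.0]})
print(mean_signal(backend, 1, "ip"))
```

Because the analysis function only sees the abstract interface, a unit test can pass a tiny in-memory backend, while production code passes whichever storage layer a given machine uses.
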
Lead Machine Learning Engineer at Cerebro Sports
2025
Live Gameplay Team
- Full-stack build: frontend, API, LLM layer, and data pipeline all shipped by a small team.
- App lets coaches, scouts, and analysts interrogate live basketball-tournament stats via LLM chat, realtime graphs, and dashboards.
- Designed the pipeline that converts semantic text to SQL, queries the stats DB, processes data, and handles edge cases.
Software Engineer II at Microsoft
2021 -- 2025
M365 Security Detections
- Petabyte-scale big-data platform using ML to detect cyber-attacks in near real time across all of M365.
- Lead engineer for Microsoft's first mail-anomaly detection (a response to the nation-state incident in which hackers accessed executive and federal emails).
- Lead engineer on a GPT-based security-detection assistant that analyses machine and identity activity for attack patterns.
- Designed the production workspace that enables data-science and LLM-backed detections in an agile PySpark environment with CI/CD and realtime job monitoring.
- Expedited detection development by over 75% on distributed systems; auto-failover, geo-replicated BCDR (business continuity and disaster recovery) architecture.
Machine Learning Engineer at AI.Reverie
2020
Computer Vision Team
- Lead engineer on a deep-learning + iOS app for real-time object detection and instance segmentation.
- Improved accuracy with dropout, batch normalisation, and training on custom synthetic data.
- Converted models to run on-device via LibTorch; wrote the Objective-C++ bridge, hand-rolled NMS, and the full AVFoundation capture pipeline. CoreMotion integration plotted phone-path trajectory.
- Productionised the library and ETL pipeline with custom PyTorch DataLoaders and dataset mappers.
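
For readers unfamiliar with the hand-rolled NMS mentioned above, here is a minimal sketch of greedy non-maximum suppression. The original was written in Objective-C++; this Python version only illustrates the standard technique, and the corner box format, the `iou` and `nms` names, and the 0.5 threshold are assumptions rather than details of the shipped app.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], scores: list[float], iou_threshold: float = 0.5) -> list[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: list[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too heavily.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed; the third box is disjoint and survives.
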
Software Engineer at Red Hat
2019
OpenShift -- Container Runtimes
- Upstream contributions to CRI-O (an implementation of the Kubernetes Container Runtime Interface) and Libpod (the container and pod management library behind Podman).
- Unblocked engineers on image-volume write issues inside containers, fixed container-hook monitoring and execution, and added CLI env-var flag passthrough.

Education

Boston University

Master's: AI + Deep Learning
Bachelor's: Computer Science + Astrophysics
Honours: Presidential Scholar -- GPA 3.75 / 4

Skills

What I've shipped with.

Everything below is from production work or research, tagged to where it was used.

Recommendation + ranking

Mixture-of-Retrieval ML models (Amazon)
Logistic regression, stratified sampling, softmax tuning (Amazon)
Cross-service content-filtering pipelines (Amazon)

Security + anomaly ML

Mail-anomaly detection for M365 (Microsoft)
GPT-based security-detection assistant (Microsoft)
Random-forest entity detections (Microsoft)
Petabyte-scale event processing (Microsoft)

Distributed systems

Spark, PySpark, Azure Functions (Microsoft)
CI/CD, real-time job monitoring (Microsoft)
Auto-failover geo-replicated BCDR (Microsoft)
CRI-O, Libpod, Kubernetes container runtimes (Red Hat)

Scientific software

DisruptionPy: anomaly models on tokamak fusion data (MIT)
Pluggable backends via abstract interfaces (MIT)
MDSplus, Xarray/Zarr for multi-machine scalability (MIT)

Live sports + LLM apps

Semantic-text to SQL pipelines (Cerebro Sports)
Basketball stats DB + edge-case handling (Cerebro Sports)
LLM chat + realtime graphs + dashboards (Cerebro Sports)

Classical + deep computer vision

Kalman filters, Savitzky-Golay smoothing, Circle Hough Transform
Motion energy, contour detection, frame differencing, template matching
Single-shot object detection models
Pose estimation for shot-mechanics analysis

Deep learning

Dropout, batch normalisation, hyperparameter tuning (AI.Reverie)
Training on custom synthetic data (AI.Reverie)
Custom PyTorch DataLoaders and dataset mappers (AI.Reverie)
TorchScript model conversion for mobile (AI.Reverie)

On-device mobile ML

iOS object detection + instance segmentation (AI.Reverie)
Swift + AVFoundation capture pipeline
LibTorch via Objective-C++ bridge
Device motion logging for trajectory analysis

Full-stack product

End-to-end app: UI, API, data pipeline, LLM (Cerebro Sports)
Realtime graphs + dashboards over a stats DB (Cerebro Sports)
Agile PySpark workspace with CI/CD + job monitoring (Microsoft)
Next.js, React, TypeScript, Tailwind (this site)

Small-team engineering

Leading features from spec to ship -- at Amazon, MIT, and Microsoft
Working with scientists, analysts, and non-technical stakeholders
Production release hygiene: CI/CD, BCDR, observability
Framework choice as a design decision, not a reflex