Product portfolio

Things I've built & shipped

A selection of products from my time on Capital One's AI/ML platform team and the data science group at Alio, spanning exploratory research, zero-to-one features, and scaled infrastructure serving 100M+ customers.

Consumer · Web · Identity Verification · KYC

Tap 2 Verify

Led identity verification for Capital One's Online Account Opening flow, replacing the multi-day Government ID process with a real-time tap-to-verify experience. Applicants confirm their identity via a link on their phones, eliminating friction for the 85k+ applicants per year previously routed to manual review and driving incremental account bookings at scale.

3k new accounts booked / month
7k fewer manual reviews / month
$2.4M realized NPV in 2022

Platform · ML Infrastructure · Internal tool

Versioning

Data scientists had been tacking version suffixes onto feature names, flooding the platform with near-duplicates and making reuse risky. Introduced the first standardized versioning policy for data products, with automated change classification, tiered review flows, and a full audit trail to strengthen data product lifecycle governance.

+12 NPS increase
+25% feature reuse rate
0 to 1 platform versioning standard
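
To illustrate what automated change classification with tiered review can look like, here is a minimal sketch. It assumes schemas are plain `{field_name: type_name}` dicts and uses a three-tier, semver-style scheme; the names `ChangeTier`, `classify_change`, and `bump_version` are hypothetical and are not the platform's actual implementation.

```python
from enum import Enum


class ChangeTier(Enum):
    """Hypothetical review tiers, ordered from lightest to heaviest."""
    PATCH = "patch"  # no schema change, e.g. a description edit
    MINOR = "minor"  # additive, backward-compatible change
    MAJOR = "major"  # breaking change that needs full review


def classify_change(old_schema: dict, new_schema: dict) -> ChangeTier:
    """Classify a change by comparing field names and types."""
    shared = old_schema.keys() & new_schema.keys()
    removed = old_schema.keys() - new_schema.keys()
    retyped = {name for name in shared if old_schema[name] != new_schema[name]}
    if removed or retyped:
        return ChangeTier.MAJOR  # existing consumers could break
    if new_schema.keys() - old_schema.keys():
        return ChangeTier.MINOR  # only new fields were added
    return ChangeTier.PATCH


def bump_version(version: str, tier: ChangeTier) -> str:
    """Bump a 'major.minor.patch' version string according to the tier."""
    major, minor, patch = (int(part) for part in version.split("."))
    if tier is ChangeTier.MAJOR:
        return f"{major + 1}.0.0"
    if tier is ChangeTier.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old = {"customer_id": "string", "spend_30d": "float"}
new = {"customer_id": "string", "spend_30d": "float", "spend_90d": "float"}
tier = classify_change(old, new)
print(tier.value, bump_version("1.4.2", tier))
```

Keeping the version in metadata rather than in the feature name is what removes the need for suffixed near-duplicates: the name stays stable while the tier decides how much review a change gets.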

Platform · Data Governance · Internal tool

Lineage

Built Feature Lineage to document the full lifecycle of data from origination through to model use, tracking both dataset-level flow and element-level field transformations across pipelines. Gave data scientists instant visibility into where features came from and how they were computed, enabling faster knowledge transfer, safer feature reuse, and audit-ready documentation for compliance requests.

Element-level transformation tracking
<24 hrs audit response time
0 to 1 enterprise lineage standard
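
As a rough illustration of element-level lineage, the sketch below models each field as a node that records its transformation and upstream inputs, then walks the graph back to the originating fields. The `FieldNode` structure, the `trace_upstream` helper, and the example field names are all hypothetical; the real system's data model is not described here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FieldNode:
    """One field in the lineage graph (hypothetical structure)."""
    name: str                        # "dataset.field"
    transformation: str = "source"   # how the field was computed
    inputs: list = field(default_factory=list)  # names of upstream fields


def trace_upstream(graph: dict, target: str) -> list:
    """Return every field the target depends on, nearest first (BFS)."""
    seen = {target}
    order = []
    queue = deque(graph[target].inputs)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # a field can be reachable through several paths
        seen.add(name)
        order.append(name)
        queue.extend(graph[name].inputs)
    return order


nodes = [
    FieldNode("raw_txn.amount"),
    FieldNode("raw_txn.ts"),
    FieldNode("features.spend_30d", "sum of amount over trailing 30 days",
              ["raw_txn.amount", "raw_txn.ts"]),
    FieldNode("model_input.spend_30d_scaled", "standard scaling",
              ["features.spend_30d"]),
]
graph = {node.name: node for node in nodes}

for name in trace_upstream(graph, "model_input.spend_30d_scaled"):
    print(name, "<-", graph[name].transformation)
```

Answering an audit request then becomes a graph traversal rather than a manual search through pipeline code.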

Platform · User Personalization · Event-driven

Personalization Tiles

Created event-based triggers for feature compute to power Capital One's Feed recommender model, which ranks personalized tiles for millions of customers. Enabled personalization features that capture customer engagement signals (accept, like, dislike, dismiss, postpone) and interaction counts, allowing the model to surface the right offers and actions to each customer at the right moment.

Real-time feature serving
6 engagement signal types tracked
100M+ customers served
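
A minimal sketch of event-triggered feature compute follows: each engagement event updates per-customer, per-tile counters at the moment it arrives, so the features are ready whenever the ranker asks for them. The in-memory store, the `EngagementFeatures` class, and the feature names are assumptions made for illustration; a production system would sit on a streaming platform and a feature store.

```python
from collections import Counter, defaultdict

# Engagement signals named in the description above.
SIGNALS = ("accept", "like", "dislike", "dismiss", "postpone")


class EngagementFeatures:
    """Hypothetical in-memory feature store updated on every event."""

    def __init__(self):
        # (customer_id, tile_id) -> Counter of signal occurrences
        self._counts = defaultdict(Counter)

    def on_event(self, customer_id: str, tile_id: str, signal: str) -> None:
        """Event handler: recompute features as soon as a signal arrives."""
        if signal not in SIGNALS:
            raise ValueError(f"unknown signal: {signal!r}")
        self._counts[(customer_id, tile_id)][signal] += 1

    def features(self, customer_id: str, tile_id: str) -> dict:
        """Return the feature row the ranking model would read."""
        counts = self._counts.get((customer_id, tile_id), Counter())
        row = {f"{signal}_count": counts[signal] for signal in SIGNALS}
        row["interaction_count"] = sum(counts.values())
        return row


store = EngagementFeatures()
store.on_event("cust-1", "tile-9", "like")
store.on_event("cust-1", "tile-9", "dismiss")
store.on_event("cust-1", "tile-9", "like")
print(store.features("cust-1", "tile-9"))
```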
Data Science · Machine Learning · Healthcare

Hematocrit Prediction Model

Built a machine learning model to predict blood hematocrit non-invasively from raw PPG sensor data at Alio. The company's wearable SmartPatch uses infrared light to continuously monitor dialysis patients at home. Replaced a heuristic ratio-of-ratios baseline with a random forest regressor trained on AC and DC amplitude features across IR channels, reducing prediction error by 80% and establishing a full pipeline from MongoDB sensor reads through AWS SageMaker.

80% error reduction vs. baseline
~1.5 MAE (Hct %, 5-fold CV)
11 dialysis patients
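
For readers unfamiliar with the feature types, here is a simplified sketch of AC and DC amplitudes and a ratio-of-ratios value computed from two windows of PPG samples. It assumes DC is the median level of the window and AC is the peak-to-peak swing; these definitions, the function names, and the sample values are illustrative assumptions, not the exact feature engineering used in the project.

```python
from statistics import median


def ac_dc(window):
    """Return (AC, DC) for one window of PPG samples.

    Assumption: DC is the median level, AC is the peak-to-peak swing.
    """
    dc = median(window)
    ac = max(window) - min(window)
    return ac, dc


def ratio_of_ratios(window_a, window_b):
    """Ratio of the normalized pulsatile components of two channels."""
    ac_a, dc_a = ac_dc(window_a)
    ac_b, dc_b = ac_dc(window_b)
    return (ac_a / dc_a) / (ac_b / dc_b)


# Made-up sample values for two channels over the same heartbeat.
channel_a = [100, 102, 104, 102, 100]
channel_b = [200, 202, 204, 202, 200]

print(ac_dc(channel_a))
print(ratio_of_ratios(channel_a, channel_b))
```

The ratio-of-ratios collapses the AC and DC values into a single number, whereas feeding them to the model separately lets it weight baseline and pulsatile information independently.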
Interactive Analysis
AC/DC amplitude channels are the key unlock. Adding the 8 AC/DC channels to the RoR-only feature set (MAE 2.3) drops error to ~1.5, a 35% improvement. The original single train/test split in the notebook gave an optimistic MAE of 1.27; 5-fold cross-validation gives a more honest estimate of ~1.5–1.6.
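
The quoted improvements can be checked with simple arithmetic, and the evaluation procedure sketched in a few lines. The code below uses the MAE values stated on this page; the contiguous, unshuffled fold split and the mean-predictor stand-in are assumptions for illustration, not the notebook's actual setup.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional error reduction when moving from `before` to `after`."""
    return (before - after) / before


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n: int, k: int):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validated_mae(y, k: int) -> float:
    """Average held-out MAE of a mean predictor (stand-in for a real model)."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)
        scores.append(mae([y[i] for i in test], [prediction] * len(test)))
    return sum(scores) / len(scores)


print(round(relative_reduction(2.3, 1.5), 2))  # RoR-only vs. AC/DC features
print(round(relative_reduction(7.3, 1.5), 2))  # linear baseline vs. best model
```

The first value comes out at about 0.35 and the second at about 0.79, in line with the quoted 35% and roughly 80% given that the 1.5 figure is itself rounded. Averaging over every fold means each sample is held out exactly once, which is why the cross-validated number is less sensitive to a lucky split than a single train/test partition.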
Clinical context: The dialysis target range is Hct 33–36%. An error of ±1.5 pts is meaningful but not disqualifying — for early-warning use (detecting a drop below 30%), directional accuracy matters most. Points are color-coded: green <1 pt error, amber <2.5 pts, red above.
RoR_right ranks highest at 20.6% importance, consistent with the Schmitt 1992 algorithm. The DC baseline channels (median_HB_13_DC at 19.4%, median_HB_3_DC at 12.1%) collectively outweigh the AC pulsatile channels, suggesting that tissue optical properties at baseline encode more hematocrit signal than cardiac-cycle variation.
RoR_right has the strongest individual correlation with true Hct (r = 0.39). The DC amplitude channels follow (median_HB_3_DC r = 0.22, median_HB_13_DC r = 0.21). AC amplitude channels alone are weakly correlated — the baseline optical signal carries more hematocrit information than the pulsatile component.
~1.5 pts Best MAE (5-fold CV)
7.3 pts Baseline linear error
80% Error reduction
10 features IR optical channels