Experience

ML projects succeed or fail based on problem clarity, data quality, systems design, iteration speed, and team collaboration. Below are the systems I’ve built and the lessons behind them.

Machine Learning Engineer (Volunteer) | Sprout (04/2025–Present)

Technologies: Python, HuggingFace Transformers, BERTopic, Azure ML

What I Built

Built distributed NLP pipelines in Python using HuggingFace Transformers and BERTopic to process 270GB of text (21M social media posts)
Used Azure ML workflows to scale processing, reducing runtime from 2 weeks (about 336 hours) to 15 hours (roughly a 95% reduction)
Designed interpretable topic and emotion outputs tailored for mental health applications across two organizations
Led a 5-person cross-functional Agile team to deliver production-ready NLP systems aligned with stakeholder needs
Developed stakeholder-facing reports and presentations translating model outputs into actionable insights

Impact

Enabled large-scale analysis of climate anxiety across 21M social media posts
Delivered production NLP systems used by mental health organizations
Improved decision-making through interpretable, accessible insights

Progression: Transitioned from building initial NLP models to owning end-to-end ML systems, including infrastructure, scalability, and team leadership.

Key Takeaway: Scaling ML systems requires as much focus on data pipelines and infrastructure as on model performance.

Data Scientist (Volunteer) | Sprout (01/2025–04/2025)

Technologies: Python, LDA (Topic Modeling), Scikit-learn, Grid Search

What I Built

Built an initial LDA-based topic modeling pipeline in Python to analyze 50K social media posts as a baseline for mental health insights
Tuned model hyperparameters with grid search, improving topic coherence from 0.344 to 0.432 (~26% relative improvement)
Extracted and analyzed 75+ themes to support stakeholder understanding of mental health trends
Designed data collection and preprocessing workflows connecting technical methods with community needs

Impact

Established a strong baseline model for downstream NLP system development
Provided early-stage insights guiding mental health intervention strategies
Bridged technical analysis with real-world community applications

Progression: Built foundational NLP models and evaluation workflows that later evolved into large-scale, production-grade systems.

Key Takeaway: Simple, well-evaluated baseline models are critical for guiding more complex system development.

Linear Programming Research Assistant | Purdue University (06/2023–12/2024)

Technologies: Python, Scikit-learn, Linear Programming, SVMs, Convex Optimization

What I Built

Co-developed an optimization-based classifier in Python combining linear programming and SVMs, evaluated on 100K data points
Implemented and benchmarked scikit-learn classifiers to study separability and decision boundaries in high-dimensional spaces
Applied convex optimization techniques to improve interpretability in structured classification tasks
Co-authored two forthcoming research papers and presented findings to 40+ faculty and students

Impact

Advanced research in interpretable, optimization-based classification methods
Contributed to academic publications and departmental knowledge sharing
Strengthened understanding of model behavior in high-dimensional settings

Progression: Developed strong theoretical and mathematical foundations in optimization and classification, later applied to real-world ML systems.

Key Takeaway: Understanding the mathematical structure of models leads to better interpretability and more reliable systems.

Planetary Climate Dynamics Research Assistant | Purdue University (05/2022–12/2022)

Technologies: Python, NumPy, Pandas, Data Visualization, Atmospheric Data Analysis

What I Built

Processed and analyzed 90GB of planetary reanalysis datasets (EMARS, MACDA) in Python using NumPy and Pandas
Developed data processing pipelines and visualizations to study dust storm track variability
Supported research reporting through clear visual and analytical summaries of atmospheric patterns
Presented findings to the research team and contributed an abstract to the AGU conference

Impact

Enabled analysis of large-scale atmospheric datasets for planetary climate research
Contributed to scientific communication through presentations and conference submission
Supported ongoing research into atmospheric dynamics and variability

Progression: Gained early experience working with large-scale scientific datasets and building data workflows, forming the foundation for later ML system development.

Key Takeaway: Strong data processing and analysis foundations are essential before building reliable machine learning systems.

Looking for someone with ML systems experience? Let’s talk!