Experience
ML projects succeed or fail based on problem clarity, data quality, systems design, iteration speed, and team collaboration. Below are the systems I’ve built and the lessons behind them.
Machine Learning Engineer (Volunteer) | Sprout (04/2025–Present)
Technologies: Python, HuggingFace Transformers, BERTopic, Azure ML
What I Built
- Built distributed NLP pipelines in Python using HuggingFace Transformers and BERTopic to process 270GB of text (21M social media posts)
- Scaled processing with Azure ML workflows, cutting runtime from 2 weeks to 15 hours (roughly a 95% reduction)
- Designed interpretable topic and emotion outputs tailored for mental health applications across two organizations
- Led a 5-person cross-functional Agile team to deliver production-ready NLP systems aligned with stakeholder needs
- Developed stakeholder-facing reports and presentations translating model outputs into actionable insights
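As a rough illustration of the batching idea behind the pipeline above, the sketch below splits posts into fixed-size batches and processes them in parallel. It is a local stand-in using only the Python standard library, not the Azure ML setup itself; `batched`, `process_batch`, `run_pipeline`, the batch size, and the worker count are all hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def process_batch(posts: List[str]) -> List[int]:
    """Hypothetical placeholder for model inference: returns token counts."""
    return [len(post.split()) for post in posts]


def run_pipeline(
    posts: Iterable[str],
    batch_size: int = 2,
    max_workers: int = 4,
    worker: Callable[[List[str]], List[int]] = process_batch,
) -> List[int]:
    """Process batches in parallel and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the input batches.
        results = pool.map(worker, batched(posts, batch_size))
        return [value for batch_result in results for value in batch_result]
```

In a real deployment the worker would run model inference and the batches would be spread across machines rather than threads; the structure (split, map, merge in order) stays the same.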
Impact
- Enabled large-scale analysis of climate anxiety across millions of users
- Delivered production NLP systems used by mental health organizations
- Improved decision-making through interpretable, accessible insights
Progression: Transitioned from building initial NLP models to owning end-to-end ML systems, including infrastructure, scalability, and team leadership.
Key Takeaway: Scaling ML systems requires as much focus on data pipelines and infrastructure as on model performance.
Data Scientist (Volunteer) | Sprout (01/2025–04/2025)
Technologies: Python, LDA (Topic Modeling), Scikit-learn, Grid Search
What I Built
- Built an initial LDA-based topic modeling pipeline in Python to analyze 50K social media posts as a baseline for mental health insights
- Applied grid search and evaluation metrics to improve topic coherence from 0.344 to 0.432 (~26% relative improvement)
- Extracted and analyzed 75+ themes to support stakeholder understanding of mental health trends
- Designed data collection and preprocessing workflows connecting technical methods with community needs
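The grid search step mentioned above can be sketched generically: try every combination of hyperparameters and keep the one with the highest score. This is a minimal sketch, not the project's actual code; `grid_search` and `toy_score` are hypothetical names, and `toy_score` merely stands in for training an LDA model and measuring its topic coherence.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    score_fn: Callable[..., float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every parameter combination and return the best one.

    score_fn is called with the parameters as keyword arguments and
    must return a number where higher is better.
    """
    names = list(param_grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(num_topics: int, alpha: float) -> float:
    """Toy stand-in for a coherence metric (assumption, not real data).

    It peaks at num_topics=20 and alpha=0.1 purely for illustration.
    """
    return -abs(num_topics - 20) - abs(alpha - 0.1)


if __name__ == "__main__":
    grid = {"num_topics": [10, 20, 30], "alpha": [0.01, 0.1, 1.0]}
    best, score = grid_search(grid, toy_score)
    print(best, score)
```

Swapping `toy_score` for a function that fits a model and returns its coherence turns this into the baseline-tuning loop described in the bullets, at the cost of one model fit per grid point.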
Impact
- Established a strong baseline model for downstream NLP system development
- Provided early-stage insights guiding mental health intervention strategies
- Bridged technical analysis with real-world community applications
Progression: Built foundational NLP models and evaluation workflows that later evolved into large-scale, production-grade systems.
Key Takeaway: Simple, well-evaluated baseline models are critical for guiding more complex system development.
Linear Programming Research Assistant | Purdue University (06/2023–12/2024)
Technologies: Python, Scikit-learn, Linear Programming, SVMs, Convex Optimization
What I Built
- Co-developed an optimization-based classifier in Python combining linear programming and SVMs, evaluated on 100K data points
- Implemented and benchmarked scikit-learn classifiers to study separability and decision boundaries in high-dimensional spaces
- Applied convex optimization techniques to improve interpretability in structured classification tasks
- Co-authored two forthcoming research papers and presented findings to 40+ faculty and students
Impact
- Advanced research in interpretable, optimization-based classification methods
- Contributed to academic publications and departmental knowledge sharing
- Strengthened understanding of model behavior in high-dimensional settings
Progression: Developed strong theoretical and mathematical foundations in optimization and classification, later applied to real-world ML systems.
Key Takeaway: Understanding the mathematical structure of models leads to better interpretability and more reliable systems.
Planetary Climate Dynamics Research Assistant | Purdue University (05/2022–12/2022)
Technologies: Python, NumPy, Pandas, Data Visualization, Atmospheric Data Analysis
What I Built
- Processed and analyzed 90GB of planetary reanalysis datasets (EMARS, MACDA) in Python using NumPy and Pandas
- Developed data processing pipelines and visualizations to study dust storm track variability
- Supported research reporting through clear visual and analytical summaries of atmospheric patterns
- Presented findings to the research team and contributed an abstract to the AGU conference
Impact
- Enabled analysis of large-scale atmospheric datasets for planetary climate research
- Contributed to scientific communication through presentations and conference submission
- Supported ongoing research into atmospheric dynamics and variability
Progression: Gained early experience working with large-scale scientific datasets and building data workflows, forming the foundation for later ML system development.
Key Takeaway: Strong data processing and analysis foundations are essential before building reliable machine learning systems.
Looking for someone with ML systems experience? Let’s talk!