Portfolio | Selected Work
Each project addresses a real healthcare problem. I frame each one around the stakeholder need, the analytical approach, and the outcome it enables. All projects are open-source on GitHub.
Recovering Lost NHS Capacity: Missed Appointment Prediction
NHS England loses £1.2 billion annually to missed GP appointments, with over 15 million slots wasted per year. Built a predictive system that flags at-risk appointments before they're missed, enabling targeted reminders, rebooking, and overbooking strategies to recover thousands of clinical hours per trust.
From Data Leakage to Honest Insight: ML for NHS Non-Attendance
The first model hit 93% accuracy. I caught the data leakage, rebuilt the pipeline properly, and delivered a model NHS managers could actually trust. The lesson: a good data scientist's value is not the accuracy score but knowing when a result is too good to be true.
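As a rough illustration of the kind of check that can surface leakage (a sketch, not the project's actual pipeline), the snippet below scores each feature on its own and flags any that predicts the outcome almost perfectly. The feature names, the toy rows, and the 0.95 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(rows, feature, target):
    """In-sample accuracy of predicting the majority target per feature value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[target]] += 1
    # For each feature value, the best constant guess is the majority class.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(rows)


def flag_suspect_features(rows, features, target, threshold=0.95):
    """Return features that alone reach the (assumed) suspicion threshold."""
    return [
        f for f in features
        if single_feature_accuracy(rows, f, target) >= threshold
    ]


# Hypothetical toy data: "attendance_status_code" is recorded after the
# appointment, so it encodes the outcome and should never be a model input.
rows = [
    {"weekday": "Mon", "reminder_sent": 1, "attendance_status_code": "DNA", "missed": 1},
    {"weekday": "Mon", "reminder_sent": 0, "attendance_status_code": "ATT", "missed": 0},
    {"weekday": "Tue", "reminder_sent": 1, "attendance_status_code": "ATT", "missed": 0},
    {"weekday": "Tue", "reminder_sent": 0, "attendance_status_code": "DNA", "missed": 1},
    {"weekday": "Wed", "reminder_sent": 1, "attendance_status_code": "ATT", "missed": 0},
    {"weekday": "Wed", "reminder_sent": 0, "attendance_status_code": "ATT", "missed": 0},
    {"weekday": "Thu", "reminder_sent": 0, "attendance_status_code": "DNA", "missed": 1},
    {"weekday": "Thu", "reminder_sent": 1, "attendance_status_code": "ATT", "missed": 0},
]

features = ["weekday", "reminder_sent", "attendance_status_code"]
suspects = flag_suspect_features(rows, features, "missed")
print(suspects)
```

A check like this only raises a flag. High-cardinality fields such as identifiers can also score highly in-sample, so each flagged feature still needs a judgement about whether it would really be known at booking time.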
Who's Missing Appointments and Why: NHS Service Analytics
Before building any model, you need to understand the shape of the problem. This analysis mapped non-attendance by patient group, time slot, and GP practice, giving operations managers a clear picture of where to intervene first.
COVID-19 Severity Prediction for Clinical Triage
During a pandemic surge, clinicians need to know which patients will deteriorate. Built a severity prediction tool using admission-stage clinical features, designed to give overwhelmed A&E teams an evidence-based triage signal when beds are running out.
Automated Tissue Classification for Faster Pathology
Pathology labs face weeks-long backlogs that delay cancer diagnoses. Developed a deep learning classifier that screens histopathology slides for tissue type and flags urgent cases for pathologists first, with the aim of cutting diagnostic wait times and catching aggressive cancers earlier.
Dementia Prevention: Population Risk Modelling
Dementia costs the UK economy £34.7 billion annually, and early intervention is the only scalable lever. Built a risk factor model identifying which modifiable risk factors (physical activity, social isolation, cardiovascular health) carry the strongest predictive signal, giving commissioners an evidence base for targeted prevention.
Does the Treatment Actually Work? Causal Mortality Analysis
Correlation-based dashboards mislead clinical teams every day. Applied propensity scoring and other causal inference methods to isolate whether a treatment genuinely reduces mortality. This is the kind of analysis commissioners need before committing millions to an intervention.
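To make the idea concrete, here is a minimal sketch of inverse-probability weighting with a propensity score estimated inside a single confounder stratum. It is an illustration under stated assumptions, not the project's method: the `severity` strata, the toy counts, and the function names are hypothetical, and real analyses typically estimate propensity from many covariates.

```python
from collections import defaultdict


def naive_risk_difference(records):
    """Raw mortality in the treated arm minus raw mortality in the untreated arm."""
    deaths = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for _, treated, died in records:
        totals[treated] += 1
        deaths[treated] += died
    return deaths[1] / totals[1] - deaths[0] / totals[0]


def ipw_risk_difference(records):
    """Normalised inverse-probability-weighted risk difference.

    records: (stratum, treated, died) tuples with treated and died in {0, 1}.
    Assumes every stratum contains both treated and untreated patients.
    """
    # Propensity score = share of patients treated within each stratum.
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for stratum, treated, _ in records:
        n[stratum] += 1
        n_treated[stratum] += treated
    propensity = {s: n_treated[s] / n[s] for s in n}

    # Weight each patient by 1 / P(receiving the arm they actually received).
    weight_sum = {0: 0.0, 1: 0.0}
    weighted_deaths = {0: 0.0, 1: 0.0}
    for stratum, treated, died in records:
        p = propensity[stratum]
        w = 1.0 / p if treated else 1.0 / (1.0 - p)
        weight_sum[treated] += w
        weighted_deaths[treated] += w * died
    return (weighted_deaths[1] / weight_sum[1]
            - weighted_deaths[0] / weight_sum[0])


# Hypothetical cohort: severe patients are treated more often and die more often.
records = (
    [("severe", 1, 1)] * 4 + [("severe", 1, 0)] * 4 + [("severe", 0, 1)] * 2
    + [("mild", 1, 0)] * 2 + [("mild", 0, 1)] * 2 + [("mild", 0, 0)] * 6
)

naive = naive_risk_difference(records)
adjusted = ipw_risk_difference(records)
print(naive, adjusted)
```

In this toy cohort the raw comparison shows no difference between arms, while the weighted estimate is clearly negative (lower mortality with treatment), because the weighting corrects for sicker patients being treated more often.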
Longitudinal Pain Modelling: Predicting Who Gets Worse
Chronic pain services see patients for years without knowing who will deteriorate. Built longitudinal models that track individual pain trajectories and flag patients whose progression predicts escalation, giving clinical teams evidence to intervene earlier and allocate specialist resources effectively.
Supporting Earlier Breast Cancer Detection
Breast cancer survival rises from 76% to 98% when the disease is caught early. Developed and compared interpretable classification models that flag high-risk cases from diagnostic features, prioritising explainability alongside accuracy because a model clinicians can't understand is a model they won't use.
Diabetes Prevention: Population-Level Analysis
Diabetes costs the NHS £10 billion annually, but spending is spread thinly. Mapped prevalence hotspots by demographics, geography, and risk factors, giving commissioners the intelligence to concentrate prevention budgets where they'll prevent the most hospital admissions.
CNN Image Classification: Deep Learning Foundations
Before deploying AI in pathology labs where misclassification costs lives, I built fluency on benchmark datasets. This project established the architectural intuition (layer design, regularisation, hyperparameter discipline) that underpins my clinical imaging work.
Breast Cancer Classification: Method Selection
"Which algorithm is best?" depends entirely on the data and the clinical context. Systematically compared logistic regression, SVM, and KNN on diagnostic data, demonstrating the discipline of matching model complexity to data reality rather than chasing accuracy headlines.
Immune Cell Profiling: Flow Cytometry Analysis
Flow cytometry produces high-dimensional data that's difficult to interpret at the bedside. Built an automated profiling pipeline using PCA and t-SNE to reveal immune cell population patterns, turning raw cytometry readouts into visual immune signatures that support faster clinical decisions on treatment escalation.
All projects are open-source and actively maintained on GitHub.
View Full GitHub →