Machine Learning · Deep Learning · Python · Flask · First Class Honours

Breast Cancer: ML Prediction Web App

A machine learning and deep learning web application that predicts breast cancer diagnosis from clinical data or histopathological images — achieving 97.14% accuracy with SVM and 95.65% with CNN, deployed via a Flask web interface. Awarded First Class Honours.

My Role: ML Engineer & Developer
Context: BSc Software Engineering · 2022
Result: First Class Honours
Code: Available on request
MediPredict Web App

What is MediPredict?

MediPredict is a breast cancer prediction web application that uses machine learning and deep learning to help detect the presence or absence of breast cancer at an early stage. The system accepts two types of input: clinical numerical data about the patient, or a histopathological microscopy image of breast tissue — giving users two independent prediction methods within a single application.

Breast cancer remains one of the leading causes of cancer death worldwide, with many cases going undetected until later stages due to diagnostic errors and limited access to specialist expertise. This project explored whether machine learning and deep learning could provide an accurate, accessible, and cost-effective alternative prediction tool — and deployed the best-performing algorithms into a usable web application anyone could interact with.

The project was completed as part of my BSc in Software Engineering at Cardiff Metropolitan University (via ICBT), achieving First Class Honours. The work covered the full pipeline: literature review, data acquisition, algorithm selection, training and evaluation, comparative analysis, system design using UML, web application development with Flask, form validation, and black-box testing.

Quick Facts
Context: BSc Software Engineering · Cardiff Met · 2022
Result: First Class Honours
Best ML Accuracy: 97.14% (SVM on WBCD dataset)
Best DL Accuracy: 95.65% (CNN on BreakHis dataset)

Why this matters.

The Problem
Traditional breast cancer detection (mammography, ultrasound, clinical examination) depends heavily on specialist expertise and is prone to human error — with approximately 52% of diagnostic errors caused by misreading symptoms and 43% by overlooking them. Early-stage detection rates remain low in many countries, directly impacting survival rates.
The Solution
A dual-method web application combining a trained SVM model (clinical data) and CNN model (tissue images) into a single accessible interface. Users input either their clinical measurements or upload a histopathological image — and the system returns a prediction instantly, with a clear message to consult a doctor before acting on results.

Clinical data and image input — one app.

One of the distinctive features of this project is that it supports two completely independent prediction pathways in a single web application — each using a different dataset, a different class of algorithm, and a different input type.

Method I — Clinical Data
Machine Learning on WBCD Dataset
Users enter the nine numerical clinical measurements used in the Wisconsin Breast Cancer Diagnostic dataset (clump thickness, cell size uniformity, cell shape uniformity, etc.). The trained SVM model predicts whether the tumour is malignant or benign. Dataset: 699 cases (458 benign, 241 malignant). 70/30 train/test split with 10-fold cross-validation.
97.14% SVM accuracy · Best of 4 algorithms tested
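
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the data is a synthetic stand-in generated with make_classification (699 rows, 9 features), the hyperparameters are scikit-learn defaults, and the way the 70/30 split and the 10-fold cross-validation are combined (cross-validate on the training portion, then score once on the held-out portion) is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the clinical data (NOT the real dataset):
# 699 rows, 9 features, roughly 65% / 35% class balance.
X, y = make_classification(
    n_samples=699, n_features=9, n_informative=6, n_redundant=0,
    weights=[0.655], random_state=0,
)

# 70/30 split, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# The four algorithm families named in the case study, with default settings.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "CART": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 10-fold cross-validation on the training portion only.
    cv_mean = cross_val_score(model, X_train, y_train, cv=10).mean()
    # One final score on the held-out 30%.
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (cv_mean, test_acc)
    print(f"{name:12s} cv={cv_mean:.3f} test={test_acc:.3f}")
```

With 699 rows, a 30% test split yields 210 held-out cases, which matches the test-set size reported for the SVM. The scores printed here come from synthetic data and say nothing about the real results.
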
Method II — Histopathological Image
Deep Learning on BreakHis Dataset
Users upload a microscopic biopsy image of breast tissue. The trained CNN model classifies the image as benign or malignant. Dataset: 9,109 images from 82 patients at 4 magnification levels (40×, 100×, 200×, 400×). Images resized to 224×224×3. 85% training / 5% testing / 10% validation split.
95.65% CNN validation accuracy · Best of 3 algorithms tested
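
An 85% / 5% / 10% split like the one described above could be produced along these lines. This is only a sketch: the file names, the seed, and the helper name split_85_5_10 are hypothetical, and it shuffles individual images, whereas the case study does not say whether the original split was made per image or per patient.

```python
import random


def split_85_5_10(items, seed=42):
    """Shuffle items and cut them into train / test / validation parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = len(items) * 85 // 100
    n_test = len(items) * 5 // 100
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # whatever remains, about 10%
    return train, test, val


# Hypothetical image paths standing in for the real image files.
paths = [f"img_{i:04d}.png" for i in range(1000)]
train, test, val = split_85_5_10(paths)
print(len(train), len(test), len(val))  # 850 50 100
```
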

Testing four algorithms to find the best.

Four machine learning algorithms were trained and evaluated on the Wisconsin Breast Cancer Diagnostic dataset. All were measured on accuracy, precision, recall, and F1 score for both benign and malignant classes, as well as confusion matrix results.

Algorithm | Accuracy | Precision (benign / malignant) | Recall (benign / malignant) | Result
SVM (winner) | 97.14% | 0.98 / 0.96 | 0.98 / 0.96 | 204/210 correct
KNN | 95.71% | 0.96 / 0.95 | 0.97 / 0.94 | 2nd place
Naïve Bayes | 95.23% | - | - | 3rd place
CART | 90.47% | - | - | Lowest performer

SVM was selected for deployment. It correctly classified 204 out of 210 test cases — 130 benign correctly identified as benign, 74 malignant correctly identified as malignant, with only 6 misclassifications (3 false positives, 3 false negatives). This result is consistent with other published research showing SVM accuracy between 96.6% and 97.9% on the same dataset.
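
The SVM figures can be reproduced from the confusion-matrix counts quoted above. The short calculation below treats malignant as the positive class, which is an assumption; because the two error counts are equal, the result is the same either way.

```python
# Counts quoted in the text; malignant is assumed to be the positive class.
tp, tn, fp, fn = 74, 130, 3, 3


def precision_recall_f1(tp, fp, fn):
    """Standard per-class metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (tp + tn) / (tp + tn + fp + fn)
malignant = precision_recall_f1(tp, fp, fn)
# From the benign point of view the roles swap: TP=tn, FP=fn, FN=fp.
benign = precision_recall_f1(tn, fn, fp)

print(f"accuracy  {accuracy:.2%}")  # 97.14%
print("benign    P={:.2f} R={:.2f} F1={:.2f}".format(*benign))
print("malignant P={:.2f} R={:.2f} F1={:.2f}".format(*malignant))
```

The rounded values (0.98 for benign, 0.96 for malignant) agree with the precision and recall columns of the table.
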

Testing three deep learning models on image data.

Three deep learning architectures were trained and evaluated on the BreakHis histopathological image dataset. Models were compared on validation accuracy and validation loss — the key metrics for assessing how well a model generalises to unseen data.

Model | Validation Accuracy | Validation Loss | Result
CNN (winner) | 95.65% | 8.68% | Best on both metrics
DenseNet121 | 95.65% | 16.78% | Equal accuracy, higher loss
ResNet50 | 94.78% | 19.09% | Lowest performer

CNN was selected for deployment. CNN and DenseNet121 achieved equal validation accuracy (95.65%), but CNN's validation loss of 8.68% was roughly half of DenseNet121's 16.78%, indicating that CNN generalises better and produces more confident, reliable predictions on unseen images.
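
To see how two models can share an accuracy yet differ in loss, consider the toy example below. It assumes binary cross-entropy as the loss, which the case study does not state, and the labels and probabilities are invented purely for illustration.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, probs))
    return hits / len(y_true)


def binary_cross_entropy(y_true, probs):
    """Mean negative log-likelihood of the true labels."""
    total = sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, probs)
    )
    return -total / len(y_true)


# Invented labels and predicted probabilities of the positive class.
y_true = [1, 1, 0, 0]
confident = [0.95, 0.90, 0.05, 0.60]  # sure when right, one mistake
hesitant = [0.60, 0.55, 0.45, 0.60]   # same mistake, barely sure when right

for name, probs in (("confident", confident), ("hesitant", hesitant)):
    print(name, accuracy(y_true, probs), round(binary_cross_entropy(y_true, probs), 3))
```

Both sets of predictions classify three of the four samples correctly, but the confident model has the lower loss because it assigns higher probability to the correct class on the samples it gets right.
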

MediPredict — the deployed product.

The best-performing models (SVM + CNN) were integrated into a Flask web application called MediPredict. The interface was designed to be approachable and clear for non-technical users — including a disclaimer popup before predictions, validated input forms, and a results section that always reminds users to consult a doctor before acting on the output.

MediPredict Homepage
MediPredict homepage — "Let's Stand Together and Fight Breast Cancer"
Method I — Data Prediction
Method I — clinical data input and prediction result
Method II — Image Prediction
Method II — histopathological image upload and result

A complete user-facing product.

Beyond the prediction tool, MediPredict includes an About Us page explaining the system and its purpose, a Preventions page giving users evidence-based guidance on reducing breast cancer risk, and thorough input validation on all forms.

About Page
About MediPredict
Preventions Page
Preventions & awareness
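
Server-side validation of the clinical form might look something like the sketch below. The field names, the error messages, and the 1 to 10 integer range are assumptions made for illustration; only the three measurements named earlier are listed, and the remaining six are left out rather than guessed.

```python
# Only the three measurements named in the text; the other six are omitted.
FIELDS = ["clump_thickness", "cell_size_uniformity", "cell_shape_uniformity"]


def validate_form(form, fields=FIELDS, low=1, high=10):
    """Return (clean_values, errors) for a dict of raw form strings."""
    values, errors = {}, {}
    for name in fields:
        raw = str(form.get(name, "")).strip()
        if not raw:
            errors[name] = "This field is required."
            continue
        try:
            value = int(raw)
        except ValueError:
            errors[name] = "Enter a whole number."
            continue
        if not low <= value <= high:
            errors[name] = f"Enter a value between {low} and {high}."
            continue
        values[name] = value
    return values, errors


values, errors = validate_form(
    {"clump_thickness": "5", "cell_size_uniformity": "abc", "cell_shape_uniformity": "11"}
)
print(values)  # {'clump_thickness': 5}
print(errors)
```

Returning the errors keyed by field name lets the page show each message next to the input that caused it, and a prediction is only attempted when the error dictionary is empty.
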

Built with Python and Flask.

The entire stack is Python-based — scikit-learn and TensorFlow/Keras for model training, Flask to serve the models as a web application, and standard HTML/CSS/JS for the front end. The trained models were serialised and loaded at runtime, with Flask handling the routing, form submission, prediction calls, and response rendering.

Python · TensorFlow / Keras · scikit-learn · Flask · Pandas / NumPy · Matplotlib
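
The serving pattern described above (load the model once, then let a Flask route handle form submission, prediction, and response) can be sketched as follows. Everything specific here is hypothetical: the /predict route, the f1 to f9 field names, the JSON response (the real app presumably renders an HTML page), and StubModel, which stands in for the serialised SVM so that the example runs without a model file.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class StubModel:
    """Hypothetical stand-in for the trained, serialised classifier."""

    def predict(self, rows):
        # Toy rule for illustration only: mean score above 5 means malignant (1).
        return [1 if sum(row) / len(row) > 5 else 0 for row in rows]


# Loaded once at start-up rather than on every request.
model = StubModel()

# Hypothetical form field names for the nine clinical measurements.
FEATURES = [f"f{i}" for i in range(1, 10)]


@app.route("/predict", methods=["POST"])
def predict():
    try:
        row = [float(request.form[name]) for name in FEATURES]
    except (KeyError, ValueError):
        # A missing field raises a KeyError subclass; bad text raises ValueError.
        return jsonify(error="All nine measurements must be numeric."), 400
    label = "malignant" if model.predict([row])[0] == 1 else "benign"
    return jsonify(
        prediction=label,
        note="Please consult a doctor before acting on this result.",
    )
```

Flask's built-in test client can exercise the route without starting a server, for example app.test_client().post("/predict", data={name: "2" for name in FEATURES}).
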

What was achieved.

97.14% SVM accuracy on clinical data · 204/210 test cases correctly classified
95.65% CNN validation accuracy on images · Lowest validation loss of the three models
First Class Honours (1st) · BSc Software Engineering · Cardiff Metropolitan University

What I learned.

01
Algorithm selection is an empirical process. The literature suggested SVM and CNN would perform well — but implementing all the alternatives and measuring them directly was the only way to confirm this. The comparison process itself produced the most valuable learning, not just the final result.
02
Validation loss matters as much as validation accuracy. CNN and DenseNet121 achieved identical validation accuracy (95.65%); validation loss is what separated them. CNN's loss was roughly half of DenseNet121's (8.68% vs 16.78%), which meant its predictions were more confident and its generalisation more reliable. Accuracy alone is not enough to select a model.
03
ML in healthcare demands responsibility. Designing the disclaimer popup and the "consult your doctor" message wasn't just good UX — it was ethically necessary. A prediction tool in a medical context needs to be transparent about what it is and what it isn't, especially when its outputs might influence someone's decisions about their own health.