Last week I had a chance to learn some fascinating techniques in a data mining workshop. We were looking at a study on breast cancer diagnosis with fine needle aspirate slides. The technique digitizes images of cell nuclei. A computer then processes the shape of the nuclei, recording multiple geometric aspects.

Once this data has been collected, a learning algorithm attempts to locate a hyperplane that separates the data into malignant and benign classes. If the data isn’t fully linearly separable, the algorithm returns a hyperplane that minimizes the average distance of the misclassified points from it. It wasn’t clear to me whether they used a multi-layer perceptron (a form of artificial neural network) or some other technique.
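To make the idea of a separating hyperplane concrete, here is a minimal sketch using the classic single-layer perceptron rule. This is not necessarily the method the study used (as noted, I'm not sure what they used). The toy points, the labels, and the function names `train_perceptron` and `predict` are made up for illustration. The rule only adjusts the hyperplane when a point is misclassified, nudging it toward that point.

```python
# Toy 2-D points standing in for two geometric features of a nucleus.
# These values are invented for illustration; they are not from the study.
points = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (4.0, 4.2), (4.5, 3.8), (5.0, 4.5)]
labels = [-1, -1, -1, 1, 1, 1]  # -1 = benign, +1 = malignant


def train_perceptron(points, labels, epochs=1000, lr=0.1):
    """Find weights w and bias b so that sign(w.x + b) matches each label."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            # A point is misclassified when it lies on the wrong side
            # of (or exactly on) the current hyperplane.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                # Move the hyperplane toward the misclassified point.
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            break  # every training point is on the correct side
    return w, b


def predict(w, b, point):
    """Return +1 or -1 depending on which side of the hyperplane the point lies."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


w, b = train_perceptron(points, labels)
predictions = [predict(w, b, p) for p in points]
```

On linearly separable data like this toy set, the loop stops once no point is misclassified. On data that isn't separable it would never settle, which is why a method that minimizes the distance of misclassified points is needed for real measurements.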

Whatever the technique, it is quite effective on this problem: the study reported accuracy of about 97% with ten-fold cross-validation. Within just twenty minutes or so, we were able to reproduce models of similar accuracy in the computer lab. To me, there is something unspeakably beautiful about finding the geometry of a cell, with its contours, symmetry and fractal coastlines. Teaching a machine to distinguish between cells in order to find cancer is even more amazing. There is something affirming about this particular research. Studying life-saving science is motivational, to say the least.
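For anyone unfamiliar with ten-fold cross-validation, here is a rough sketch of how it works: the data is split into ten parts, and each part takes a turn as the test set while a model is trained on the other nine. Everything here is an assumption for illustration. The data is synthetic, and the nearest-centroid classifier is just a simple stand-in, not the model from the study or from our lab session.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k=10, seed=0):
    """Return the fraction of samples classified correctly while held out."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit(train_X, train_y)
        correct += sum(
            nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx
        )
    return correct / len(X)


# Synthetic two-class data: two well-separated Gaussian blobs (invented).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

accuracy = cross_validate(X, y)
```

Because every sample is scored only by a model that never saw it during training, the resulting accuracy is a fairer estimate of how the model would do on new slides than accuracy measured on the training data.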

For anyone interested, the paper is ‘Nuclear Feature Extraction for Breast Tumor Diagnosis’ by W. Street, W. Wolberg, and O. Mangasarian. It appeared in the 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870.