Istanbul Technical University

Signal Processing for Computational Intelligence Group

ITU-Med Datasets

Immunohistochemistry (IHC) staining of invasive breast cancer tissue allows CerbB2 receptors to be evaluated visually; CerbB2 mutated breast carcinomas are suitable for targeted therapy. Breast tumors are assigned one of four scores (0, 1, 2, or 3) to decide whether they are suitable for CerbB2 protein-specific treatment. Pathologists assign these scores by eye, which is laborious and error-prone work with high inter-observer variability. Image analysis techniques that automatically determine CerbB2/HER2 scores in breast tissue images, in accordance with ASCO/CAP recommendations, can increase the speed and accuracy of diagnosis.

To evaluate the CerbB2 scoring performance of the proposed technique, two image datasets are used. These clinical datasets, named ITU-MED-1 and ITU-MED-2, were obtained at different times from patients of the medical pathology department at Istanbul Medipol University Hospital. Patient slides of the first dataset were digitized with a digital microscopy system consisting of a Zeiss Axio Scope A1 bright-field microscope, a 40X objective, a 0.63X camera adaptor, and a Kameram-2 CCD camera (with 1.4-megapixel sensor resolution). The second dataset was digitized with EasyScan, an Argenit-brand Whole Slide Imaging (WSI) system.
Regions of interest are chosen by expert pathologists and captured as mosaic images. Score labels are assigned by pathologists in a patch-based manner by selecting score-representative homogeneous regions, which also allows the labels to be used for cell-based analysis. The mosaic images are stitched and blended with Argenit Kameram software to create a whole lesion image.
The ITU-MED-1 dataset includes 13 cases and 191 tissue images. The ITU-MED-2 dataset includes 10 cases and 148 tissue images. In the ITU-MED-1 dataset, 41 images are labeled 'Score 0', 42 are labeled 'Score 1', 52 are labeled 'Score 2', and 56 are labeled 'Score 3'. In the ITU-MED-2 dataset, 24 images are labeled 'Score 0', 18 are labeled 'Score 1', 49 are labeled 'Score 2', and 57 are labeled 'Score 3'. In addition, unlike other datasets in the literature, the ITU-MED datasets provide both a balanced and an unbalanced score distribution among tissue samples.
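
The per-score counts above can be checked against the stated totals, and the degree of class imbalance can be made explicit, with a short calculation. The following is a minimal sketch in Python; the variable and function names (`counts`, `summarize`) and the max-to-min ratio used as an imbalance measure are illustrative assumptions, not part of the dataset or of any accompanying software.

```python
# Per-score image counts, copied from the dataset description above.
counts = {
    "ITU-MED-1": {0: 41, 1: 42, 2: 52, 3: 56},
    "ITU-MED-2": {0: 24, 1: 18, 2: 49, 3: 57},
}


def summarize(score_counts):
    """Return (total images, share per score, max/min count ratio).

    The max/min ratio is an assumed, simple imbalance measure:
    1.0 means perfectly balanced, larger values mean more skew.
    """
    total = sum(score_counts.values())
    shares = {score: n / total for score, n in score_counts.items()}
    imbalance = max(score_counts.values()) / min(score_counts.values())
    return total, shares, imbalance


summaries = {name: summarize(c) for name, c in counts.items()}

for name, (total, shares, imbalance) in summaries.items():
    share_text = ", ".join(
        f"Score {score}: {share:.1%}" for score, share in shares.items()
    )
    print(f"{name}: {total} images ({share_text}); max/min ratio {imbalance:.2f}")
```

Under this measure the totals come out to 191 and 148 images, matching the figures above, and ITU-MED-1 is noticeably closer to a uniform score distribution than ITU-MED-2, which suggests that ITU-MED-1 is the balanced set and ITU-MED-2 the unbalanced one.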