Data

CT Scans

The dataset consists of 40 computed tomography pulmonary angiography (CTPA) scans, each from a different patient. The scans were acquired at Unidad Central de Radiodiagnóstico in Madrid, Spain, using several scanners and following the institutional CTPA protocol at each site. Scans were reviewed with the OsiriX software on a DELL U2410 or an LG 235V monitor, both of which meet the requirements for reading clinical scans.

Reference Standard

A reference standard was constructed by three experienced radiologists: A.K., a board-certified radiologist with over 10 years of experience reading CTPAs; P.F., head of the radiology unit of a hospital in Madrid, with more than 20 years of clinical experience; and S.H., a thoracic radiologist with more than 19 years of clinical practice, the last 10 specialized in thoracic radiology.

The reference standard was formed as follows. For each visible embolus, each radiologist independently marked a region of interest (ROI) in every axial slice in which the embolus appeared. These markings initialized a semi-automated emboli segmentation method consisting of a thresholding step based on Hounsfield units, followed by a morphological closing operation and connected component analysis. Each resulting segmentation was then manually inspected to remove spurious pixels. A hidden reference segmentation was established by consolidating the three segmentations with the STAPLE method [13]. The consolidated segmentation was manually inspected to detect and remove spurious pixels, and this corrected segmentation is considered our reference standard. Each connected component within a segmentation is considered an individual embolus. This procedure is illustrated in the accompanying figure.
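As a rough illustration of the semi-automated step described above (thresholding on Hounsfield units inside the marked ROI, a closing operation, then connected component analysis), the following is a minimal sketch using NumPy and SciPy. It is not the implementation used to build the reference standard. The function name segment_emboli, the Hounsfield window (HU_MIN, HU_MAX), the 3x3x3 structuring element, and the default face connectivity are all assumptions chosen for illustration, since the source does not specify them.

```python
import numpy as np
from scipy import ndimage

# Hypothetical Hounsfield window for clot candidates.
# The thresholds actually used are not stated in the source.
HU_MIN, HU_MAX = 0.0, 100.0


def segment_emboli(volume_hu, roi_mask, hu_min=HU_MIN, hu_max=HU_MAX):
    """Threshold inside the ROI, close small gaps, and label components.

    volume_hu: 3D array of Hounsfield units.
    roi_mask:  3D boolean array, True inside the radiologist's markings.
    Returns (labels, n_emboli); label 0 is background.
    """
    volume_hu = np.asarray(volume_hu)
    roi_mask = np.asarray(roi_mask, dtype=bool)

    # 1. Thresholding step, restricted to the marked region of interest.
    candidates = (volume_hu >= hu_min) & (volume_hu <= hu_max) & roi_mask

    # 2. Morphological closing (assumed 3x3x3 cube) to fill small holes.
    structure = np.ones((3, 3, 3), dtype=bool)
    closed = ndimage.binary_closing(candidates, structure=structure)
    closed &= roi_mask  # closing must not grow outside the ROI

    # 3. Connected component analysis: each component is one embolus.
    labels, n_emboli = ndimage.label(closed)
    return labels, n_emboli
```

In this sketch each nonzero label corresponds to one candidate embolus, which mirrors the convention that every connected component counts as an individual embolus. The manual inspection and the STAPLE consolidation are not covered here.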

Data Format

The images can be downloaded in DICOM or NRRD format for convenience. The reference standard is provided as segmentation masks in which the background has value 0 and each clot has its own index. Participants are required to provide one text file with the location of each embolus detected in each CT scan. Each location must be accompanied by a confidence score, and participants should also provide a threshold on that confidence score. Locations are expressed in pixels, with the origin of coordinates at the top-left corner of the first slice. Each line should have the following format:

Scan X Y Z Confidence
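The line format above could be produced and read back with a few lines of Python. This is only a sketch under stated assumptions: fields are separated by single spaces, coordinates are integer pixel indices, the confidence is written with four decimals, and the scan identifier "scan01" and the helper names (Detection, format_detection, parse_detection, write_submission) are hypothetical. How the confidence threshold should be reported is not specified in the text, so it is left out.

```python
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    scan: str          # scan identifier (its exact form is an assumption)
    x: int             # column in pixels, origin at the top-left corner
    y: int             # row in pixels
    z: int             # slice index, counted from the first slice
    confidence: float  # detection confidence score


def format_detection(d: Detection) -> str:
    """Render one detection as 'Scan X Y Z Confidence'."""
    return f"{d.scan} {d.x} {d.y} {d.z} {d.confidence:.4f}"


def parse_detection(line: str) -> Detection:
    """Parse one whitespace-separated 'Scan X Y Z Confidence' line."""
    scan, x, y, z, confidence = line.split()
    return Detection(scan, int(x), int(y), int(z), float(confidence))


def write_submission(detections: Iterable[Detection], path: str) -> None:
    """Write one detection per line to a single text file."""
    with open(path, "w", encoding="utf-8") as fh:
        for d in detections:
            fh.write(format_detection(d) + "\n")
```

For example, format_detection(Detection("scan01", 256, 310, 42, 0.87)) yields the line "scan01 256 310 42 0.8700", and parse_detection recovers the same fields from it.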

Data Usage and Conditions

The data may be used solely for research in computer-aided detection of pulmonary embolism. Any other use of the dataset is subject to prior approval by the data owners. The image data belongs to Unidad Central de Radiodiagnóstico of the Comunidad de Madrid, Spain. The reference standard belongs to the M+Visión consortium. If you use this data in your research, please include the following attribution in any publications or grant applications: “The authors acknowledge the Unidad Central de Radiodiagnóstico and the M+Visión consortium for their critical role in the creation of the free publicly available PE database used in this study.”