
Detection of Pulmonary Embolism using an Ensemble Model Based on Neural Networks of EHR Data and CT Scans


Problem

          Pulmonary embolism is extremely dangerous and hard to diagnose. It is a condition in which a blood clot blocks blood flow to an artery in the lung. The total number of cases is unknown, but estimates put it at around 900,000 per year in the USA alone.[1] Pulmonary embolism is difficult to identify because it shares many symptoms with other lung conditions. An analysis of 18 studies from China showed that 53.6% of patients in inpatient settings are misdiagnosed, and 37.9% of patients who died in intensive care had pulmonary embolisms that doctors missed.[2] Misdiagnosis of pulmonary embolism is a widespread problem, and a solution is urgently needed. After talking to a doctor who diagnoses pulmonary embolisms, I learned that diagnosis must be quick because the condition is so deadly. However, this is not always possible because the doctor must confer with a radiologist to diagnose the patient properly. Accuracy is just as critical, both because pulmonary embolism is lethal and because anticoagulants harm someone who does not have a pulmonary embolism. Patients therefore need a rapid and accurate method of diagnosis. This is why I created an ensemble model that uses both CT scans and blood test data to improve the accuracy of diagnosis.

 

Background research 

          A pulmonary embolism is a "blood clot that blocks and stops blood flow to an artery in the lung." [3] Pulmonary embolism can be life-threatening. When symptoms are observed, the primary methods for diagnosis are blood tests, CT pulmonary angiography, chest x-rays, ventilation-perfusion scans, and MRIs. Pulmonary embolism is hard to diagnose because it shares symptoms with many other lung conditions.[4] If a patient is diagnosed with pulmonary embolism, they are given blood thinners or clot dissolvers. These medicines stop the blood from clotting but have side effects such as bleeding. 

 

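          The three main types of ensembling are boosting, stacking, and bagging. In boosting, models are trained sequentially, with each new model correcting the errors of the previous ones; this did not apply to this project because its two models are trained independently on different kinds of data. Stacking sends the outputs of both models through a third machine-learning model that learns how to combine them. Bagging averages the models' predictions together.[5] Strictly speaking, bagging also trains each model on a bootstrap resample of the same data; this project uses only the averaging step.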

 

Evaluation metrics

          I used four metrics for this project: accuracy, precision, recall, and the F-score. Accuracy measures the percentage of predictions the model got correct. Precision measures the quality of positive predictions: the portion of predicted positives that are actually positive. Recall measures the portion of actual positives that the model identified correctly. However, neither precision nor recall alone does a perfect job of assessing the model's overall performance. The F-score addresses this problem because it is the harmonic mean of precision and recall.
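As a minimal sketch of how these four metrics relate to one another, the snippet below computes them from the counts of a binary confusion matrix. The function name `classification_metrics` and the example counts are illustrative assumptions, not values from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score: harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
```

With these made-up counts, precision (0.75) is higher than recall (0.60), and the F-score falls between the two, closer to the lower value. That pull toward the weaker metric is why a harmonic mean is used rather than a plain average.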

​

Engineering goal

          This project aims to improve the accuracy of pulmonary embolism diagnosis by developing an ensemble model that uses CT scans and blood test data to help medical workers in developing areas. Many current frameworks for pulmonary embolism diagnosis use only one type of data. My first objective is to create an ensemble model that uses both types of data and outperforms the independent models that each use only one. The ensemble model comprises a convolutional neural network that uses CT pulmonary angiograms and a feed-forward neural network that uses the blood test data. My secondary objective is for the ensemble model to surpass the diagnostic accuracy rate of Chinese hospitals.

RadFusion Dataset 

          The RadFusion dataset has 1837 CT pulmonary angiograms and 1892 instances of blood test data. The data was split into 80% training, 10% validation, and 10% test sets. Each instance of blood test data had 21 values: albumin, alk, ast, anion, bilirubin, bun, bun_cre, calcium, creatinine, d-dimer, glucose, hemoglobin, a1c, hgb, INR, lactate, platelet, potassium, PTT, sodium, and WBC. I dropped the bun_cre, calcium, a1c, hgb, lactate, potassium, and sodium columns because they were missing most of their values, and I filled in the remaining missing values with averages. I applied z-score normalization to all the data and augmented the training data using the Synthetic Minority Oversampling Technique (SMOTE). Each CT scan was a slightly different size, but most were around 400x512x512. I normalized each scan and resized them all to 64x128x128. I then augmented and balanced the scan data to reduce bias by rotating each scan by a random angle between -20 and 20 degrees. Each instance in both datasets had an index common to the CT scan data and the blood test data. I used this index to pair each piece of blood test data with its respective scan for use in the ensemble model.
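The imputation and z-score steps described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the function name `impute_and_standardize`, the use of `None` for missing values, and the choice to compute the statistics from the training split only are all assumptions.

```python
from statistics import mean, pstdev


def impute_and_standardize(train_rows, other_rows):
    """Mean-impute missing values (None), then z-score each column.

    Column means and standard deviations come from train_rows only,
    so nothing from other_rows leaks into the statistics (an assumption
    of this sketch; the write-up does not say how the project did it).
    """
    n_cols = len(train_rows[0])
    col_means, col_stds = [], []
    for j in range(n_cols):
        observed = [row[j] for row in train_rows if row[j] is not None]
        col_means.append(mean(observed))
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        col_stds.append(pstdev(observed) or 1.0)

    def transform(rows):
        return [
            [
                ((col_means[j] if row[j] is None else row[j]) - col_means[j])
                / col_stds[j]
                for j in range(n_cols)
            ]
            for row in rows
        ]

    return transform(train_rows), transform(other_rows)


# Hypothetical two-column example; None marks a missing value.
train = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
test = [[None, 20.0]]
train_z, test_z = impute_and_standardize(train, test)
```

A value filled in with the column mean becomes exactly 0 after standardization, so imputed entries sit at the center of each feature's distribution and do not push a prediction in either direction.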

​

Procedure 

  1. The blood test data was prepared by removing nulls and normalizing the values.

  2. The initial FNN was overfitting, so regularization, batch normalization, and dropout were added. The hyperparameters were tuned to optimize accuracy within a reasonable time. 

  3. The accuracy was still subpar, so the data was augmented and balanced using the Synthetic Minority Oversampling Technique. The final FNN achieved a validation accuracy of 0.76.

  4. Each CT scan was normalized and resized. The data was augmented by rotating each scan by a random angle between -20 and 20 degrees. The scans were then shuffled.

  5. The 3D CNN had three repeated modules, each consisting of a convolutional layer, a max pooling layer, and batch normalization. These three modules were followed by a global average pooling layer and two fully connected layers using ReLU and sigmoid activation functions. The model achieved an unsatisfactory accuracy, so more positive scans were added to balance the dataset. The model ran for 23 epochs, achieving a validation accuracy of 0.60.

  6. The index values were used to create corresponding pairs of CT scans and blood test data instances from the test data. These pairs were used to produce two prediction arrays, one from each model, which were averaged (bagging) to obtain the ensemble model output.
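The pairing and averaging in the final step can be sketched as below. This is a simplified illustration under stated assumptions: the function name `ensemble_predict`, the dictionary representation keyed by patient index, the 0.5 decision threshold, and the example probabilities are all hypothetical and not taken from the project.

```python
def ensemble_predict(fnn_probs, cnn_probs, threshold=0.5):
    """Average two models' probabilities for patients present in both inputs.

    fnn_probs and cnn_probs each map a shared patient index to that
    model's predicted probability of pulmonary embolism.
    """
    # Keep only patients that have both a blood test and a CT scan.
    shared = sorted(fnn_probs.keys() & cnn_probs.keys())
    averaged = {idx: (fnn_probs[idx] + cnn_probs[idx]) / 2 for idx in shared}
    # Threshold the averaged probability to get a binary label.
    labels = {idx: int(p >= threshold) for idx, p in averaged.items()}
    return averaged, labels


# Hypothetical model outputs keyed by patient index.
fnn_out = {101: 0.80, 102: 0.30, 103: 0.55}
cnn_out = {101: 0.60, 102: 0.50, 104: 0.90}
probs, labels = ensemble_predict(fnn_out, cnn_out)
```

In this example, patients 103 and 104 are dropped because each appears in only one model's output, which mirrors the need for a common index before the two predictions can be combined.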

Results

      The feed-forward neural network achieved an accuracy of 0.76 and a precision of 0.75. These values show that the model was mostly accurate and that its positive predictions were of good quality. However, it had a lower recall of 0.66, meaning it missed many actual positives.
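As a quick check of how the F-score follows from these figures, the snippet below recomputes it from the rounded precision and recall quoted above. Because the inputs are rounded, the result may differ slightly from the value in the metrics figure; the function name `f_score` is an illustrative choice.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the feed-forward network as quoted in the text.
fnn_f = f_score(precision=0.75, recall=0.66)
print(round(fnn_f, 2))  # prints 0.7
```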

      The ensemble model barely outperformed the feed-forward neural network. Its accuracy was only one percentage point higher, and its precision was the same. The ensemble model only outperformed the FNN significantly in recall and F-score. The small overall gain is mainly due to the low performance of the convolutional neural network, which had a subpar accuracy of 0.55 and a precision of 0.68. The CNN's only improvement over the FNN was its higher recall of 0.71. The CNN was three-dimensional because the CT scans were not segmented, which forced me to use each entire three-dimensional scan. Three-dimensional CNNs are highly prone to overfitting because of the amount of data contained within each instance. Another problem is that a large part of each scan is irrelevant to the diagnosis, and this irrelevant data makes it harder for the CNN to learn. Three-dimensional CNNs are also extremely slow and require a lot of processing power. I ran the model for four days but only got through 23 epochs, and the best validation accuracy was 0.60.
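To give a rough sense of the data volume involved, the sketch below counts the voxels in one scan before and after resizing and compares that with the number of blood test features kept. It also traces how the spatial shape shrinks through three convolution and pooling modules. The 3x3x3 kernel, unpadded ("valid") convolutions, and 2x2x2 pooling are assumptions made for illustration; the write-up does not state these settings.

```python
from math import prod


def trace_shapes(shape, blocks=3, kernel=3, pool=2):
    """Track the spatial shape through repeated conv (no padding) + max-pool modules."""
    shapes = [tuple(shape)]
    for _ in range(blocks):
        # An unpadded convolution trims (kernel - 1) from each dimension;
        # max pooling then divides each dimension by the pool size.
        shape = tuple((dim - (kernel - 1)) // pool for dim in shape)
        shapes.append(shape)
    return shapes


original_voxels = prod((400, 512, 512))  # approximate raw scan size from the text
resized_voxels = prod((64, 128, 128))    # size after resizing
blood_features = 21 - 7                  # 21 lab values minus the 7 dropped columns

print(original_voxels, resized_voxels, blood_features)
print(trace_shapes((64, 128, 128)))
```

Even after resizing, each scan holds about a million input values against 14 blood test values per patient, which is consistent with the overfitting and training-time problems described above.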

​

​

Conclusion and Applications

          My hypothesis that the ensemble model would outperform the individual models was correct. The ensemble model surpassed the other models in every metric except precision, where it equaled the FNN. These results suggest that using both CT scans and blood test data improves the accuracy of pulmonary embolism diagnosis. The ensemble model did not improve drastically on the FNN, but that is due to the poor performance of the CNN.

This framework is not meant to replace doctors but rather assist them. 

  1. The model can be used within the CT scanner or blood test laboratory to output its results as soon as the patient is tested. 

  2. The model can be used to identify more urgent patients and prioritize attention or treatment. Especially in developing areas where doctors are scarce, this model can help physicians prioritize care. 

  3. The model can be implemented publicly as an app or website where users can provide their blood test data and CT scan. This can then act as a second opinion for patients.

​

​

Further Research 

Possible improvements to this project include:

  1. Running the models on a more powerful device to enable quicker training time.

  2. Tuning the hyperparameters of both models more.

  3. Obtaining more data by finding it online or segmenting the CT scans.

  4. Modifying the CNN architecture.

  5. Continuing to train both models for more epochs for better performance.

​


The FNN without augmentation shows a divergence between validation and training accuracy, while the model with augmentation shows a clear upward trend for both. This contrast shows the positive effect of balancing the dataset with augmentation.

Figure: Metrics for each type of model