Our team was approached to take part in the Biomedical Advanced Research and Development Authority (BARDA) challenge during the Fall 2021 semester at Arizona State University (ASU). The competition's goal was to predict hospitalizations in children under the age of 18 with COVID. The analysis took place in the National COVID Cohort Collaborative (N3C) Enclave, a Spark-powered custom platform for analyzing COVID-related medical data. Together with a team of experts from various backgrounds, we successfully developed a predictive model with good overall performance.
The objective was split into two tasks. The first was to predict which children are most likely to be hospitalized due to COVID. The second was to predict which of the hospitalized children are most likely to need ventilation or cardiovascular intervention. Here, we will walk you through the process our team followed to create our models.
To begin, we first needed to build a basic solution. A template solution was created by Timothy Bergquist, modeled on demographic information and occurrence of Pediatric Complex Chronic Conditions (PCCC) codes. We replicated this model; however, we were getting erroneous results: the predictions were a vector of all 0's. Our overall accuracy was still good, since most of the true outcomes were 0's. This was a symptom of the data being highly imbalanced, which occurs when there is far more of one class than the other. Fortunately, in this case the imbalance exists because far more children weren't hospitalized due to COVID than were. We discuss how we corrected for this issue in the next section.
As stated, one issue with the data was that it was highly imbalanced: for task 1, most children were not hospitalized, and for task 2, most hospitalized children did not need interventions. To overcome this, our team upsampled the minority class to match the total count of the majority class. Upsampling is the process of randomly selecting and repeating rows of the smaller class until the dataset is balanced. Once this was done, our algorithms returned non-zero output, and we were able to move forward.
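To illustrate, here is a minimal sketch of that upsampling step using pandas. The function name, column names, and toy data are illustrative, not our actual code; our real pipeline ran on Spark inside the N3C Enclave.

```python
import pandas as pd

def upsample_minority(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Randomly repeat minority-class rows (sampling with replacement)
    until every class matches the size of the majority class."""
    counts = df[label_col].value_counts()
    majority_label = counts.idxmax()
    majority_n = counts.max()
    parts = [df[df[label_col] == majority_label]]
    for label, n in counts.items():
        if label == majority_label:
            continue
        minority = df[df[label_col] == label]
        # sample with replacement up to the majority-class count
        parts.append(minority.sample(n=majority_n, replace=True, random_state=seed))
    # shuffle the combined rows so repeated records are interleaved
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy example: 8 non-hospitalized children (0) vs. 2 hospitalized (1)
toy = pd.DataFrame({"age": range(10), "hospitalized": [0] * 8 + [1] * 2})
balanced = upsample_minority(toy, "hospitalized")
```

After this step the toy dataset contains eight rows of each class, which is what lets the learning algorithm stop defaulting to the all-zeros prediction.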
Now that we had a solid framework to build on, our team transitioned to identifying potentially influential factors in the data. We used a Miro board to organize our thoughts and keep track of the factors we explored (Figure 1). Subject matter experts such as medical doctors and nurses identified potential risk factors they were aware of and supported their ideas with research. Factors such as obesity, asthma, and other lung-related issues were considered and included in the final model. We also identified the usage of certain lung-related drugs as significant. We considered other factors, such as age, total outpatient visits, and public vs. private insurance, but these were deemed insignificant in the final model. This was determined by adding one factor at a time and measuring its overall effect on the model; in many cases, adding these factors substantially lowered model performance. Lastly, some factors were identified as significant in the research, but the data was too inconsistent to support them. These included blood type, patient weight, and lab tests.
Figure 1. Miro board the team utilized to organize thoughts and track progress.
To assess our model, we set aside 20% of the data as a test set. We then trained our model on the remaining 80% and predicted the known outcomes on the test set, a standard practice in data science for estimating real-world performance. This data splitting occurred before upsampling, so the test set remained highly imbalanced. Initially, the team used overall accuracy to assess model performance, but as the all-zeros model showed, accuracy can be misleading on imbalanced data. To overcome this, and as part of the competition, we calculated the F-score instead. The F-score is the harmonic mean of precision and recall on the positive class, so it penalizes a model that simply predicts the majority class and is far more reliable on imbalanced problems.
In the end, we had a model that used basic patient demographics, usage of specific drugs, PCCC codes, and obesity diagnoses to predict both hospitalization and the need for intervention while at the hospital. We chose gradient boosting as our algorithm after comparing its performance to other algorithms, and we performed hyperparameter optimization with a brute-force grid search. The final F-scores were 0.441 for task 1 and 0.310 for task 2. While these aren't phenomenal F-scores, they are respectable given the data and the problem: medical data varies greatly from person to person, which makes prediction difficult.
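Our exact features, grid, and parameters stayed inside the Enclave, but a brute-force grid search over a gradient boosting classifier scored with F1 looks roughly like this in scikit-learn. The synthetic dataset and the grid values here are placeholders, not our competition settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced stand-in for the cohort (real data stays in the Enclave):
# ~90% negative (not hospitalized), ~10% positive
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Hold out 20% as the test set before any resampling, as in our workflow
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Brute-force grid search: every combination is trained and cross-validated,
# scored with F1 rather than accuracy
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

print(search.best_params_)            # best hyperparameter combination found
print(search.score(X_test, y_test))   # F1 on the held-out 20%
```

Grid search is exhaustive and therefore expensive, but with a handful of hyperparameters and modest grids it remains a simple, reproducible way to tune a model.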
Our team came together and successfully created a set of predictive models for hospitalizations and interventions in children with COVID. The team consisted of people from various backgrounds and skillsets, and we were able to leverage each person's strengths toward the overall goal. We are still awaiting news on our performance in the competition and will share an update once we hear. We are very grateful for the opportunity to participate and look forward to more competitions like this!