Dragonnet model for causal analysis of real-world data
Over the past few years, the scientific community has invested heavily in causal inference to draw causal conclusions from real-world data (RWD), with a particular focus on electronic health records (EHRs), biobank data, registries, and wearable devices. In the REBECCA project, researchers are seeking to unlock the potential of RWD to explore and quantify cause-and-effect relationships between various aspects of breast cancer patients’ quality of life and the comorbidities that may develop following treatment. This type of data has its limitations, however: it is subject to sampling bias, information bias, and confounding. To address these issues, researchers use machine learning models to estimate treatment effects in the presence of such biases.
The fundamental problem of causal inference is that only the factual outcome, i.e., the outcome observed under the treatment a subject actually received, is available for each subject examined. The counterfactual outcome, i.e., what would have happened if a different treatment had been chosen while keeping everything else constant, is not directly observable. Therefore, we need to estimate the counterfactual outcomes in order to estimate causal effects.
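To make the distinction between factual and counterfactual outcomes concrete, the following is a minimal simulation sketch, not REBECCA data or code. The confounder (a "baseline fitness" score), the treatment assignment rule, and the fixed true effect of 1.0 are all hypothetical choices made for illustration. Because the simulation generates both potential outcomes, it can compare the true average effect with the naive difference in group means, which is the only quantity directly computable from factual outcomes.

```python
import random

random.seed(0)

def simulate_patient():
    # Hypothetical confounder: baseline fitness in [0, 1]
    x = random.random()
    # Fitter patients are more likely to receive the treatment
    t = 1 if random.random() < x else 0
    # Both potential outcomes exist in the simulation only;
    # the true individual effect is fixed at 1.0 by construction
    y0 = 2.0 * x + random.gauss(0, 0.1)
    y1 = y0 + 1.0
    return x, t, y0, y1

patients = [simulate_patient() for _ in range(5000)]

# In real-world data only the factual outcome is recorded
factual = [(t, y1 if t == 1 else y0) for _, t, y0, y1 in patients]

# True average effect, computable only because we simulated both outcomes
true_ate = sum(y1 - y0 for _, _, y0, y1 in patients) / len(patients)

# Naive estimate from factual outcomes: difference in group means
treated = [y for t, y in factual if t == 1]
control = [y for t, y in factual if t == 0]
naive_diff = sum(treated) / len(treated) - sum(control) / len(control)

print(f"true ATE: {true_ate:.2f}, naive difference: {naive_diff:.2f}")
```

In this toy setting the naive difference overstates the true effect, because treated patients were fitter to begin with. This is the kind of confounding that counterfactual estimation has to correct for.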
For this estimation, we propose to apply NN-Dragonnet, a neural network-based model that integrates information from the nearest neighbouring samples in order to estimate causal effects more accurately.
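The idea of drawing on neighbouring samples can be sketched in a few lines. The example below is an illustrative assumption, not the NN-Dragonnet implementation: it uses Euclidean distance, a fixed k, and the mean outcome as the neighbour statistic, and the function names `knn_mean_outcome` and `neighbour_features` are hypothetical.

```python
import math

def knn_mean_outcome(x, group, k):
    """Mean outcome of the k samples in `group` closest to x (Euclidean distance)."""
    nearest = sorted(group, key=lambda s: math.dist(x, s[0]))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def neighbour_features(x, train, k=2):
    """Return (mean control-neighbour outcome, mean treated-neighbour outcome)."""
    control = [(xi, yi) for xi, ti, yi in train if ti == 0]
    treated = [(xi, yi) for xi, ti, yi in train if ti == 1]
    return knn_mean_outcome(x, control, k), knn_mean_outcome(x, treated, k)

# Toy training set: (covariates, treatment indicator, outcome)
train = [
    ((0.0, 0.0), 0, 1.0),
    ((0.1, 0.0), 0, 1.2),
    ((1.0, 1.0), 0, 3.0),
    ((0.0, 0.1), 1, 2.0),
    ((0.2, 0.1), 1, 2.4),
    ((1.0, 0.9), 1, 4.0),
]

# Neighbour statistics for a new patient with covariates (0.05, 0.05)
y_bar_control, y_bar_treated = neighbour_features((0.05, 0.05), train, k=2)
print(y_bar_control, y_bar_treated)
```

The two resulting values could then be passed to a network alongside the patient's covariates, giving it a local summary of what happened to similar patients in each group.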
What makes NN-Dragonnet unique is its innovative use of inputs. In addition to covariates, the model incorporates information from the outcomes of neighbouring patients in both the control and the treatment groups of the training dataset. The causal effects we aim to estimate in REBECCA are the average treatment effect (ATE), the conditional average treatment effect (CATE), and the individual treatment effect (ITE), which describe the effect for the population as a whole, for subgroups of patients sharing given characteristics, and for individual patients, respectively. Our approach complements a representation-learning network with statistics derived from neighbouring instances, conveying information that can help the regression models reduce the error in the estimated treatment effect for individual patients. The high-level architecture of the NN-Dragonnet model is presented in the following figure: the inputs are the covariates of each instance together with its nearest neighbouring instances from the control and treatment groups. The outputs of the model are the predictions of the conditional outcomes for the control and treatment groups, as well as the prediction of the propensity score 𝑔̂(𝒙𝑖;𝜃), where 𝜃 is the vector of network parameters.
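For reference, the three estimands can be written in standard potential-outcomes notation. The symbols below are assumptions introduced here rather than notation taken from the project: Y(1) and Y(0) denote the potential outcomes with and without treatment, X the covariates, and Q̂(t, x; θ) the network's predicted conditional outcome under treatment t. The last two lines show the simplest plug-in estimators built from the two outcome predictions; the estimators actually used in REBECCA may differ.

```latex
\begin{align}
\mathrm{ITE}_i &= Y_i(1) - Y_i(0) \\
\mathrm{CATE}(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right] \\
\mathrm{ATE} &= \mathbb{E}\left[\, Y(1) - Y(0) \,\right] \\
\widehat{\mathrm{CATE}}(x_i) &= \hat{Q}(1, x_i; \theta) - \hat{Q}(0, x_i; \theta) \\
\widehat{\mathrm{ATE}} &= \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{Q}(1, x_i; \theta) - \hat{Q}(0, x_i; \theta) \right]
\end{align}
```

Since the ITE involves a counterfactual that is never observed, it is typically approximated in practice by the estimated CATE evaluated at the individual patient's covariates.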
The overarching goal of the REBECCA project is to gain insight into the effectiveness of potential interventions in improving quality of life for breast cancer patients. Specifically, researchers are looking at quality-of-life aspects that impact patients’ daily lives, such as social and work-life participation. In addition to examining individual cases, the project also seeks to identify patterns within the breast cancer patient population. By doing so, researchers can gain a better understanding of the disease and its impacts, which can ultimately inform the development of more effective treatments and interventions.