The Perils and Pitfalls of Observational Studies: ‘Where Do We Stand with Propensity Matching?’

In the current age of 24-hour news channels, social media and information overload, we are often surrounded by a whirlwind of health-related information alerting us to new studies and exciting new research.

‘Coffee drinking is linked to a longer life!’ Reality or spurious correlation?

Association is not the same as causation. The fact that a particular treatment or intervention is associated with an outcome does not mean that the treatment or intervention caused the outcome. This is one of the major issues with outcomes-based analysis in observational studies. One problem is confounding, i.e., some other covariate is associated with both the risk factor or intervention and the outcome. If you know what the covariate is and have measured it, you can 'control' or 'adjust' for it using regression analysis. The other problem is selection bias, the bane of observational studies.
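To make confounding concrete, here is a minimal sketch in Python using made-up counts (they are not from any study). The hypothetical confounder could be something like smoking: it makes the exposure more likely and, on its own, raises the risk of the outcome. Stratifying on the confounder is used here as a simpler stand-in for the regression adjustment mentioned above.

```python
# Hypothetical counts, invented for illustration only:
# (confounder, exposed) -> (number of people, number with the outcome)
counts = {
    (1, 1): (800, 240),  # confounder present, exposed:   30% risk
    (1, 0): (200, 60),   # confounder present, unexposed: 30% risk
    (0, 1): (200, 20),   # confounder absent,  exposed:   10% risk
    (0, 0): (800, 80),   # confounder absent,  unexposed: 10% risk
}


def risk(cells):
    """Pooled outcome risk across the given (confounder, exposed) cells."""
    people = sum(counts[c][0] for c in cells)
    events = sum(counts[c][1] for c in cells)
    return events / people


# Crude comparison: ignore the confounder entirely.
crude_rd = risk([(1, 1), (0, 1)]) - risk([(1, 0), (0, 0)])

# Adjusted comparison: compare exposed and unexposed within each stratum.
stratum_rd = {z: risk([(z, 1)]) - risk([(z, 0)]) for z in (0, 1)}

print(f"crude risk difference: {crude_rd:.2f}")
for z, rd in stratum_rd.items():
    print(f"risk difference when confounder = {z}: {rd:.2f}")
```

In this toy table the exposure looks harmful in the crude comparison, yet within each level of the confounder the exposed and unexposed have the same risk. The whole association comes from the confounder.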

This is why randomized controlled trials (RCTs) provide much stronger evidence for a causal relationship between intervention and outcome. The word ‘controlled’ deserves emphasis: in a well-designed RCT, random allocation tends to balance both measured and unmeasured confounders that may spuriously link an exposure to an outcome. But an RCT is not always feasible, and much of what we do in everyday clinical practice comes from observational studies.

Confounding and selection bias threaten the validity of observational findings such as ‘Chocolate consumption leads to fewer deaths from cardiovascular disease.’ Propensity score (PS) matched analysis is a tool that aims to control for confounding and selection bias.

The propensity score is the estimated probability of receiving the intervention of interest, given the pre-intervention characteristics of the study participants. It allows us to take many different variables and condense them into a single variable that gives each individual's probability of receiving the intervention. We can then find people or groups with similar propensity scores and use this score to make the groups being compared more similar. By matching people with similar scores, we can see whether the intervention is still associated with the outcome of interest. In this way, PS looks like an attractive tool to simulate an RCT in which all the covariates end up being the same except the treatment.
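The two steps described above, estimating the score and then matching on it, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the cohort is simulated, the covariates (`age`, `diabetes`) and their effect sizes are invented, the logistic model is fitted with a bare-bones gradient ascent rather than a statistics package, and the greedy 1:1 matching with a caliper of 0.05 is only one of several matching schemes in use.

```python
import math
import random

random.seed(0)  # fixed seed so the simulated cohort is reproducible


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Simulated (not real) cohort: older age and diabetes both make treatment more likely.
cohort = []
for _ in range(400):
    age = random.gauss(0, 1)  # standardized age
    diabetes = 1 if random.random() < 0.3 else 0
    p_treat = sigmoid(-0.5 + age + diabetes)
    treated = 1 if random.random() < p_treat else 0
    cohort.append({"age": age, "diabetes": diabetes, "treated": treated})


def fit_logistic(rows, steps=500, lr=0.5):
    """Fit P(treated | age, diabetes) by gradient ascent on the mean log-likelihood."""
    w = [0.0, 0.0, 0.0]  # intercept, age, diabetes
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for r in rows:
            x = (1.0, r["age"], r["diabetes"])
            err = r["treated"] - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


w = fit_logistic(cohort)
for r in cohort:
    # The propensity score: one number summarizing the measured covariates.
    r["ps"] = sigmoid(w[0] + w[1] * r["age"] + w[2] * r["diabetes"])


def match(rows, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = [r for r in rows if r["treated"] == 0]
    pairs = []
    for t in (r for r in rows if r["treated"] == 1):
        if not controls:
            break
        i = min(range(len(controls)), key=lambda k: abs(controls[k]["ps"] - t["ps"]))
        if abs(controls[i]["ps"] - t["ps"]) <= caliper:
            pairs.append((t, controls.pop(i)))  # each control is used at most once
    return pairs


def mean_age_gap(treated_rows, control_rows):
    """Mean age of the treated minus mean age of the controls."""
    mean = lambda rows: sum(r["age"] for r in rows) / len(rows)
    return mean(treated_rows) - mean(control_rows)


pairs = match(cohort)
gap_before = mean_age_gap([r for r in cohort if r["treated"]],
                          [r for r in cohort if not r["treated"]])
gap_after = mean_age_gap([t for t, _ in pairs], [c for _, c in pairs])
print(f"pairs: {len(pairs)}, age gap before: {gap_before:.2f}, after: {gap_after:.2f}")
```

Treated patients who have no control within the caliper are simply dropped, so the matched sample is smaller than the original cohort. That trade-off between balance and sample size is part of any matched analysis.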

But PS analysis is not magic! In an editorial titled ‘Propensity Scores: Help or Hype?’, Dr. Winkelmayer highlights some of the main limitations of PS.

PS can balance the observed baseline characteristics, but it cannot balance unmeasured confounders and characteristics. Also, variables that can be affected by the intervention itself should not be included in the model that estimates the PS. Another important question is, ‘Does propensity score matching offer you much more than traditional multivariable adjustment?’ Many believe that in most studies PS has no apparent advantage over traditional methods because, as long as the sample size is sufficiently large, the results are usually the same whether PS is used or not. It can be a useful tool in a study with many confounders, but it cannot entirely eliminate confounding.
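The first limitation can be illustrated with another small Python sketch built on invented counts. Here `x` stands for a measured covariate and `u` for a confounder nobody recorded. Both names and all the numbers are hypothetical. Because the score can only be estimated from `x`, people with identical scores can still differ sharply in `u`.

```python
# Hypothetical counts, invented for illustration only:
# (x, u, treated) -> number of people
counts = {
    (0, 0, 1): 10, (0, 0, 0): 40,
    (0, 1, 1): 30, (0, 1, 0): 20,
    (1, 0, 1): 20, (1, 0, 0): 30,
    (1, 1, 1): 40, (1, 1, 0): 10,
}


def propensity(x):
    """P(treated | x): the score can only use what was measured."""
    treated = sum(n for (xi, _, t), n in counts.items() if xi == x and t == 1)
    total = sum(n for (xi, _, _), n in counts.items() if xi == x)
    return treated / total


def prevalence_of_u(x, treated):
    """Share with u = 1 in one arm, among people sharing the same x (and so the same score)."""
    with_u = counts[(x, 1, treated)]
    return with_u / (with_u + counts[(x, 0, treated)])


for x in (0, 1):
    print(
        f"x = {x}: score = {propensity(x):.2f}, "
        f"u among treated = {prevalence_of_u(x, 1):.2f}, "
        f"u among controls = {prevalence_of_u(x, 0):.2f}"
    )
```

In this toy table, matching on the score makes the two arms identical on `x`, yet the unmeasured `u` remains far more common among the treated. Any effect of `u` on the outcome would still be mistaken for an effect of the treatment.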

Watch this excellent video by Perry Wilson on ‘Are Propensity Scores As Good As Randomized Trials?’

Commentary by Manasi Bapat, Nephrology Fellow New York,

NSMC Intern, Class of 2018