Statistical Causality

Causality remains a misunderstood construct that can be cumbersome even for seasoned researchers, not to mention people working in non-scientific fields. Though it is used extensively in advanced econometrics, the concept is perhaps most misunderstood in the medical community, where the term is grossly misused. For example, I have been dealing with some health issues in recent months, and in so doing I have witnessed misuse of the word “cause” on countless occasions; when “cause” seems to be a bit of a stretch, the word “association” is used instead. So, let’s digress from our normal topic of engineering economic systems and use this recent health example as a means of discussing causality; the discussion is applicable across all fields and disciplines.

Recently, I have been told repeatedly that sleep apnea causes congestive heart failure. When that claim is challenged, it is changed to, “There is an association between sleep apnea and congestive heart failure.” So, what is being implied? Causation is being implied, yet I personally have found no evidence of causation between sleep apnea and congestive heart failure, though it may exist. In fact, after sifting through the medical literature on numerous occasions, I have found no evidence that even a statistical relationship exists between the two; and if I did, as in most medical literature, that relationship would not be reported with an actual correlation coefficient and accompanying level of significance, as is customary, and expected, in scientific research. Even then, as every beginning doctoral student learns, the most fundamental point about association, or relationship, is that correlation does not substantiate causation. Even if a relationship does exist, it may be spurious, driven by a latent variable, or the like, so the scientific validity of the research design and methodology should be heavily scrutinized before such a relationship is accepted.

To stick with our example, the assumption that sleep apnea causes congestive heart failure makes sense from a medical perspective, and even from a lay perspective. When organs are deprived of oxygen, additional stress is placed on the heart muscle to supply more oxygen, which in turn weakens the heart over time. But when we are investigating a relationship such as this, we must substantiate it before acting upon it. In lay terms, the relationship must be substantiated through robust methodologies and analyses. First, it must be established through correlation, in this case likely through the Pearson product-moment correlation coefficient. From that coefficient we can determine whether a relationship exists, as well as whether it is direct (positive) or inverse (negative).
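As a rough illustration of the correlation step described above, here is a minimal Python sketch of the Pearson product-moment coefficient. The function name `pearson_r`, the variable names, and the numbers are all hypothetical, made up purely to show the arithmetic; they are not clinical data and say nothing about sleep apnea or heart failure.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squares of x
    syy = sum(b * b for b in dy)              # sum of squares of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, made-up values for illustration only (not clinical data).
apnea_severity = [5, 12, 18, 25, 31, 40]
cardiac_score = [1.0, 1.4, 2.1, 2.0, 3.2, 3.9]

r = pearson_r(apnea_severity, cardiac_score)
direction = "direct" if r > 0 else "inverse" if r < 0 else "none"
print(f"r = {r:.3f} ({direction} relationship)")
```

The sign of `r` gives the direction of the relationship and its magnitude gives the strength; on its own it says nothing yet about significance or causation.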

Along with calculating the correlation coefficient, we must determine whether that relationship is statistically significant. An accepted method for determining significance is to use probability, in this case likely p-values. A p-value is the probability, assuming the null hypothesis is true, of observing a statistic at least as extreme as the one actually observed, in the direction of the alternative hypothesis. In the end, if p < α, reject the null hypothesis in favor of the alternative hypothesis. Conversely, if p > α, do not reject the null hypothesis. Assuming the coefficient is significant, usually at the .01, .05, or .10 alpha level, statistical effect must then be estimated, meaning we must determine whether sleep apnea has a statistically significant effect on congestive heart failure. Again, note that we are not merely throwing out words to sound intelligent; we are quantitatively substantiating not only whether there is a statistically significant relationship between sleep apnea and congestive heart failure, but also whether sleep apnea has a statistical effect on congestive heart failure.
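The decision rule above can be sketched in a few lines of Python. One caveat: rather than the t-distribution-based p-value that usually accompanies Pearson's r, this sketch swaps in a two-sided permutation test, chosen only because it needs nothing beyond the standard library. The function names, the default of 5,000 permutations, the seed, and the data are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Apply the rule from the text: reject H0 only when p < alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Hypothetical, made-up values for illustration only (not clinical data).
x = [5, 9, 12, 18, 22, 25, 31, 40]
y = [1.0, 1.1, 1.4, 2.1, 1.9, 2.0, 3.2, 3.9]

p = permutation_p_value(x, y)
print(f"p = {p:.4f}: {decide(p, alpha=0.05)}")
```

Note that the alpha level is fixed before looking at the data; the code only automates the comparison, not the choice of threshold.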

There exist several methods to estimate effect, one of the easiest of which is to calculate the coefficient of determination, R², or in some cases, adjusted R². The coefficient of determination is the proportion of variance in the dependent variable explained by the independent variable(s); adjusted R² is merely a modified form of R² that accounts for the number of independent variables in the model. A more concrete method for estimating statistical effect requires us to conduct an F-test via ANOVA, a method that plays into sampling, power analysis, and meta-analysis, another overwhelmingly misunderstood area in the medical literature.
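To make these quantities concrete, the following Python sketch fits a one-predictor least-squares line and computes R², adjusted R², and the overall F statistic from the regression ANOVA decomposition. It assumes a single independent variable (k = 1); the function name `regression_effect` and the data are hypothetical, and the sketch stops at the F statistic, since turning it into a p-value requires the F distribution from a statistics library.

```python
import math


def regression_effect(x, y):
    """Return (R^2, adjusted R^2, F) for a one-predictor least-squares fit."""
    n, k = len(x), 1  # k = number of independent variables
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # ANOVA decomposition: total variation versus unexplained (residual) variation.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    if ss_resid == 0:
        f_stat = math.inf  # perfect fit: no residual variance
    else:
        f_stat = ((ss_total - ss_resid) / k) / (ss_resid / (n - k - 1))
    return r2, adj_r2, f_stat


# Hypothetical, made-up values for illustration only (not clinical data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2, adj_r2, f_stat = regression_effect(x, y)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.1f}")
```

With one predictor, R² is simply the square of Pearson's r, and adjusted R² is never larger than R² because of the penalty for model size.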
