Chapter 5

# Directionality and Causality


In the last chapter we introduced structural equation modeling (SEM) diagrams and showed how to draw one representing the correlation between depression and anxiety (see Figure 4.16). To draw more complicated diagrams we need to discuss directionality and causality. Notice in Figure 4.16 that the curved line has an arrowhead pointing to both depression and anxiety. A model of this type, which involves only a correlation, makes no causal or directional claim between the two variables. It could be that depression causes anxiety, or that anxiety causes depression; either would be a causal or directional claim. (In this context, causal and directional claims are synonyms.) It could also be that neither causes the other and that a third variable, not included in the model, causes both. In other words, the model in Figure 4.16 makes no causal or directional claims about any of the variables involved. To make causal claims we introduce a new diagramming technique and discuss the theoretical justification that such claims require.

#### Drawing Causality or Directionality

Figure 5.1 shows a causal relationship between x and y.

Figure 5.1. An SEM diagram showing a causal or directional relationship between x and y. The strength of the relationship is shown by β.

This model, unlike the model shown in Figure 4.16, makes a strong causal or directional claim from x to y: if you want to change y, you can do so by changing x. Note that the only important aspect of this diagram is which way the arrow points. Placing the box that contains x on the left and the box that contains y on the right is arbitrary; the boxes could be switched with no change of meaning.
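One way to read the arrow in Figure 5.1 is as a simple linear equation. The sketch below assumes a linear relationship; the intercept $\beta_0$ and the error term $\varepsilon$ are notational assumptions added here and are not drawn in the figure.

$$
y = \beta_0 + \beta x + \varepsilon
$$

Under this reading, β is a slope: for every one unit increase in x, the predicted value of y changes by β units.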

A real-world example is that hours of reading per day (x) affect a child’s reading fluency (y), as measured by a standardized test, as shown in Figure 5.2.

Figure 5.2. An SEM diagram showing a causal relationship between hours reading and reading fluency score.

#### Justifying Causality or Directionality

To justify the validity of a model like the one shown in Figure 5.2, there needs to be theoretical backing. The study of causality or directionality has occupied both philosophers and data analysts for centuries, and there is a broad literature covering the topic. Nevertheless, the main ideas of causality or directionality that we need to consider can be summarized in these three conditions:

1. A correlation must exist between x and y.
2. There must be temporal precedence of the change in x before the change in y.
3. The relationship is non-spurious (meaning it cannot be explained any other way).

##### Condition 1: A correlation must exist between x and y.

For a researcher to argue that x causes a change in y, there must be some kind of association between the two that can be measured. In Chapter 4, we introduced the Pearson correlation coefficient as a good measure of linear association between two variables. So, assuming the relationship is linear and has no curves, it is easy to check condition 1 (a correlation must exist between x and y). You do this by running the bivariate correlation procedure in SPSS, as you did in Chapter 4. If the Pearson correlation coefficient is important (meaning both statistically significant and big enough to be meaningful), then condition 1 is satisfied. In Figure 4.14 and Figure 4.15 we did just that: we calculated the correlation between depression and anxiety. That correlation is statistically significant (p < .001) and quite large (r = .901), and thus can be considered important.
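For readers who want to see the arithmetic behind condition 1 outside SPSS, here is a minimal Python sketch. It computes the Pearson correlation coefficient by hand and estimates a p-value with a permutation test, which is a substitute for the t-based p-value that SPSS reports, not a reproduction of it. The scores are hypothetical and are not the Chapter 4 data; the function names are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Shuffling y breaks any real association, so the share of shuffles with
    a correlation at least as extreme as the observed one estimates how
    easily the observed value could arise by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical scores for ten respondents (assumed for illustration only).
depression = [12, 15, 9, 20, 18, 7, 14, 22, 10, 16]
anxiety = [14, 16, 10, 23, 19, 9, 15, 24, 11, 18]

r = pearson_r(depression, anxiety)
p = permutation_p_value(depression, anxiety)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A large r together with a small p would satisfy condition 1 in the sense described above; judging whether r is "big enough to be meaningful" still depends on the research context.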

##### Condition 2: There must be temporal precedence of the change in x before the change in y.

For directionality to be justified, the change in x must come before the change in y. This is temporal precedence: one event happens before another. For the model shown in Figure 5.2 (increased hours reading results in an increased reading fluency score) to be valid, the researcher cannot have measured hours reading after measuring reading fluency score. You cannot say that a change in hours reading in the future has affected reading fluency score in the past. This can pose a challenge in real-world research because data are often collected on all variables simultaneously. For example, the data used to create Figure 4.16 were probably taken from a survey in which the researcher asked questions about both the subjects’ depression and anxiety. Thus, there is no way to determine temporal precedence from the data; instead, the researcher must establish it with logic and theoretical considerations.

Sometimes a relationship is bidirectional; that is, it runs from x to y and from y to x simultaneously. For example, for the model shown in Figure 5.2 (hours reading causes a change in reading fluency score), it could be argued that, in addition to x changing y, y also changes x. An increase in reading fluency score could result in a change in hours reading, because a child with greater mastery of reading, as measured by the reading fluency score, may read more hours because they enjoy it more. Longitudinal studies in which x is gathered before y, especially across several time points, can help establish temporal precedence. This can create an argument that is justified not only theoretically but also by empirical evidence.

##### Condition 3: The relationship is non-spurious.

For the argument to be made that x causes y, there cannot be any missing variables or alternative explanations for why the relationship exists. Take, for example, the model shown in Figure 5.3.

Figure 5.3. An SEM diagram showing a causal relationship between shoe size of a child and reading fluency score.

Figure 5.3 can be interpreted as follows: for every 1 unit increase in shoe size, a child’s predicted reading fluency score increases by 3.5 points. This relationship, on its face, seems ridiculous or, in more technical terms, lacks face validity. A researcher could argue that this relationship satisfies condition 1 (a correlation must exist between x and y) and also condition 2 (there must be temporal precedence of the change in x before the change in y). They would be correct on condition 1: if you run a bivariate correlation between shoe size and reading fluency score, you will indeed get a statistically significant correlation. It will also be a large correlation, and therefore it can be judged to be important. The researcher advocating for the causality of the model can also argue that shoe size was measured before reading fluency score, thereby satisfying condition 2. They would again be correct. Nevertheless, critics of the model can rightly point out that there is a missing variable that affects both shoe size and reading fluency score, namely age, as can be seen in Figure 5.4.

Figure 5.4. SEM diagram showing two causal relationships from age of child to both shoe size and reading fluency score.

Figure 5.4 is interpreted as follows: (1) for every one year increase in age, a child’s predicted shoe size increases by 0.5 units; (2) for every one year increase in age, controlling for shoe size, a child’s predicted reading fluency score increases by 3.4 units; and (3) for every one unit increase in shoe size, controlling for age, the predicted reading fluency score increases by 0.1 units. Note that the directional arrow from shoe size to reading fluency score is dashed rather than solid. This is frequently done in SEM diagrams to show that the relationship between x and y is statistically nonsignificant, meaning it could easily have happened by chance alone. Also note that the interpretation of the diagram changes if one of the squares has more than one arrow pointing to it. In this case reading fluency score has arrows from both shoe size and age pointing to it. Thus, the model takes into account that there are two possible causes of change in reading fluency score. Taking a second possible cause into account in this way is called controlling for that variable.
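The three interpretations of Figure 5.4 can be written as a pair of linear equations, one for each box that has an arrow pointing to it. This is a sketch under the assumption that the relationships are linear; the intercepts $a_1$ and $a_2$ and the error terms $e_1$ and $e_2$ are notational assumptions added here and do not appear in the figure.

$$
\begin{aligned}
\text{shoe size} &= a_1 + 0.5\,(\text{age}) + e_1 \\
\text{reading fluency score} &= a_2 + 3.4\,(\text{age}) + 0.1\,(\text{shoe size}) + e_2
\end{aligned}
$$

Because age and shoe size both appear in the second equation, each coefficient there describes the predicted change in reading fluency score while the other variable is held constant.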

Now, with the revised model shown in Figure 5.4, critics can argue that the relationship between shoe size and reading fluency score is not causal, because there is no statistically significant association between shoe size and reading fluency score once the age of the child is accounted for. The relationship shown in Figure 5.3 would therefore be considered spurious. That model is badly wrong or, in technical terms, misspecified. Age must be included for the model to be defensible. The real reason there seems to be a relationship between shoe size and reading fluency score is that students who have larger shoe sizes are older, and older students have higher reading fluency scores. Age, in this context, would be called a confounding variable. A confounding variable affects both x and y, and when it is included in the model the relationship between x and y changes.
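The effect of a confounding variable can be illustrated with a small simulation. The Python sketch below generates hypothetical data in which age drives both shoe size and reading fluency score and shoe size has no direct effect at all (a simplifying assumption; Figure 5.4 shows a small, nonsignificant path). It then compares the raw correlation with the partial correlation that controls for age. The slopes 0.5 and 3.4 are borrowed from the figure; the age range, noise levels, and sample size are invented for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable
n_children = 2000        # assumed sample size

# Age is the only true cause of both outcomes in this simulation.
age = [rng.uniform(6, 12) for _ in range(n_children)]
shoe_size = [0.5 * a + rng.gauss(0, 0.5) for a in age]
fluency = [3.4 * a + rng.gauss(0, 3.0) for a in age]

raw = pearson_r(shoe_size, fluency)
controlled = partial_r(shoe_size, fluency, age)
print(f"raw correlation:          {raw:.3f}")
print(f"controlling for age:      {controlled:.3f}")
```

In a run like this the raw correlation between shoe size and reading fluency score should be large, while the correlation controlling for age should be close to zero, which is the pattern that marks the Figure 5.3 relationship as spurious.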

#### Exercises

Include several SEM diagrams of models and ask the students to evaluate the causal claims. Also create an answer key.

1. Describe the causal relationship between points scored and years playing a sport.
2. Describe the causal relationships from number of friends to both extracurricular activities and birthday presents received.
3. Describe the causal relationship between time spent on social media and GPA.

Answer key:

1. For every one unit increase in points scored, the predicted number of years playing a sport increases by 0.6 years.
2. a) For every one unit increase in number of friends, the predicted number of extracurricular activities increases by 0.8 units.

   b) For every one unit increase in number of friends, controlling for extracurricular activities, the predicted number of birthday presents received increases by 3.1 units.

   c) For every one unit increase in extracurricular activities, controlling for number of friends, the predicted number of birthday presents received increases by 0.3 units.
3. For every one unit increase in time spent on social media, predicted GPA decreases by 0.2 points.
