Chapter 6: Standard Errors and p-values
In order to make a case for causality, it is necessary to make a case for correlation. The correlation must not be spurious, meaning it is unlikely to have occurred by chance alone. Statistics has a formal framework called Null Hypothesis Significance Testing (NHST) that helps determine whether a correlation is spurious. To use this framework, researchers start by assuming the null hypothesis. This assumption generally means that there is no correlation between two variables (r = 0, where r stands for the correlation). For this text, remember that the null hypothesis means that there is no correlation between two variables.
In some cases it is quite obvious whether the null hypothesis is true. Here is an example: a researcher wants to know something very narrow, say, the correlation between height and weight in one particular high school class. To answer the question, the researcher would record every student's height and weight for that classroom in SPSS, run the bivariate correlation procedure you learned in chapter 4, and examine the result. In our example, SPSS calculates the correlation to be quite small, say r = .1. Because every student in the class was measured, .1 is the actual correlation in that class, and because .1 does not equal 0, the null hypothesis is false: there is a clear, though small, correlation between height and weight in this classroom.
The vast majority of research questions are not so narrow. Usually researchers take a sample of subjects and make inferences about a larger population. Returning to the classroom study, to infer the correlation between height and weight in the general population, researchers would use the same process. Assuming the classroom is representative of the larger high school population, they would calculate r = .1 and claim that the null hypothesis is false because .1 does not equal 0.
The example researcher would have a problem, though. Because r is small, critics could say that the classroom correlation may just be a fluke: the correlation between height and weight in the general population may really be r = 0, and by chance alone this particular classroom happens to have a correlation of r = .1. To answer this criticism, it would be useful to know how often this sort of fluke could happen. If the population correlation really is 0, how often would a random sample like the one in our example produce r = .1 or larger? That question is exactly what a p-value answers. Please memorize this: a p-value is the probability, by chance alone, of getting results as extreme as or more extreme than your results, assuming the null hypothesis is true.
Let’s substitute the example’s facts into this definition. This example’s p-value is: the probability of getting a correlation of height and weight of r =.1 or larger, given that the true population correlation is r = 0.
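The definition can be made concrete by simulation: pretend the null hypothesis is true, draw many random classrooms, and count how often chance alone produces a correlation as extreme as r = .1. The sketch below assumes Python with NumPy, a hypothetical class size of 30, and normally distributed heights and weights; none of these details come from the example itself.

```python
import numpy as np

def simulated_p_value(observed_r, n, n_sims=10_000, seed=0):
    """Estimate a two-tailed p-value for a sample correlation by simulation.

    Each simulated sample draws n pairs of values from two independent
    normal variables, so the population correlation really is 0 (the null
    hypothesis is true). The p-value is the share of simulated samples whose
    correlation is at least as far from 0 as the observed one.
    """
    rng = np.random.default_rng(seed)
    as_extreme = 0
    for _ in range(n_sims):
        height = rng.standard_normal(n)
        weight = rng.standard_normal(n)          # unrelated to height: true r = 0
        r = np.corrcoef(height, weight)[0, 1]    # sample correlation
        if abs(r) >= abs(observed_r):            # "as extreme or more extreme"
            as_extreme += 1
    return as_extreme / n_sims

# Hypothetical classroom of 30 students whose sample correlation was r = .1
print(simulated_p_value(0.1, n=30))
```

Under these assumptions the estimate comes out at roughly .6: most classrooms drawn from a population with no correlation at all would show a correlation at least as far from 0 as .1.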
Here is another example of a p-value. Figures 4.13 and 5.1 both show the sample correlation of age in years and number of books read in the same year; Figure 6.1 shows the same correlation table.
Figure 6.1. Correlation table for age in years vs. number of books read in the past year.
The Pearson correlation is .198, which is small, but could this small value have happened by chance alone if the null hypothesis (a population correlation of 0) is true? Notice the row below the .198 estimate. Its label is “Sig. (2-tailed)”, short for statistical significance (2-tailed). The value in this row, .403, is the p-value for this hypothesis test. It is interpreted as follows: if the population correlation of age and books read is 0, the probability of getting a sample correlation at least as far from 0 as .198 (in either direction, which is what “2-tailed” means) is 40.3%. By convention, p-values greater than .05 (5%) are insufficient evidence to reject the null hypothesis. Critics of this correlation can rightly say that, with a p-value well over 5%, this study does not provide enough evidence that the true population correlation is anything but 0. You would fail to reject the null hypothesis. Do not be surprised by this outcome: it is common to find that a correlation could be the result of chance alone.
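SPSS's “Sig. (2-tailed)” value for a Pearson correlation comes from a t test with n - 2 degrees of freedom. The sketch below, which assumes Python with SciPy, carries out that calculation. The sample size behind Figure 6.1 is not restated in this text, so n = 20 below is an assumption; with it, the sketch gives a p-value of about .40, close to the .403 in the table.

```python
import math
from scipy import stats

def correlation_p_value(r, n):
    """Two-tailed p-value for a Pearson correlation r from n cases,
    testing the null hypothesis that the population correlation is 0."""
    df = n - 2                                      # degrees of freedom
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)   # test statistic
    return 2 * stats.t.sf(abs(t), df)               # area in both tails

# r = .198 from Figure 6.1; n = 20 is an ASSUMED sample size
print(round(correlation_p_value(0.198, 20), 3))
```

Changing n shows why sample size matters: the same r = .198 would give a much smaller p-value in a sample of several hundred people.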
The standard error is a measure of the precision of your estimate, and it is inextricably linked to the p-value; in fact, the first step in calculating a p-value involves the standard error. Standard errors are a function of the standard deviation of the distribution of your variable. The standard deviation, as mentioned in chapter 2, is roughly the average distance of values from the mean of a distribution. The general form of the standard error is
$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$
where n is the sample size of your data.
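For the standard error of a mean (the standard deviation divided by the square root of n), the calculation takes only a few lines. This is a sketch using Python's standard library; the five heights are made-up numbers for illustration.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    n = len(values)
    return statistics.stdev(values) / math.sqrt(n)

heights_cm = [160, 165, 170, 175, 180]   # hypothetical heights of five students
print(round(standard_error(heights_cm), 3))   # prints 3.536
```

Because n sits under a square root, quadrupling the sample size (with the same standard deviation) halves the standard error, which is why larger samples give more precise estimates.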
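Step one in getting a p-value is to calculate what is called a test statistic. A test statistic transforms the raw statistic (in the example above, r = .198) into standard deviation units of its sampling distribution, which are called standard errors: it tells you how many standard errors the statistic lies from the value the null hypothesis predicts. Here is the general form: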
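$$\text{test statistic} = \frac{\text{statistic} - \text{null hypothesis value}}{\text{standard error}}$$
For a correlation, the null hypothesis value is 0, so the test statistic is simply the statistic divided by its standard error.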
Once this transformation is finished, the computer translates the test statistic into the probability in the tail(s) of the null hypothesis distribution, that is, the probability of getting a statistic at least this extreme if the null hypothesis were true.
Figure 6.2. Illustration of a p-value in a distribution.
Computing p-values is just one use of standard errors. P-values have come under attack as too simplistic a summary of a statistic, and in a way they are: a p-value answers only one question about the statistic, namely whether this value could have happened by chance alone if the null hypothesis is true. In contrast, the standard error can be combined with the statistic to create a range of plausible values for the population parameter, called a confidence interval. Here is the general form of the confidence interval:
$$\text{confidence interval} = \text{statistic} \pm (\text{level of confidence}) \times \text{standard error}$$
In the case of age vs. books read, the statistic (the sample correlation r) is .198. The level of confidence is a value chosen by the researcher indicating how confident they want to be in the interval. Typically a 95% confidence level is chosen, which translates into plugging in the number 2 for the level of confidence (the multiplier for 95% confidence under a normal distribution is about 1.96, which rounds to 2). Suppose the standard error for the age vs. books read correlation is .4. Then the upper limit of the confidence interval is .198 + 2*.4 = .998, and the lower limit is .198 - 2*.4, or approximately -.6. The confidence interval would be reported as (-.6, .998). That means researchers are 95% confident that the true population correlation of age vs. books read lies between -.6 and .998. In other words, there is no clear picture of what the true population correlation is, since the interval spans almost the entire range of possible correlations. This shows (a) that our estimate is very imprecise and (b) that, because 0 is a plausible value for the parameter (0 is inside the confidence interval), the null hypothesis can't be rejected (p > .05).
In an alternative universe where the standard error for age vs. books read is much smaller, researchers could end up with a confidence interval of, say, (.09, .11). This would be interpreted the same way: we are 95% confident that the true population correlation lies between .09 and .11. In this universe we would be quite sure of our answer, and we could easily reject the null hypothesis, because 0 is not in the interval (p < .05). A critic could still attack this result by saying that even though the correlation is not 0, and researchers are very sure of its true value, it is too small to be relevant. The critic would have a strong case, one that would need to be discussed subjectively. For example, the researcher could report the correlations between income and books read per year, or between level of education and books read per year.
If those correlations, between variables the literature considers important in the life of a student, turn out to be even smaller than the .09 to .11 range found for age vs. books read, then the age vs. books read correlation is still worth looking at. And the debate would continue. Nevertheless, p-values and standard errors are useful tools for making an empirical case that a statistic is “important”.
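The confidence interval arithmetic from the age vs. books read example can be checked with a small Python function. It is a sketch of the chapter's statistic ± (level of confidence) * standard error rule, with the level of confidence passed in as a multiplier; the standard error of .4 is the supposed value from the example, not a number read from Figure 6.1.

```python
def confidence_interval(statistic, standard_error, multiplier=2.0):
    """statistic ± multiplier * standard error; a multiplier of 2 gives roughly 95% confidence."""
    margin = multiplier * standard_error
    return statistic - margin, statistic + margin

low, high = confidence_interval(0.198, 0.4)   # age vs. books read, supposed SE of .4
print(round(low, 3), round(high, 3))           # prints -0.602 0.998
```

Passing multiplier=1.96 instead of 2 gives the slightly narrower interval (-0.586, 0.982).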
Exercises: interpreting p-values
- How would you interpret the p-value for the correlation between depression score on the DASS-21 and anxiety score on the DASS-21?
- How would you interpret the p-value for the correlation between hours spent on social media per week and depression score on the DASS-21?
- How would you interpret the p-value for the correlation between weight and hours spent on social media per week?
Exercises: confidence intervals
- Calculate the 95% confidence interval for the correlation between depression score on the DASS-21 and anxiety score on the DASS-21. Interpret your results. The standard error is 0.48.
- Calculate the 95% confidence interval for the correlation between hours spent on social media per week and depression score on the DASS-21. Interpret your results. The standard error is 0.29.
- Calculate the 95% confidence interval for the correlation between weight and hours spent on social media per week. Interpret your results. The standard error is 0.69.
Answer key
- If the population correlation of depression score and anxiety score is 0, the probability of getting a sample correlation at least as far from 0 as .901 (in either direction) is 0.1%. That's smaller than .05 (5%), so the correlation is statistically significant and we reject the null hypothesis.
- If the population correlation of hours spent on social media per week and depression score is 0, the probability of getting a sample correlation at least as far from 0 as .551 (in either direction) is 1.2%. That's smaller than .05, so we reject the null hypothesis.
- If the population correlation of weight and hours spent on social media per week is 0, the probability of getting a sample correlation at least as far from 0 as -0.294 (in either direction) is 20.9%. That's larger than .05, so we fail to reject the null hypothesis.
- Lower bound: (0.901-2.00*0.48) = -0.059
Upper bound: (0.901+2.00*0.48) = 1.861
We are 95% confident that the true population correlation of depression score vs. anxiety score lies between -0.059 and 1.861. Because a correlation cannot be larger than 1, the plausible range is really -0.059 to 1.
- Lower bound: (0.551-2.00*0.29) = -0.029
Upper bound: (0.551+2.00*0.29) = 1.131
We are 95% confident that the true population correlation of hours spent on social media per week vs. depression score lies between -0.029 and 1.131. Because a correlation cannot be larger than 1, the plausible range is really -0.029 to 1.
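- Lower bound: (-0.294-2.00*0.69) = -1.674
Upper bound: (-0.294+2.00*0.69) = 1.086
We are 95% confident that the true population correlation of weight vs. hours spent on social media per week lies between -1.674 and 1.086. Because a correlation must lie between -1 and 1, this interval covers every possible value, so the study tells us almost nothing about this correlation.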
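Students can check the answer-key arithmetic with the sketch below. It applies the chapter's .05 cutoff to each p-value and its statistic ± 2 * standard error rule to each correlation, using the numbers given in the exercises. Trimming each interval at -1 and 1 is a crude fix assumed for this sketch, not a method from the chapter.

```python
def confidence_interval(statistic, standard_error, multiplier=2.0):
    """statistic ± multiplier * standard error (multiplier 2 ≈ 95% confidence)."""
    margin = multiplier * standard_error
    return statistic - margin, statistic + margin

# (label, correlation r, p-value, standard error) as given in the exercises
EXERCISES = [
    ("depression vs. anxiety", 0.901, 0.001, 0.48),
    ("social media vs. depression", 0.551, 0.012, 0.29),
    ("weight vs. social media", -0.294, 0.209, 0.69),
]

def check_answers(alpha=0.05):
    """Return the p-value decision, the raw interval, and the trimmed interval for each exercise."""
    results = []
    for label, r, p, se in EXERCISES:
        decision = "reject" if p < alpha else "fail to reject"
        low, high = confidence_interval(r, se)
        # A correlation cannot be below -1 or above 1, so trim the interval there
        trimmed = (max(low, -1.0), min(high, 1.0))
        results.append((label, decision, (low, high), trimmed))
    return results

for label, decision, ci, trimmed in check_answers():
    print(f"{label}: {decision} the null hypothesis; "
          f"CI ({ci[0]:.3f}, {ci[1]:.3f}), trimmed ({trimmed[0]:.3f}, {trimmed[1]:.3f})")
```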