Chapter 2

In this chapter you will learn how to:

1. input and read data into SPSS,
2. have SPSS produce measures of the center of a dataset (e.g., mean, median),
3. and have SPSS produce a variety of spread estimates (e.g., range, standard deviation, variance, max, min).

## Inputting the data into SPSS

Suppose you decide to study the test scores in a hypothetical classroom of 3 students. One way to record the scores is to create a table, in Excel or even on paper, as shown below in Figure 2.1.

Figure 2.1

Hypothetical test scores from a classroom of 3 students.

| Name | Scores |
| --- | --- |
| Mike | 76 |
| Amy | 85 |
| Jane | 98 |

To enter this data into SPSS, open the program and start a new project by clicking File→New→Data.

Figure 2.2

Graphic User Interface (GUI) for starting a new project in SPSS.

Then select “variable view” from the tabs at the bottom of the screen.

Figure 2.3

Graphic User Interface (GUI) for the variable view in SPSS.

The first column is called “name”. In its first two rows, enter the names of the two columns seen in Figure 2.1. These columns are called “variables” in statistics. Next, in the first row, click the cell in the 2nd column, which is called “type”. This cell contains the word “numeric”.

Figure 2.4

Graphic User Interface (GUI) for variable type in SPSS.

A box with 3 dots will appear. Click on it and select the ‘string’ option from the menu.

Figure 2.5

Graphic User Interface (GUI) for variable type in SPSS.

This allows you to enter the names, which use letters, in that column. Now the program knows that these values are not numbers. Leave the 2nd row as “numeric” in this field.

Figure 2.6

Screenshot of SPSS screen after the creation of two variables.

Now click on “data view” from the tabs at the bottom of the screen and in rows 1-3 recreate the names and scores in Figure 2.1.

Figure 2.7

Screenshot of SPSS after input of data from Figure 2.1.

## Calculating the Center

One of the primary ways to describe data is to calculate where its center is. In statistics the two most popular measures of center are the mean and the median. The mean is the arithmetic average, while the median is the number in the center, meaning half of the other data points lie below the median and half lie above it. SPSS can calculate both easily. Click on the ‘analyze’ menu at the top of the screen, click on the ‘descriptive statistics’ option, and then click on ‘frequencies’.

Figure 2.8

Graphic User Interface (GUI) for frequencies in SPSS part one.

Now move ‘Scores’ from the list of the variables in the left box to the ‘Variable(s)’ box.

Figure 2.9

Graphic User Interface (GUI) for frequencies in SPSS part two.

Now click on the ‘statistics’ box on the right.

Figure 2.10

Graphic User Interface (GUI) for frequencies in SPSS part three.

On the right-hand side there is a submenu called ‘Central Tendency’. Check the ‘Mean’ and ‘Median’ boxes. Click the ‘Continue’ button at the bottom of the GUI.

Figure 2.11

Graphic User Interface (GUI) for frequencies in SPSS part four.

Now click ‘Ok’. SPSS will bring up a new ‘Output’ window and you should see what is shown in Figure 2.12.

Figure 2.12

Mean and Median from classroom of 3 students.
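As a cross-check on this output, the same two statistics can be computed outside SPSS. The following is a minimal sketch in Python, which is not part of the SPSS workflow in this chapter. It assumes the three scores from Figure 2.1, and the variable names are made up for illustration.

```python
from statistics import mean, median

# Scores from Figure 2.1 (Mike, Amy, Jane); the variable names are illustrative.
scores = [76, 85, 98]

mean_score = mean(scores)      # arithmetic average: sum of the scores divided by their count
median_score = median(scores)  # middle value once the scores are sorted

print(f"Mean:   {mean_score:.4f}")
print(f"Median: {median_score:.4f}")
```

If the printed values match the SPSS output window to four decimal places, the data were entered correctly.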

In Figure 2.12 you see SPSS has produced the arithmetic average, called the ‘mean’ (86.3333), and the center data point, called the ‘median’ (85.0000). In this case the mean and median are almost the same. Let’s take another hypothetical classroom, in which we replace the student ‘Mike’ with the student ‘George’. Go ahead and input this new dataset, shown in Figure 2.13, into SPSS and recalculate the mean and median.

Figure 2.13

Hypothetical test scores from a classroom of 3 students with Mike replaced by George.

| Name | Scores |
| --- | --- |
| George | 10 |
| Amy | 85 |
| Jane | 98 |

Figure 2.14

Mean and Median from the classroom of 3 students where Mike is replaced by George.

### Video: Getting the Mean and Median in SPSS

Notice how the mean and median respond to replacing Mike with George. The mean is now 64.3333, while the median keeps the same value of 85.0000. Now the two measures of center disagree. The arithmetic average is affected by the fact that George’s score is so much lower than Amy’s or Jane’s. George’s score is what is called an ‘outlier’, a data point that lies far from the rest of the data. Notice that the median is not affected by the fact that George’s score is an outlier. The median is called ‘robust’ to outliers; in other words, it is largely unaffected by them.

In this case the median is a truer measure of center than the mean, as two of the three students scored 85 or higher, with just one outlier who scored much lower. If the mean and median are about the same, as they are in the class with Mike instead of George, the mean is used because it has better mathematical properties than the median.
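The effect of the outlier can also be seen by placing the two classrooms side by side. This is a small Python sketch, not an SPSS procedure. It assumes the scores from Figures 2.1 and 2.13, and the names `with_mike`, `with_george`, `mean_shift`, and `median_shift` are hypothetical.

```python
from statistics import mean, median

# The two hypothetical classrooms; only the first student differs.
with_mike = [76, 85, 98]
with_george = [10, 85, 98]

for label, scores in (("With Mike", with_mike), ("With George", with_george)):
    print(f"{label}: mean = {mean(scores):.4f}, median = {median(scores):.4f}")

# How far each measure of center moves when Mike is replaced by George.
mean_shift = mean(with_mike) - mean(with_george)
median_shift = median(with_mike) - median(with_george)
print(f"Mean shifts by {mean_shift:.4f}; median shifts by {median_shift:.4f}")
```

The mean moves by a wide margin while the median does not move at all, which is what being robust to outliers looks like in practice.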

Let’s go back to our classroom with Mike as seen in Figure 2.1. Go through the same process you used to calculate the mean and the median, but in addition to checking the ‘Mean’ and ‘Median’ boxes, check the ‘Std. deviation’, ‘Minimum’, ‘Maximum’, and ‘Range’ boxes under ‘Dispersion’. Click ‘Continue’ and ‘Ok’ as before.

Figure 2.15

Graphic User Interface (GUI) for the statistics submenu in SPSS.

Figure 2.16

Mean, Median, Standard Deviation, Range, Minimum, and Maximum, from the classroom of 3 students with Mike

The minimum and maximum are self-explanatory, while the range is the maximum minus the minimum (here 98 − 76 = 22). The standard deviation of 11.06044 is, roughly speaking, the average distance from the mean. Please memorize this: the standard deviation is roughly the average distance of the data points from the mean. SPSS sometimes chooses its own number of decimal places; if your output shows different decimal places than mine, don’t worry. The standard deviation is calculated as follows:

$$Standard\ Deviation = \sqrt{\frac{\sum(x_i - \overline{x})^2}{n-1}}$$

Let's break down the numerator of this equation, $$\sum(x_i - \overline{x})^2$$. Here $$x_i$$ refers to each data point in the dataset, $$\overline{x}$$ refers to the mean, and $$\Sigma$$ is the capital Greek letter sigma, which means to sum the results. In our dataset we already have the mean calculated (86.3333) from Figure 2.12. Let's do the operation for each of our data points:

$$(76-86.3333)^2=(-10.3333)^2=106.7778$$
$$(85-86.3333)^2=(-1.3333)^2=1.7778$$
$$(98-86.3333)^2=(11.6667)^2=136.1111$$

Each square is computed from the unrounded mean and then rounded to four decimal places. Now we do the $$\Sigma$$ operation and add these three numbers together to get our numerator: 106.7778 + 1.7778 + 136.1111 = 244.6667.

The denominator has $$n$$, which refers to the number of data points we have, in this case 3. Subtract 1 from 3 to get 2, and divide the numerator (244.6667) by the denominator (2) to get 244.6667/2 = 122.3333. Referring back to the equation, we have to take the square root of this number, $$\sqrt{122.3333}$$, which results in 11.06044. This is just what SPSS tells us the standard deviation is. If you look at the dataset this makes sense, as Mike’s score of 76 and Jane’s score of 98 are each about 11 points away from the mean of 86.3333.
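The hand calculation above can be retraced one step at a time. The sketch below uses Python purely as a calculator, which is an assumption of this example and not something SPSS requires. The variable names are illustrative, and the last step compares the result with `statistics.stdev`, the standard-library function for the sample standard deviation.

```python
import math
from statistics import stdev

scores = [76, 85, 98]  # Mike, Amy, Jane (Figure 2.1)
n = len(scores)

# Step 1: the mean (x-bar in the formula).
mean_score = sum(scores) / n

# Step 2: squared distance of each data point from the mean.
squared_deviations = [(x - mean_score) ** 2 for x in scores]

# Step 3: the numerator, i.e. the sum of the squared deviations.
numerator = sum(squared_deviations)

# Step 4: divide by n - 1 (this intermediate value is the sample variance).
variance = numerator / (n - 1)

# Step 5: take the square root to get the standard deviation.
std_dev = math.sqrt(variance)

print(f"Numerator:          {numerator:.4f}")
print(f"Variance:           {variance:.4f}")
print(f"Standard deviation: {std_dev:.5f}")
print(f"statistics.stdev:   {stdev(scores):.5f}")
```

Because the code keeps the unrounded mean throughout, its intermediate values may differ from a hand calculation in the last decimal place if the mean is rounded early.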

Let's go back to the classroom with George shown in Figure 2.13 and recalculate our measures of spread.

Figure 2.17

Mean, Median, Standard Deviation, Range, Minimum, and Maximum, from the classroom of 3 students with George

Notice that in this dataset the standard deviation is 47.50088. Just as the mean is affected by George, the outlier, so is the standard deviation: it is not robust to outliers.
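To see how much a single outlier inflates the spread, the measures for both classrooms can be listed together. This is a hedged Python sketch rather than SPSS output. The helper `spread_summary` is a hypothetical function written for this example, and the scores are those from Figures 2.1 and 2.13.

```python
from statistics import stdev

def spread_summary(scores):
    """Return the minimum, maximum, range, and sample standard deviation."""
    lowest, highest = min(scores), max(scores)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,  # range = maximum minus minimum
        "sd": stdev(scores),        # sample standard deviation (n - 1 denominator)
    }

mike_spread = spread_summary([76, 85, 98])     # classroom with Mike
george_spread = spread_summary([10, 85, 98])   # classroom with George

for label, summary in (("With Mike", mike_spread), ("With George", george_spread)):
    print(
        f"{label}: min = {summary['min']}, max = {summary['max']}, "
        f"range = {summary['range']}, sd = {summary['sd']:.5f}"
    )
```

Only one score changed between the two classrooms, yet both the range and the standard deviation grow several times over.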

Instructions on how to make a YouTube video

https://edtechbooks.org/-sgPP

### Exercises

1. Without notes write down the standard deviation formula. Answer
2. By hand, calculate the standard deviation of this dataset (1,2,3). Show your work. Answer
3. In your own words, describe when you would use the median rather than the mean to describe the center of the data. Answer
