Appendix A: Introduction to SPSS
The main software you will be using for this course is SPSS Statistics (“SPSS”).
SPSS is available to BYU students through a cloud-based workspace called Cloud Apps.
Opening the Software
To open SPSS, visit cloudapps.byu.edu. Then click ‘Log Into Cloud Apps.’
On the next page, enter your BYU email address (firstname.lastname@example.org) and password into the login box. This should be the same information you use to log into Learning Suite and other BYU programs. Once you have entered your information, click ‘Log On.’
Once you log in, you’ll see a list of recently used apps. To see the full list of apps, select Apps → All Apps.
This page shows all of the apps you can access through Cloud Apps. Scroll down to find the SPSS logo.
Click the icon to launch the software. It should open in a new window in your browser. It may take several minutes to load.
Getting Familiar with SPSS
This section is designed to familiarize you with the SPSS graphical user interface (GUI). This welcome dialog box is the first screen you see when you open SPSS. It includes options to:
- Open a new dataset
- Restore a recent file
- View sample files
- View SPSS tutorials, among other things
To get to the main screen, close this window. Then, you will see the following screen. This is a blank data file.
Data View and Variable View
There are two ways of viewing data files in SPSS: Data View and Variable View.
Data View: This is where your spreadsheet’s data is kept. It is laid out in a traditional column and row format, much like Microsoft Excel.
Variable View: This houses information about the variables in your dataset, including their name, description (AKA label), missing values, and more.
So, how do you know which view you are in? At the bottom of the data file, you’ll see the following.
Whichever tab is underlined in blue represents the view you are currently in. Select the other tab to change the view.
Creating New Files
To create a data file in SPSS, go to File → New → Data. This will create a new, blank data file where you can enter data.
As you can see, SPSS also gives the option to create a new syntax file. What is syntax?
SPSS Syntax: Code that tells SPSS what to do and how to do it. Much like in other statistical software, there is code behind each action you perform in SPSS.
You do not have to know how to code to analyze data in SPSS. However, most dialog boxes in SPSS include a ‘Paste’ button. When you click ‘Paste,’ SPSS adds the syntax for the action you want to perform to a syntax file, which gives you a record of your changes. Pasting does not run the action; you still need to run the pasted syntax. It is best practice to ‘Paste’ your syntax as you go. Save your syntax file in case you need to reference it later or re-run something.
When should you create a new syntax file?
When you want to designate a location for the syntax you ‘Paste.’ This is helpful if you are working on a project and want to track all of the changes in one place.
When you run analyses or create tables, graphs, or charts in SPSS, what you create will be added to an output file. If you would like to designate a location for your output, you can create a new output file specifically for your project.
Opening Pre-existing Files
To open a pre-existing data file in SPSS, go to File → Open → Data and locate the file on your computer or in Box.
When you launch SPSS, you will be prompted to authenticate Box using the same login you use to get into Cloud Apps. After authenticating Box, you will have access to files saved in Box and won’t need to re-authenticate on subsequent logins.
Creating Dummy Variables
Many data sets include nominal variables: variables with multiple categories, but no intrinsic order to those categories (e.g., gender, race, favorite fruit).
To use a nominal variable in analysis (such as linear regression), you may need to dummy code.
Dummy Coding: Turning the response options from a nominal variable into dichotomous variables (which have only two response options).
Example: Let’s say our data set includes the variable ‘FavoriteFruit,’ which has four response options:
1 = Apples
2 = Bananas
3 = Oranges
4 = Other
By dummy coding ‘FavoriteFruit’, we will create four new variables, one for each response option.
Each new variable will have two response options, represented by 1s and 0s. For the new variable ‘FavoriteFruit_Apples,’ a value of 1 will represent people who said their favorite fruit is apples. A value of 0 will represent those who listed a fruit other than apples as their favorite. In a regression equation, you enter all but one of the dummy variables; the category whose dummy variable you leave out becomes the reference category, and each coefficient compares its category to that reference.
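The logic of dummy coding can be sketched outside SPSS. The short Python example below is only an illustration of the idea, not SPSS syntax: the function name `dummy_code`, the `FavoriteFruit_` naming pattern for all four variables, and the five sample responses are assumptions made up for this sketch.

```python
# Illustrative sketch only: labels follow the FavoriteFruit example above;
# the responses and the helper name are hypothetical.
LABELS = {1: "Apples", 2: "Bananas", 3: "Oranges", 4: "Other"}


def dummy_code(responses, labels, root="FavoriteFruit"):
    """Return {new_variable_name: [0/1, ...]}, one indicator per category."""
    dummies = {}
    for code, label in labels.items():
        name = f"{root}_{label}"
        # 1 if the participant chose this category, 0 otherwise
        dummies[name] = [1 if r == code else 0 for r in responses]
    return dummies


responses = [1, 3, 2, 1, 4]  # hypothetical answers from five participants
dummies = dummy_code(responses, LABELS)
for name, values in dummies.items():
    print(name, values)
```

Each participant has a 1 on exactly one of the four new variables, which is why one of them must be left out of a regression model.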
How to Dummy Code
To dummy code a variable, go to Transform → Create Dummy Variables.
Then, select the variable you wish to dummy code and click the arrow to move it to the ‘Create Dummy Variables for:’ box.
Next, add a name in the ‘Root Names’ box. SPSS will use this as the first part of the name in the dummy coded variables. Under ‘Measurement Level Usage,’ select ‘Create dummies for all variables.’
After we click ‘OK’ or ‘Paste’ and run our syntax, SPSS should create four new variables.
By double clicking on the variable name in the ‘Name’ column of Variable View, we can rename the variables so it’s easier to identify which is which.
You can double check that the variables have been dummy coded by looking at the Data View tab. The new variables should only include 1s and 0s.
Recoding ID Variables
Many data sets include unique identifiers for each participant. These identifiers often combine letters, numbers, and other characters, so SPSS stores them as a string variable.
Why would I need to recode a string variable? If you’ll only be analyzing your data in SPSS, you probably do not need to worry about recoding your string variables. However, if you plan on using other statistical software (e.g., Mplus) to analyze your data, you must recode. Mplus will not run files with string variables.
By recoding your string variables, you can preserve a unique identifier without limiting yourself to one software.
How to Recode String Variables
Go to Transform → Automatic Recode.
In the pop-up box, move your ID variable to the ‘Variable->New Name’ box. Then, type a name for your new variable into the ‘New Name’ box. Press ‘Add New Name’ to add it.
Click ‘OK’ or ‘Paste’ and run your syntax. Your new variable should only include numbers. You may wish to save a copy of your file that includes the old and new ID variables so you know which is which. Delete your old ID string variable from whatever file you plan to export for use in software like Mplus.
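To see what this kind of recode does conceptually, here is a minimal Python sketch. It is not SPSS syntax, and it assumes that codes are handed out as consecutive integers in sorted order of the distinct IDs; SPSS may order some characters differently than Python does. The function name `automatic_recode` and the sample IDs are hypothetical.

```python
def automatic_recode(ids):
    """Map each distinct string ID to a consecutive integer, starting at 1.

    Assumption for this sketch: codes follow the ascending sorted order
    of the distinct IDs.
    """
    distinct = sorted(set(ids))
    mapping = {value: code for code, value in enumerate(distinct, start=1)}
    return [mapping[i] for i in ids], mapping


string_ids = ["P-07b", "A-113", "P-07b", "K_22"]  # hypothetical participant IDs
numeric_ids, key = automatic_recode(string_ids)
print(numeric_ids)  # numeric codes, in the original row order
print(key)          # lookup table from old string ID to new numeric ID
```

Keeping the lookup table serves the same purpose as saving a copy of the file with both ID variables: you can always trace a numeric ID back to the original string.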
Generating Descriptive Statistics
It is often helpful to have details about the variables in your dataset. This may include the minimum, maximum, standard deviation, mean, median, mode, and more. To generate these descriptive statistics, go to Analyze → Descriptive Statistics → Frequencies.
Then, select the variables you wish to see the descriptive statistics for and click the arrow to move them to the ‘Variable(s)’ box. Select ‘Statistics…’ to choose which descriptive statistics you wish to generate.
Here we have selected mean, median, standard deviation, minimum, and maximum, but you can select whatever you wish. Click ‘Continue.’
You may wish to deselect ‘Display frequency tables.’ Click ‘OK’ or ‘Paste’ and run your syntax. SPSS will generate tables with the data you requested.
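If you want to check your understanding of what these statistics mean, the same five values can be computed by hand or with a few lines of code. The Python sketch below uses only the standard library; the variable `ages` and its values are made up for illustration, and the helper name `describe` is hypothetical.

```python
import statistics


def describe(values):
    """Return the five descriptive statistics selected in the example above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 in the denominator)
        "std_dev": statistics.stdev(values),
        "minimum": min(values),
        "maximum": max(values),
    }


ages = [19, 21, 21, 24, 30]  # hypothetical data
summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```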
Recode into same
How to Generate a Scatterplot
Histograms are often used to visualize continuous data. Histograms display a distribution of scores, with each bar representing a value or range of values (a bin) and the height of the bar showing how many cases fall in it.
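The counting behind a histogram can be sketched in a few lines of Python. This is only an illustration of how scores are grouped into bars, not how SPSS draws its charts; the scores, the bin width of 10, and the function name `histogram_counts` are assumptions chosen for the example.

```python
def histogram_counts(scores, bin_width, start):
    """Count how many scores fall in each bin [lower, lower + bin_width)."""
    counts = {}
    for s in scores:
        # Lower edge of the bin this score belongs to
        lower = start + ((s - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


scores = [52, 58, 61, 64, 67, 71, 73, 78, 85, 91]  # hypothetical test scores
width = 10
counts = histogram_counts(scores, bin_width=width, start=50)
for lower, n in counts.items():
    # One '#' per case gives a rough text version of the bars
    print(f"{lower}-{lower + width - 1}: {'#' * n}")
```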
How to Generate a Histogram
Handling Missing Data
Data sets usually include missing data. For example, when a survey participant skips several questions on a survey, the responses to those skipped questions are ‘missing.’ You may notice missing data in your dataset because the cells will have a period in them.
Why do I need to tell SPSS I have missing data? Although you may know what data is missing and why, SPSS does not. It reads a numeric cell that contains a period as system missing, but it will read any placeholder code you type into a cell as a real value. As a result, you need to tell SPSS which values the software should read as missing. It is helpful to fill missing cells with a value like -999 that is easily identifiable as missing.
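A quick calculation shows why a placeholder like -999 has to be declared as missing before you analyze anything. The Python sketch below is illustrative only: the variable `hours` and its values are made up, and filtering a list is simply a stand-in for what happens once software knows that -999 means missing.

```python
import statistics

MISSING_CODE = -999  # placeholder used to mark missing answers

# Hypothetical hours of sleep reported by six participants; two skipped the question
hours = [6, 8, MISSING_CODE, 7, MISSING_CODE, 9]

# If the placeholder is treated as a real answer, the mean is meaningless
naive_mean = statistics.mean(hours)

# Once the placeholder is excluded, the mean uses only the valid answers
valid = [h for h in hours if h != MISSING_CODE]
correct_mean = statistics.mean(valid)

print("Treating -999 as real data:", naive_mean)
print("Excluding -999 as missing:", correct_mean)
```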
What are the consequences of not addressing missing data? When handled incorrectly, missing data can skew analyses and produce confusing or incorrect results.
There are several ways to tell SPSS that you have missing data.
How to Specify Missing Data
There are two steps to identifying missing data in SPSS. The first involves filling your blank and period-filled boxes with a value that
Go to Transform
How Does SPSS Handle Missing Data?
Some statistical software (including SPSS) uses listwise deletion, which excludes a participant from the analysis if they are missing data on one or more of the variables you are trying to analyze. Listwise deletion is problematic because it reduces the sample size and statistical power of your analysis. In some analyses in SPSS, you can choose to
How can I learn more about how to handle missing data? You may want to take a class specifically about how to handle missing data.
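To make listwise deletion concrete, the Python sketch below drops every participant who is missing a value on any variable in the analysis. It illustrates the idea only and is not SPSS's implementation; the data, the use of `None` to mark a missing value, and the function name `listwise_delete` are assumptions for this example.

```python
def listwise_delete(rows, variables):
    """Keep only rows that have a value (not None) on every listed variable."""
    return [
        row for row in rows
        if all(row.get(v) is not None for v in variables)
    ]


# Hypothetical data: None marks a missing value
participants = [
    {"id": 1, "age": 20, "income": 30000, "stress": 4},
    {"id": 2, "age": None, "income": 42000, "stress": 2},
    {"id": 3, "age": 25, "income": None, "stress": 5},
    {"id": 4, "age": 31, "income": 51000, "stress": None},
    {"id": 5, "age": 28, "income": 38000, "stress": 3},
]

complete = listwise_delete(participants, ["age", "income", "stress"])
print(len(participants), "participants before,", len(complete), "after")
```

Each of the three excluded participants is missing only one value, yet all of their other answers are lost as well, which is how listwise deletion shrinks the sample so quickly.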