5  SPSS

This chapter will first go over some different data conventions, types, and formats before discussing how to read in and write data files. Syntax is shown within the chapter and is also available for download on Canvas. I strongly suggest creating your own syntax file and following along with the examples given, as that is excellent practice.

5.1 Argument for Syntax

While SPSS offers both pull-down menus and syntax as ways to interface with the program, we will only be using syntax. The reason is twofold: first, syntax often offers more options than are available through the pull-down menus, and second, syntax allows you to save a record of your data manipulations, transformations, and procedures. This second reason is perhaps the more important one. You will have a record of what data file you read in, and in what format, what you did to it (did you add a column?), what analyses you performed and how, and whether any data sets were saved out after your work. Since syntax files can be saved and shared, this will allow you or others to repeat your analyses very easily rather than having to know all the buttons you clicked, which options were selected, and in which order things were performed.

Think back to Excel - there is no record of what you did, only the end result. Additionally, in Excel you are working on the original data file. If you make a mistake and don’t immediately realize it - the entire file is corrupted. When using syntax, if you make a mistake, you can just re-read in the original data file and re-run the syntax before your mistake. This was my personal ‘lightbulb’ moment - I was far less afraid of making mistakes once I realized how easy it was to just bring in the original file again. I wasn’t going to ‘ruin’ it, because the syntax wasn’t changing the original, just what was in the SPSS memory.

5.1.1 How do I get to SPSS Syntax?

When you open SPSS, the syntax window does not automatically open. You will need to go to File -> New -> Syntax. Before you do anything else, save this file! Name it something that makes sense, and save it in a location that you will be able to find again. Also of note, your syntax file, data file(s), and output all save as separate files. Simply saving your syntax file will not save the other files automatically. Closing the last data window will exit SPSS completely - it will generally warn you as such.

Video: Getting to the SPSS syntax window

5.2 SPSS Syntax Rules

A general rule in SPSS: variable names should be kept short, but not so short that you don’t know what they are. SPSS does not allow more than 64 characters (this includes underscores) in a variable name, nor does it allow variable names to begin with a number or symbol. It also does not allow spaces to appear in variable names: question_1 is okay, question 1 is not.

Syntax rules:

  • All file paths must be in quotes; you may use single ('') or double ("") quotes, but be consistent.
  • All commands must end in a period (.); when a command has subcommands, the period goes at the very end rather than on each line (more on that later)
  • Subcommands are indicated by an indent and /
  • The last subcommand should end in a period
  • If a variable is a string variable (i.e. contains words/letters), it must be designated by putting (A) after the variable location. NOTE: Both the () and the A are needed.
  • Comments are indicated by starting the line with a * and ending with a period.
  • SPSS will ‘help’ by autocompleting syntax as you type. This can be helpful, or annoying. If you don’t want this assistance, go to Tools -> deselect Auto Complete.
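
To see several of these rules at once, here is a small sketch of a command with subcommands. The file path and the sheet name are placeholders I made up for illustration, not real files:

```spss
* Read in the first sheet of an Excel workbook.
GET DATA
 /TYPE = XLSX
 /FILE = 'path\to\file.xlsx'
 /SHEET = name 'Sheet1'.
EXECUTE.
```

The comment starts with a * and ends with a period, the file path is in quotes, each subcommand is indented and starts with a /, and only the last subcommand ends in a period.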

5.3 Comments

Comments in SPSS are preceded by a * and end in a period (.). In the syntax window, comments appear as grey text - that’s how you know if you did it right or not! In a syntax file, comments are written in regular English and are used to guide you and others through your syntax. Think of them as detailed notes to yourself so you can come back later and easily know what you did. They are also useful if you are working in a collaborative group - you may have written the syntax but you are likely not going to be the only user of the syntax. If you have multiple sentences, put each one on its own line with its own *, since SPSS treats a period at the end of a line as the end of the comment.

The syntax files provided will likely be a bit more heavily commented than you might ‘normally’ do, since I am also leaving educational notes for you to follow.

As an example:

*This is an SPSS comment.

For longer comments:

*This is an SPSS comment.
*This is more information being provided in a comment.
*And one last bit of information.

Note the multiple periods and asterisks in the longer comment series.

Video: Syntax basics

5.4 Gathering Data

If you are going to be analyzing data, you should also be involved as much as you can when the data are gathered. This can help alleviate future headaches that could have been prevented with your input. Some things to keep in mind:

  • Use identification numbers, and keep them simple! There is nothing wrong with starting to number participants at 1 and continuing on for as long as you need.
  • Use open-ended format questions cautiously. While asking “How would you describe your gender?” in an open-ended format allows for participants to describe what best fits them, it also opens up the possibility of answers such as “F, female, Female, femael, girl, etc.”. All of those would be treated as a different gender by a program. Using a multiple choice format would eliminate that problem. You could still have an open-ended option for those who do not feel their gender is represented by your options, but it cuts down on the variability.
  • Categories for numeric variables may be tempting to use if that is what you’re currently interested in (e.g. What is your age? 6-10, 11-15, 16-20, etc.). However, if, in the future, you are interested in the same data but with a different age breakdown, you won’t have that data. It is better to gather numeric data in a continuous format and, if absolutely necessary, use syntax or a formula to make the groups later. You’ll always have the original data to go back to if you want to change your analysis!
  • Missing data should be coded in such a way as to not be mistaken for actual variable values! For example, some people may choose to code missing data as 99 or 999; however, if that is read in under an age column, odd things may happen with your analysis. Both 99 and 999 (one more plausibly than the other) are “valid” ages, at least according to a computer program that is only looking for numbers in that column. A more useful identifier is a . or even NA. A period is automatically recognized by many programs as missing data, and NA is recognized by R as missing data.
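
If you inherit data that already uses a number such as 999 for missing values, or you decide later that you do need age groups, both can be handled in syntax. The following is only a sketch: it assumes a numeric variable named age already exists in the open dataset, and the new variable name age_group and the group codes 1-3 are my own placeholders.

```spss
* Assumes a numeric variable named age is already in the open dataset.
* Tell SPSS that 999 means missing and is not a real age.
MISSING VALUES age (999).

* Build age groups from the continuous ages and keep the original age column.
RECODE age (6 thru 10 = 1) (11 thru 15 = 2) (16 thru 20 = 3) INTO age_group.
EXECUTE.
```

Because RECODE writes the groups INTO a new variable, the original ages stay untouched and you can regroup them differently later.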

5.4.1 Data Types

Data are stored in variables, and variables contain a set of values. Gender, score, anxiety_level are all variables. The values contained in them may be female and male; 29, 20, 10, and 38; 9, 3, 5, 2. The values are what go in the ‘cells’ under each variable.

Variables are typically one of two types: numeric or string. Numeric variables, as you may suspect, contain only numbers. If a numeric variable has a value that contains letters or other characters, it will not be interpreted as a valid value. String variables are those that contain letters, numbers, other characters, or a combination of these. “Female” is a string value, as are “6-10” and “2A”. The type of variable will influence what analyses are and are not able to be done with it.
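
As a small illustration of the two types, the sketch below types a few cases directly into a syntax file. The values are the ones mentioned above, but which gender goes with which score is made up for the example.

```spss
* Small made-up dataset typed directly into the syntax file.
* gender is a string variable so it gets an A format, and the other two are numeric.
DATA LIST LIST /gender (A6) score (F3.0) anxiety_level (F3.0).
BEGIN DATA
female 29 9
male 20 3
female 10 5
male 38 2
END DATA.
LIST.
```

The LIST command simply prints the cases to the output window so you can check that each value landed under the right variable.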

5.5 Data Formats

Not all the data you will bring into SPSS will be from Excel - you can also bring in text files that have the data separated in a uniform fashion, or even data that is written continuously, one record right after another. Fixed width data have one record per observation (one line per unit of measurement), and the variables are located in specific columns. Columns can be easily seen in a text editor, or counted in any program by moving your cursor over one by one.

Fixed width data might look like this:

101 M  1011011001
102 M  0010101110
103 F  0000000000
104 F  1111111111
105 M  1111100000

Here, ID is in columns 1-3, gender is in column 5, and responses to a 10-question survey are in columns 8-17. Notice, also, that the variable names are not contained within the data itself; this is something you would need to know or have a key to.
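
One way this layout could be described to SPSS is sketched below, using the DATA LIST command that is covered in more detail later in this chapter. The file path and the variable names q1 through q10 are placeholders of my own.

```spss
* The path and the names q1 to q10 are placeholders.
* The column numbers match the layout described above.
DATA LIST FILE = 'path\to\survey.txt'
 /id 1-3 gender 5 (A) q1 TO q10 8-17.
EXECUTE.
```

Writing q1 TO q10 8-17 asks SPSS to split those ten columns evenly across the ten variables, one column each.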

5.5.1 Delimiters

Delimiters are things that separate your data. The types we will discuss are comma-delimited data, tab-delimited data, and space-delimited data.

Tab-delimited data (tabs show up as → in text editors)

101→M→1→0→1→1→0→1→1→0→0→1
102→M→0→0→1→0→1→0→1→1→1→0
103→F→0→0→0→0→0→0→0→0→0→0
104→F→1→1→1→1→1→1→1→1→1→1
105→M→1→1→1→1→1→0→0→0→0→0

Comma-delimited data

101,M,1,0,1,1,0,1,1,0,0,1
102,M,0,0,1,0,1,0,1,1,1,0
103,F,0,0,0,0,0,0,0,0,0,0
104,F,1,1,1,1,1,1,1,1,1,1
105,M,1,1,1,1,1,0,0,0,0,0

Space-delimited data

101 M 1 0 1 1 0 1 1 0 0 1
102 M 0 0 1 0 1 0 1 1 1 0
103 F 0 0 0 0 0 0 0 0 0 0
104 F 1 1 1 1 1 1 1 1 1 1
105 M 1 1 1 1 1 0 0 0 0 0 

5.5.2 Free Format

Free format data have the variables delimited by a space or comma, but rather than one record per observation (one row per unit of measurement), one observation’s data immediately follows the next. This type can be visually challenging for a human to parse.

Free format data: 101,M,1,0,1,1,0,1,1,0,0,1,102,M,0,0,1,0,1,0,1,1,1,0,103,F,0,0,0,0,0,0,0,0,0,0,104,F,1,1,1,1,1,1,1,1,1,1,105,M,1,1,1,1,1,0,0,0,0,0
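
A free format file like this one could be read with the FREE keyword on DATA LIST. Treat this as a sketch only: the path and the names q1 through q10 are placeholders, and I have given every variable an explicit format so there is no doubt about which format goes with which variable.

```spss
* FREE tells SPSS that values simply follow one another.
* Values are separated by commas or spaces and are assigned to variables in order.
DATA LIST FILE = 'path\to\file.txt' FREE
 /id (F3.0) gender (A1) q1 TO q10 (F1.0).
EXECUTE.
```

Since there are 12 variables, SPSS starts a new case after every 12 values it reads.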

5.6 Read in External Files

For simplicity, we will use the GET DATA command to read in most external data files. There are other options for certain file types, but they will not be addressed here. See the table below for details about the command and some of the associated subcommands. Text files with decimals, SPSS data, and SAS data have their own commands.

| Command | Subcommand | Function |
|---|---|---|
| GET DATA | | Reads data from Excel files, databases, and text data files |
| | /TYPE = xlsx | Must be the first subcommand specified; options include XLS, XLSX, and TXT |
| | /FILE = 'file/path/here' | Required for xls, xlsx, and txt files; must immediately follow the /TYPE subcommand; specifies the file to read in |
| | /SHEET = name 'name_of_sheet' | Reads in the specified sheet of an Excel workbook |
| | /READNAMES = off | Indicates that the first row of the Excel sheet is data, not variable names. "On" is the default. |
| | /FIRSTCASE = 2 | When this is used, it specifies where the data starts. Using a '2' as in the example indicates that data starts on the second row. |
| | /ARRANGEMENT = | Specifies data format. Takes either DELIMITED or FIXED after the = |
| | /DELIMITERS = | Specifies the delimiter used in a txt file. Options include tab ("\t"), comma (","), and space (" "). The delimiter must be indicated within quotes. |
| | /VARIABLES | For delimited files, specifies variable names and formats. For fixed-format files, specifies variable names, start and end column locations, and variable formats. |
| | SKIP = | Specifies the number of lines to skip in a .txt file. This may be because there are instructions on the first (or more!) line, or for other reasons. Used with DATA LIST (Section 5.6.2), where it is written without a leading /. |
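
As a rough sketch of how several of these subcommands fit together, the example below reads a comma-delimited text file whose first row holds variable names. The path, the variable names, and the formats are placeholders of my own, so adjust them to your file.

```spss
* Comma-delimited text file with variable names in row 1.
* FIRSTCASE = 2 tells SPSS that the data start on the second row.
GET DATA
 /TYPE = TXT
 /FILE = 'path\to\file.csv'
 /ARRANGEMENT = DELIMITED
 /FIRSTCASE = 2
 /DELIMITERS = ","
 /VARIABLES = id F3.0 gender A1 q1 F1.0 q2 F1.0.
EXECUTE.
```

/TYPE comes first, /FILE follows it immediately, and the period appears only after the last subcommand.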

5.6.1 Finding the Path of a File

When reading in data, you will need the file path of your file. On a PC, this will often take the format of ‘C:/Documents/some_folder/some_file.extension’. The easiest way to get this is to copy the file path, then paste it into your SPSS syntax. Mac users and PC users get the file path slightly differently. For a PC, hold down ‘Shift’ on your keyboard then right click on the file. You will get a long menu; select ‘Copy as Path’. Then paste into the SPSS syntax window. For a Mac, go to the Finder window and find your file. Left click once on the file to select it, and hit ‘Command’ + ‘Option’ + ‘C’ on your keyboard. You are now able to paste the file path in SPSS.

Video: Copying file path

5.6.2 Fixed Width

With fixed width data, you first need to determine which columns each of your variables occupies. This is best accomplished in a text editor (e.g. Notepad++ or Notepad for PC users). This format of data will use the DATA LIST FILE = command:

DATA LIST FILE = 'path\to\file.txt'
 /var1 1-2 var2 3-6 var3 7-10.
EXECUTE.

Notice how after each variable name the columns occupied by that variable are indicated (e.g. var1 is the variable name, and 1-2 are the columns occupied by this variable).

Another useful subcommand for DATA LIST is SKIP =. This indicates how many lines to skip at the top of the txt file. There may be instructions, or just garbage at the top of the file that you don’t want or need in your data. NOTE: SKIP is not preceded by a /, but it is indented!

DATA LIST FILE = 'path\to\file.txt'
 SKIP = 1
 /var1 1-2 var2 3-6 var3 7-10.
EXECUTE.

5.6.3 Delimited Data

Comma-delimited:

GET DATA
 /TYPE = TXT
 /FILE = 'path\to\file.dat'
 /DELIMITERS = ","
 /VARIABLES = id F3.0 gender A1 score1 F4.1 score2 F4.1 score3 F4.1.
EXECUTE.

Tab-delimited:

GET DATA
 /TYPE = TXT
 /FILE = 'path\to\file.dat'
 /DELIMITERS = "\t"
 /VARIABLES = id F3.0 gender A1 score1 F4.1 score2 F4.1 score3 F4.1.
EXECUTE.

Space-delimited:

GET DATA
 /TYPE = TXT
 /FILE = 'path\to\file.dat'
 /DELIMITERS = " "
 /VARIABLES = id F3.0 gender A1 score1 F4.1 score2 F4.1 score3 F4.1.
EXECUTE.

5.6.4 Excel files

Excel file with .xls extension where the first row does not contain variable names:

GET DATA
 /TYPE = XLS
 /FILE = 'path\to\file.xls'
 /READNAMES = off.
EXECUTE.

Excel file with .xlsx extension, asking for Sheet2 specifically:

GET DATA
 /TYPE = XLSX
 /FILE = 'path\to\file.xlsx'
 /SHEET = name 'Sheet2'.
EXECUTE.

5.6.5 SAS files

SAS files are read in using the following: GET SAS DATA = 'path\to\file.sas7bdat'.

5.6.6 SPSS data

You can read in SPSS data files using the following: GET FILE = 'path\to\file.sav'.
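
Whichever command you use, it is worth checking that the data came in the way you expected. A minimal sketch, assuming the placeholder path below points to a real .sav file:

```spss
* Read in the file and then look at what arrived.
GET FILE = 'path\to\file.sav'.

* Show variable names, types, and formats.
DISPLAY DICTIONARY.

* Print the first five cases to the output window.
LIST /CASES = FROM 1 TO 5.
```

If a variable that should be numeric shows up with an A format, or a column is missing, it is much easier to catch here than after running an analysis.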

Video: Reading in external files

5.6.7 Reading in Decimals

If your data has decimals, and you are reading in from a .txt file, there are some extra steps to take to ensure your data are read in correctly. If you do not specify that there are decimals, SPSS will read in the whole number to the left of the decimal, and ignore the rest. It will also throw some error messages at you.

Let’s say we have the following data in the file results.txt, showing the ID (column 1), 1st draw results (columns 3-6), 2nd draw results (columns 8-11), the average of the two draws (columns 13-17), and high/low (column 20). We can see these column numbers in a text editor (e.g. Notepad++).

1 15.5 18.3 16.90  H
2 11.4 19.7 15.55  H
3 17.0 13.7 15.35  H
4 18.5 10.8 14.65  L
5 11.0 16.0 13.50  L

We can read this in using

DATA LIST FILE = "C:\Documents\results.txt"
 /id 1 draw1 3-6 draw2 8-11 avg 13-17 level 20 (A).
EXECUTE.

Notice that since ‘level’ is a character variable, we specified that with (A) when reading in the data. Additionally, we are only getting the numbers to the left of the decimal for our numeric data.

We can specify the numeric format using (FX.Y) after each variable, where F indicates it is a numeric variable, X is the number of columns occupied by the variable (including the decimal point), and Y is the number of columns AFTER the decimal. As an example, the first value in draw1, 15.5, takes up 4 columns (1|5|.|5). Looking at all values in that column, we see that they all take up 4 columns. We can also see that there is one column after the decimal. To properly specify this to SPSS, we would add (F4.1) after draw1 in a FORMATS command. We also have a character variable (level). This is specified using A# where A indicates a character variable and # indicates the number of columns occupied by that variable. Since it is a character variable, there is no need to specify decimals.

The FORMATS command to specify decimal format would look like so:

FORMATS id (F1.0) draw1 (F4.1) draw2 (F4.1) avg (F5.2) level (A1).

This process is more straightforward than using a GET DATA command.
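
Putting the two steps together for results.txt might look like the sketch below. It assumes the file sits at the path used earlier. I left level out of the FORMATS command here because it was already read in as a one-character string with (A).

```spss
* Read the fixed width file and then set the decimal formats.
DATA LIST FILE = 'C:\Documents\results.txt'
 /id 1 draw1 3-6 draw2 8-11 avg 13-17 level 20 (A).
FORMATS id (F1.0) draw1 (F4.1) draw2 (F4.1) avg (F5.2).
EXECUTE.

* Print the cases to check that the decimals show up.
LIST.
```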

Video: Specifying decimals

5.7 Saving Data Files

You are able to save your data in SPSS using syntax. To save as an SPSS data file, you will use the SAVE OUTFILE command. The (abbreviated) command structure is shown in the table below. There are a few options I am not including; the Command Syntax Reference lists all options. Remember to include the ‘.’ after your last subcommand. SAVE OUTFILE does not require an EXECUTE command at the end.

You are also able to save your data files as other data types: Excel files (.xlsx), tab delimited files (.tab), comma delimited (.csv), and as a SAS file. The command to save as other file types is SAVE TRANSLATE. NOTE: SAVE TRANSLATE must end with EXE. to run!

| Command | Subcommand | Function |
|---|---|---|
| SAVE OUTFILE = 'path\to\file' | | Saves the SPSS data as a SPSS data file to the indicated location |
| | /DROP = variable_name(s) | Removes the indicated variable(s) from the dataset prior to saving |
| | /KEEP = variable_name(s) | Only keeps the indicated variables in the dataset when saving |
| | /RENAME = (old_var_name1 = new_var_name1) (old_var_name2 = new_var_name2) | Renames the indicated variables prior to saving |

| Command | Subcommand | Function |
|---|---|---|
| SAVE TRANSLATE | | Saves data file in a non-SPSS format |
| | /OUTFILE = 'file\path\here' | Gives location and name of the file to be saved |
| | /TYPE = xlsx | Indicates you are saving an Excel file |
| | /TYPE = csv | Indicates you are saving a comma-delimited file |
| | /TYPE = tab | Indicates you are saving a tab-delimited file |
| | /TYPE = sas | Indicates you are saving a SAS file |
| | /VERSION = (number) | Tells what type of file you’re saving out; VERSION = 9 is for SAS, VERSION = 12 is for Excel 2007 and newer. |
| | /KEEP = variable_name(s) | Only keeps the indicated variables in the dataset when saving |
| | /DROP = variable_name(s) | Removes the indicated variable(s) from the dataset prior to saving |
| | /RENAME = (old_var_name1 = new_var_name1) (old_var_name2 = new_var_name2) | Renames the indicated variables prior to saving |
| | /FIELDNAMES | Puts variable names in first row of a comma- or tab-delimited file |
| | /REPLACE | Gives permission to overwrite an existing file of the same name |
| | /CELLS = labels | Exports value labels (e.g. Male/Female) rather than numeric values (e.g. 0/1) |

Below are some examples of saving data files in various formats with different options selected.

5.7.1 SPSS

*Save a SPSS data file.

SAVE OUTFILE = 'path\to\file.sav'
 /KEEP = id, social, motivation.
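
The /DROP and /RENAME subcommands from the table work the same way. In this sketch the output file name and the new variable names are placeholders, and it assumes the dataset has a name variable you do not want to keep:

```spss
* Save an SPSS data file with two variables renamed.
* The name variable is dropped from the saved file.
SAVE OUTFILE = 'path\to\file_renamed.sav'
 /RENAME = (social = social_score) (motivation = motivation_score)
 /DROP = name.
```

These subcommands only affect the file being saved; the dataset open in SPSS keeps its original variables and names.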

5.7.2 Excel

*Save an Excel data file.
*Export value labels, too.

SAVE TRANSLATE
 /OUTFILE = 'path\to\file.xlsx'
 /TYPE = XLS
 /VERSION = 12
 /CELLS = labels.
EXECUTE.

5.7.3 Comma-delimited

*Save a comma-delimited file.
*Also export variable names.
*Chose to do /replace because I want to overwrite my old file.

SAVE TRANSLATE
 /OUTFILE = 'path\to\file.csv'
 /TYPE = CSV
 /FIELDNAMES
 /REPLACE.
EXECUTE.

5.7.4 Tab-delimited

*Save a tab-delimited file.
*Exporting variable names and overwriting any old files.
*Dropping the name column.

SAVE TRANSLATE
 /OUTFILE = 'path\to\file.dat'
 /TYPE = TAB
 /DROP = name
 /FIELDNAMES
 /REPLACE.
EXECUTE.

5.7.5 SAS

SAS has two specific subcommands: VERSION and PLATFORM. VERSION indicates which version of SAS the file should be saved for; 9 indicates version 9 (NOTE: 8 does NOT indicate version 8; see the Command Syntax Reference for more). PLATFORM indicates whether the file is intended for Windows or Unix machines.

*Save a SAS file.
*First time saving this file, so no other file by this name exists.  
*No need for replace.

SAVE TRANSLATE
 /OUTFILE = 'path\to\file.sas7bdat'
 /TYPE = SAS
 /VERSION = 9
 /PLATFORM = Windows.
EXECUTE.

Video: Saving Files

5.8 Saving Syntax Files

Sadly, we cannot use syntax to save syntax files. You can, however, save your syntax file much like you would save any document: File -> Save or by clicking the Save icon, selecting the desired location, and typing the file name.