7  More Advanced SPSS

7.1 Handling Missing Data

In SPSS, there are two types of missing data: system missing and user-defined missing. System missing data are values that were read in or initially entered as missing. These may be blanks or “.” in numeric data. Importantly, there is no such thing as a system missing string value in SPSS, because a blank is treated as a valid value. User-defined missing values, on the other hand, are values set by the user as missing. In other words, if you have a string variable and want a blank to be treated as a missing value, you (the user) would define the blank as a missing value. You can also define missing values for numeric variables (maybe a 0 on a 1-7 Likert scale, for example).

To indicate user-defined missing values, use the MISSING VALUES command: MISSING VALUES q1 to q5 (0) Gender (" ").

This will assign values of 0 for questions 1 through 5 (q1 to q5) as user-defined missing, as well as assigning blanks for the string variable gender as missing.

If we run a frequency analysis on a variable that contains both system missing and user-defined missing values, we will see that both are listed at the bottom of the table, and both are counted in the Percent column. However, neither of them is counted in the Valid Percent column. This lets us know that SPSS is treating them as ‘part of the data’ (e.g., 3% of your data is system missing, 2% is user-defined missing, and the other 95% is distributed between the valid response options), but not treating them as valid data.
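To see this behaviour for yourself, here is a minimal sketch with a tiny made-up dataset. The variable names id and q1 and the five data rows are assumptions for illustration only; a lone period in the data is read as system missing.

```spss
* Hypothetical inline data where q1 is a 1-7 Likert item and 0 means no answer.
DATA LIST LIST / id q1.
BEGIN DATA
1 4
2 0
3 7
4 .
5 5
END DATA.

* Declare 0 as user-defined missing for q1.
MISSING VALUES q1 (0).

* Both kinds of missing values appear in the Missing part of the table.
FREQUENCIES VARIABLES = q1.
```

In the resulting table, the 0 and the system missing value should each count toward Percent but not toward Valid Percent.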

7.1.1 Identifying Missing Data

Determining how much of your data is missing can be accomplished on a variable-by-variable basis with a FREQUENCIES or DESCRIPTIVES command. Each of these will quantify how many cases in your dataset have missing data for each variable. You may want to know how much missing data each case has, though. As an example, maybe your lab allows up to 25% of a survey to be missing before they deem it invalid, so it’s important that you be able to determine how much is missing. Or you may care about only a certain part of the survey: maybe it doesn’t affect survey integrity if the demographic questions are left blank (i.e., considered missing data), but it does matter if other questions are missing. We can create a new variable, perhaps creatively named missing, to compute how many variables have missing data for each case.

Going back to our example above, if we wanted to know how many question responses were missing for q1 to q5, we would do the following:

COMPUTE missing = nmiss(q1 to q5).
EXECUTE.

The COMPUTE command is creating a new variable named missing (or whatever you chose to name it). Then, the function nmiss will count the number of missing values among whatever variables are inside the parentheses. Note: nmiss only works if all the entered variables are numeric.

We can still determine the number of missing values for string variables; we will just use the COUNT command instead:

COUNT missing2 = q6 to q10 (missing).
EXECUTE.

Here, the command COUNT is creating a new variable missing2. It holds, for each case, the number of variables from q6 to q10 whose value is missing; the keyword missing inside the parentheses tells COUNT what to look for. We could also use this to count any other value we wished - just replace missing with the value you want to count. Additionally, COUNT can be used with either numeric or string variables, but you can’t mix types within one command.

Regardless of how you calculated the number of missing values, a good idea after creating a new variable is to look at your data to see if what you thought would happen actually did happen. We can print out the first few rows of our data using the LIST command, and by using the /CASES = subcommand, we can restrict it to just the first few rows:

LIST q1 to q10 missing missing2
 /CASES = from 1 to 10.
EXECUTE.

Here, we are asking for a list of variables q1 to q10 as well as the two new variables we created, missing and missing2. We also don’t need a print out of the entire dataset to check our work, and have decided that just the first 10 rows will let us spot-check just fine.
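Returning to the 25% rule mentioned earlier, a count of missing values can be turned into a percentage and a flag. This is only a sketch under assumed names: the variables pct_missing and invalid, the five items, and the three data rows are all made up for illustration.

```spss
* Hypothetical data with five numeric items, where a period marks a blank answer.
DATA LIST LIST / id q1 q2 q3 q4 q5.
BEGIN DATA
1 4 5 3 2 1
2 . 3 . 4 5
3 2 . 4 4 3
END DATA.

* Percent of the five items that are missing for each case.
COMPUTE pct_missing = 100 * NMISS(q1 TO q5) / 5.
* Flag surveys above the 25 percent threshold (1 = invalid, 0 = keep).
COMPUTE invalid = (pct_missing > 25).
EXECUTE.

LIST VARIABLES = id pct_missing invalid.
```

Here case 2 is missing 2 of 5 items (40%) and would be flagged, while cases 1 and 3 (0% and 20%) would not.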

Once we have created the new missing variable(s), we can run FREQUENCIES VARIABLES to see how many cases have missing values. We may also be interested in the pattern of missing values. Are all the missing values on q2? Or are they spread across all the questions? To get this information we can use the MVA VARIABLES command:

MVA VARIABLES = q1 to q10
 /TPATTERN NOSORT PERCENT = 0.

This will create a table of missing-value patterns, showing which questions are missing data and how many cases share each pattern.

7.1.2 Dealing with Missing Data

Once you have determined how much missing data you have (and where that missing data is), the next question is then what do you do about it? One option is deleting cases with any missing data. This will result in a dataset only including cases with complete data, but it also may result in a much smaller dataset. This is not recommended unless there’s a good reason to do so. Another option is to retain cases with missing data, but exclude them on a case-by-case basis, depending on what the analysis is. For example, if case 23 has missing data for q4, they will not be included in analyses for that question, but would be included in other analyses.

Another way of stating the above is to perform listwise deletion of data or pairwise deletion of data. Listwise deletion is what was described first above: cases that have missing values for any variables are excluded. Pairwise deletion, on the other hand, is the second option described above: it uses all cases that have valid responses for the particular statistic being calculated. One potential drawback to pairwise deletion is that different statistics can be based on different subsets of cases.

NOTE: Pairwise and listwise deletion only refers to handling missing data when the analysis is using more than one variable. If you are only performing an analysis on a single variable, both listwise and pairwise deletion will result in the same outcome.

Let’s use the sample data below as an example:

Cases Variable X Variable Y
1 15 16
2 . 19
3 17 .
4 10 13
5 20 16

If we used listwise deletion and asked for the means for each variable, they would be calculated based on the following cases:

Variable X: 1, 4, 5
Variable Y: 1, 4, 5

On the other hand, if we used pairwise deletion, the following cases would be used for analysis:

Variable X: 1, 3, 4, 5
Variable Y: 1, 2, 4, 5
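The difference can be checked directly with the sample data. The sketch below uses DESCRIPTIVES, whose MISSING subcommand offers VARIABLE (each variable uses its own valid cases, which plays the role of pairwise deletion here) and LISTWISE. The lowercase variable names id, x and y are stand-ins for the columns in the table.

```spss
* The sample data from the table, where a period is a system missing value.
DATA LIST LIST / id x y.
BEGIN DATA
1 15 16
2 . 19
3 17 .
4 10 13
5 20 16
END DATA.

* Each variable uses all of its own valid cases.
DESCRIPTIVES VARIABLES = x y
  /STATISTICS = MEAN
  /MISSING = VARIABLE.

* Only cases that are complete on both x and y are used.
DESCRIPTIVES VARIABLES = x y
  /STATISTICS = MEAN
  /MISSING = LISTWISE.
```

Working the arithmetic by hand, the first request should give means of 15.5 for x and 16.0 for y (4 cases each), while the listwise request should give 15.0 for both (3 cases).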

Typically, the choice between listwise and pairwise deletion only comes up when doing bivariate or multivariate statistics (e.g., correlations, multiple regression, MANOVA, etc.). However, if you are using a single command to request univariate statistics for multiple variables, it is also something to keep in mind. If we are using a single MEANS TABLES command to get descriptive statistics on both Variable X and Variable Y, there would be a difference in what cases were used. All that said, neither pairwise nor listwise deletion is ideal in most situations, but they are still widely used. A better option would be multiple imputation or maximum likelihood estimation, both of which are beyond the scope of this course.

7.2 Data Transformations

You will often want to create new variables in order to perform analyses, or alter coding schemes. A good example of this is above, when we created the missing variables. You may also wish to recode variables (e.g., on a Likert scale, some items may need reverse coding). Moving beyond the simple, you may wish to perform these tasks on only some of your data rather than the whole dataset.

These types of action - creating variables, recoding variables, and conditional execution of tasks - are called data transformations. Perhaps of comfort, these actions do NOT modify your original data file (the one you read in), only your active dataset. This was a lesson that took me a bit to understand, having only worked with Excel prior and having been nervous to make a mistake and ruin all my data. However, it is important to remember that these transformations are ONLY in your active dataset and will disappear when you close SPSS if you do not save them. If you wish for these variables to persist, make sure to SAVE OUTFILE! To preserve your original dataset, it is good practice to save your transformed version under a different name.

7.2.1 COMPUTE

As we saw earlier, we can use the COMPUTE command to create new variables. When we are using this command, we also need to think about missing data - how do we want it handled? As an example, consider the following dataset:

q1 q2 q3
4.00 5.00 3.00
2.00 3.00 .
. . .
. 1.00 .

We have fabricated responses to 3 questions. Notice how only one row has complete data; other rows have at least one missing response. Keep in mind that while this is a very short dataset for illustration purposes, it would not be so easy to determine missing data with a larger dataset.

If we first use COMPUTE to calculate how many missing values each individual has, we would get the variable added to the data table:

COMPUTE missing = NMISS(q1, q2, q3).
EXECUTE.

q1 q2 q3 missing
4.00 5.00 3.00 .00
2.00 3.00 . 1.00
. . . 3.00
. 1.00 . 2.00

Now, let’s say we wanted to calculate a total score using the SUM function. Here is where we need to make some decisions. We can calculate this many ways; three examples are shown below:

  1. COMPUTE sum1 = SUM(q1, q2, q3).
    EXECUTE.

  2. COMPUTE sum2 = q1 + q2 + q3.
    EXECUTE.

  3. COMPUTE sum3 = SUM.2(q1, q2, q3).
    EXECUTE.

Each of these uses the COMPUTE command, but results in different output. Let’s see what the output looks like in our new table before discussing further:

q1 q2 q3 missing sum1 sum2 sum3
4.00 5.00 3.00 .00 12.00 12.00 12.00
2.00 3.00 . 1.00 5.00 . 5.00
. . . 3.00 . . .
. 1.00 . 2.00 1.00 . .

Looking at the table, we see that while the three sum columns were all created, there were varying levels of completeness. Each of the different ways of asking for a sum resulted in a different pattern of missing data.

For our first command, using the SUM() function, SPSS will compute the sum regardless of whether there is missing data on q1 - q3, using whatever valid values are present. Notice that the only case with a missing value for sum1 is case number 3 - and that’s because it has no data! All the values are missing. With the second command, manually typing out which columns to add and separating them with the ‘+’ operator, we are saying “Only give me a sum if each of the variables I have typed has a valid response (i.e., no missing data).” If there is any missing data, a case will instead get a system missing value. We can see this by comparing case 1 with cases 2-4; only case 1 has a valid response for all of q1 - q3, and it is the only one with a value in sum2. Lastly, SUM.2() is saying “Only give me a sum if there are at least two valid responses.” Since both case 1 and case 2 have at least 2 valid responses, they have sums. Cases 3 and 4 do not. When you might and might not use these will depend on your end goal, and what you have decided to do with any missing data. As an aside: you may substitute any number after ‘SUM.’. If we had more questions and wanted at least 5 to be answered, we could use SUM.5().
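The same “minimum number of valid responses” suffix can be attached to other statistical functions such as MEAN. As a small sketch using the same four rows (the variable name mean2 is just an illustrative choice):

```spss
* The same four cases as in the table above.
DATA LIST LIST / q1 q2 q3.
BEGIN DATA
4 5 3
2 3 .
. . .
. 1 .
END DATA.

* Mean of the valid answers, but only when at least 2 of the 3 are present.
COMPUTE mean2 = MEAN.2(q1, q2, q3).
EXECUTE.

LIST VARIABLES = q1 q2 q3 mean2.
```

Cases 1 and 2 should receive means (4.00 and 2.50), while cases 3 and 4 have fewer than two valid responses and should be system missing.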

7.2.1.1 Other Computations

Along with using ‘+’ to create a sum, SPSS also recognizes the following symbols:

Symbol Operation
* Multiplication
** Exponentiation
- Subtraction
/ Division

7.2.2 Functions

In SPSS, a command is a specific instruction to the software to perform an operation on the data. A function, on the other hand, is a smaller piece of code that is used to perform a specific calculation or manipulation. SUM(), as shown above, is a function - it is calculating the sum of the stated variables. There are other useful functions in SPSS, a portion of which are shown in the table below. Another good resource is from the UCLA Statistical Methods and Data Analytics website: https://stats.oarc.ucla.edu/spss/modules/using-spss-functions-for-making-and-recoding-variables/

Function Action
SUM() Calculates the sum of the enclosed variables
MEAN() Calculates the mean of the enclosed variables
NMISS() Calculates the number of missing values of the enclosed variables
SQRT() Calculates the square root of a number
EXP() Calculates e raised to the power of a number
LN() Calculates the natural log of a number
LOG10() Calculates the log base 10 of a number
TRUNC() Takes a decimal number and converts it to a whole number by removing all the decimal places
RND() Rounds the number(s) to the nearest whole number using conventional rounding rules. To round to a different increment (e.g., the nearest tenth), include that increment as a second argument.

7.2.2.1 TRUNC vs. RND

As a further illustration of how TRUNC and RND behave differently, consider the following data:

ID X Y
1 7.98 6.25
2 3.68 1.22
3 7.65 6.15
4 3.77 6.41
5 9.01 8.66

If we wanted to use TRUNC on X, we could accomplish that with a COMPUTE command:

COMPUTE X_trunc = TRUNC(X).
EXECUTE.

We would then have the following table:

ID X Y X_trunc
1 7.98 6.25 7
2 3.68 1.22 3
3 7.65 6.15 7
4 3.77 6.41 3
5 9.01 8.66 9

Notice how all TRUNC did was remove the decimal places? There was no rounding involved. If instead we wanted to round to the nearest whole number, we would use RND:

COMPUTE X_rnd = RND(X).
EXECUTE.

And end up with this:

ID X Y X_trunc X_rnd
1 7.98 6.25 7 8
2 3.68 1.22 3 4
3 7.65 6.15 7 8
4 3.77 6.41 3 4
5 9.01 8.66 9 9

Notice how when we use RND(), conventional rounding rules are followed. We can also round to certain decimal places by adding an argument. Let’s say we wanted to round Y to the nearest tenth. To accomplish this, we would edit our function like so:

COMPUTE Y_rnd = RND(Y, .1).
EXECUTE.

And end up with the following table:

ID X Y X_trunc X_rnd Y_rnd
1 7.98 6.25 7 8 6.30
2 3.68 1.22 3 4 1.20
3 7.65 6.15 7 8 6.20
4 3.77 6.41 3 4 6.40
5 9.01 8.66 9 9 8.70

Notice how the Y values were indeed rounded to the nearest tenth, but they retained a ‘0’ in the hundredths place (i.e., 6.25 was correctly rounded to 6.3, but it is reported as 6.30). This is only a matter of how the variable is displayed, not of the value that is stored.
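If the trailing zero is bothersome, the display format of the new variable can be changed with the FORMATS command. The sketch below assumes lowercase names id, y and y_rnd and uses just two of the rows from the table.

```spss
* Two rows from the example table.
DATA LIST LIST / id y.
BEGIN DATA
1 6.25
2 1.22
END DATA.

COMPUTE y_rnd = RND(y, .1).
EXECUTE.

* Show one decimal place, which changes only the display and not the stored value.
FORMATS y_rnd (F8.1).
LIST VARIABLES = id y y_rnd.
```

With the F8.1 format, the rounded values should be listed as 6.3 and 1.2 rather than 6.30 and 1.20.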

7.2.3 String Variables

There are also functions for string variables. It wouldn’t make much sense to take the mean of a string variable, but you may want to switch it from upper to lower case (or make sure all entries are the same case). Or, you may want to only keep the first few letters of a word. Using an education example, if you had a variable named course and had entries such as BIOL123, CHEM230, ENGR100, etc. and you wished to separate out the department (i.e. BIOL, CHEM, ENGR) from the course number we could accomplish that. Similar to what we did in Excel, we may also wish to separate first from last names. The table below contains some useful string functions.

NOTE: Prior to creating a new string variable, you will need to use a STRING command to tell SPSS the name of said variable you intend to create as well as how long it is (see below for examples).

Function Action
UPCASE() Turns all letters uppercase
LOWER() Turns all letters lowercase
CHAR.SUBSTR() Takes 3 arguments: (variable to act on, character to start at, characters to keep).  This will start at the indicated character of the entry and keep the stated number of characters.
CHAR.LENGTH() Returns the length of the string variable, in characters.
RTRIM() Deletes excess blanks to the right of the variable entry.
CHAR.INDEX() Calculates the location of a particular character in a variable
CONCAT() Merges variables together
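As a sketch of the course example described above, the following splits a course code into department and number. The variable names course, dept and course_num and the three entries are assumptions for illustration, and the code assumes every entry has a 4-letter department followed by a 3-digit number.

```spss
* Hypothetical course codes entered in inconsistent case.
DATA LIST LIST / course (A7).
BEGIN DATA
biol123
CHEM230
Engr100
END DATA.

* Make every entry uppercase so the codes are consistent.
COMPUTE course = UPCASE(course).

* Declare the new string variable first, then keep characters 1 through 4.
STRING dept (A4).
COMPUTE dept = CHAR.SUBSTR(course, 1, 4).

* Read the last three characters as a number.
COMPUTE course_num = NUMBER(CHAR.SUBSTR(course, 5, 3), F3.0).
EXECUTE.

LIST VARIABLES = course dept course_num.
```

Note that dept needs a STRING command because it is a new string variable, while course_num is numeric and does not.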

7.2.3.1 Separating and Concatenating String Variables

Let’s start with the following table:

Name
Frodo Baggins
Peregrin Took
Legolas Thranduilion
Samwise Gamgee
Fredegar Bolger

We can see it has but one column, Name. If we wanted to have a First Name and a Last Name column, we will need to take a few steps to get there. First, we will need to determine where we want to split the two parts of the string. In this instance, the space (" ") between the first and last name makes most sense. Other times you may want to split at a comma (","), especially if names are in the format of LastName, FirstName. Since the names are not of uniform length, we will need to create a variable containing the split point location to use later.

COMPUTE space = CHAR.INDEX(Name, " ").
EXECUTE.

Now that we know where that space is, we can use it to get the first_name and last_name variables. Starting with first_name, we will first need to indicate we are making a string variable, and tell SPSS how long we anticipate it to be. The CHAR.SUBSTR function we will use can be seen below. The first argument (Name) is indicating which variable we should be starting with. The second argument (1) is saying what character position within that variable we should start at. Finally, the third argument (space-1) is saying how many characters to keep. Because we start at position 1, keeping space-1 characters stops just before the space. Here, notice how we are using the calculated variable space in order to allow flexibility in what we are capturing; this allows the first names to be of different lengths.

STRING first_name (A8).
COMPUTE first_name = CHAR.SUBSTR(Name, 1, space-1).
EXECUTE.

We now have the following table:

Name space first_name
Frodo Baggins 6 Frodo
Peregrin Took 9 Peregrin
Legolas Thranduilion 8 Legolas
Samwise Gamgee 8 Samwise
Fredegar Bolger 9 Fredegar

Now that we have the first name sorted, let’s get the last name. We will use a similar command as with the first name, but this time CHAR.SUBSTR will not have a third argument (see below). SPSS will interpret that as “go until there are no more characters to grab”. Notice how we are also using “space+1” as our second argument. This is saying go to the location calculated in the space column, add one, and then start.

STRING last_name (A12).
COMPUTE last_name = CHAR.SUBSTR(Name, space+1).
EXECUTE.

We now have a table that contains a first name variable as well as a last name variable:

Name space first_name last_name
Frodo Baggins 6 Frodo Baggins
Peregrin Took 9 Peregrin Took
Legolas Thranduilion 8 Legolas Thranduilion
Samwise Gamgee 8 Samwise Gamgee
Fredegar Bolger 9 Fredegar Bolger

If we had started with only first and last names and wanted instead to combine them into one variable full_name, we can use the CONCAT() function. With this, we specify what we want joined, in what order, and if there’s anything else we’d like added to it. Using the names as an example, we can combine first and last name into one variable, but keep first and last name separated by a space. Don’t forget: we still need to specify that we are creating a new string variable, and how long it will be. Notice how we asked for the space between the names within the CONCAT() function. Looking at what we provided, we are saying “Take the contents of first_name, then put a space (" "), then put the contents of last_name.” We can put together any number of things. If we wanted the names separated by “and” we could do that as well, but we would want to recalculate how long our new string variable would be.

STRING full_name (A20).
COMPUTE full_name = CONCAT(first_name, " ", last_name).
EXECUTE.

Our table will now look like this:

Name space first_name last_name full_name
Frodo Baggins 6 Frodo Baggins Frodo Baggins
Peregrin Took 9 Peregrin Took Peregrin Took
Legolas Thranduilion 8 Legolas Thranduilion Legolas Thranduilion
Samwise Gamgee 8 Samwise Gamgee Samwise Gamgee
Fredegar Bolger 9 Fredegar Bolger Fredegar Bolger
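One caveat worth hedging: string values are stored padded with blanks out to their declared width, and depending on your SPSS version and settings (as far as I know, code page mode rather than Unicode mode), those trailing blanks may be carried into the result, leaving a gap such as several spaces between Frodo and Baggins. A defensive sketch is to wrap each piece in RTRIM(). The two data rows below are taken from the table above.

```spss
* Two of the names from the table, already split into first and last.
DATA LIST LIST / first_name (A8) last_name (A12).
BEGIN DATA
Frodo Baggins
Legolas Thranduilion
END DATA.

STRING full_name (A20).
* RTRIM drops trailing blanks from each piece before they are joined.
COMPUTE full_name = CONCAT(RTRIM(first_name), " ", RTRIM(last_name)).
EXECUTE.

LIST VARIABLES = full_name.
```

If your installation already trims trailing blanks, RTRIM() is harmless, so it does no damage to include it.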


7.2.4 ALTER TYPE

Sometimes you will want to turn a numeric variable into a string, or a string into a numeric variable. Perhaps you had a string variable address that contained house number and street name (e.g., 123 Main St.) that you split up into house_num and street. However, house_num will still be treated as a string variable. If you wanted it to be a numeric variable, you would use ALTER TYPE:

ALTER TYPE house_num (F5.0).

Note: ALTER TYPE does not require an EXECUTE. after it. We do have to say how long the numeric values are with (F5.0). If we were going the other way (numeric to string), we would need to use (A#).
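A small round-trip sketch of both directions, using two made-up house numbers (the data values are assumptions for illustration):

```spss
* Hypothetical house numbers read in as a 5-character string.
DATA LIST LIST / house_num (A5).
BEGIN DATA
123
4567
END DATA.

* String to numeric, up to 5 digits with no decimals.
ALTER TYPE house_num (F5.0).
DESCRIPTIVES VARIABLES = house_num.

* Numeric back to a 5-character string.
ALTER TYPE house_num (A5).
```

The DESCRIPTIVES line is only there as a check: it would not produce statistics for a string variable, so getting output confirms the conversion to numeric took effect.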

7.2.5 RECODE and RECODE INTO

Both RECODE and RECODE INTO will recode your data. RECODE will overwrite the original variable - not a great idea!! RECODE INTO is a much safer option, as it creates a new variable from the original, using the coding specified. One reason you may want to use RECODE INTO is if you have a Likert scale questionnaire, and some items need to be reverse-scored. For example, let’s say you are trying to measure depression, and for most items, higher scores (i.e. ‘Strongly Agree’) indicate higher levels of depression. These items may be something like “I find it hard to shower regularly.” or “Some days I stay in bed all day.”. However, some items may be phrased like “I find it easy to maintain basic hygiene.” A higher score on this item would not indicate higher levels of depression, but rather lower levels. You would want to reverse-score this item to reflect this.

Sometimes when you get data from collaborators or outside sources, you’re not sure if a variable has been reverse-scored yet. One way to check this is to correlate the variable that is oppositely worded with the other variables. Positive correlations would indicate the variable is already reverse scored while negative correlations would indicate it still needs to be reverse-scored.

There are two different approaches to reverse scoring. One is to list each (old value = new value) pair separately:

RECODE dep3 (1=5) (2=4) (3=3) (4=2) (5=1) INTO Rdep3.
EXECUTE.

Here, I am asking SPSS to take the values in dep3 and recode them as I indicated into a new variable Rdep3. I used an “R” to indicate reverse scored - you may use whatever makes sense to you.

We can also use a COMPUTE statement to reverse score the variable:

COMPUTE Rdep5 = 6 - dep5.
EXECUTE.

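Why will this work? We took the maximum value from our Likert scale (5 in this case) and added 1 to get 6. Subtracting each response from 6 then flips the scale: 1 becomes 5, 2 becomes 4, 3 stays 3, 4 becomes 2, and 5 becomes 1.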
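The same idea carries over to scales of other lengths: the new value is (lowest value + highest value) minus the old value. As a sketch for a hypothetical 1-7 item (the names anx2 and Ranx2 and the three data rows are made up for illustration):

```spss
* Hypothetical item on a 1-7 scale, so new value = (1 + 7) - old value.
DATA LIST LIST / anx2.
BEGIN DATA
1
4
7
END DATA.

COMPUTE Ranx2 = 8 - anx2.
EXECUTE.

* Each old value should pair with exactly one new value.
CROSSTABS TABLES = anx2 BY Ranx2.
```

Here 1 should map to 7, 4 should stay 4, and 7 should map to 1.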

Regardless of the method, you should ALWAYS verify that your recoding worked. An easy way to do this is with a CROSSTABS TABLES command:

CROSSTABS TABLES = dep3 by Rdep3.

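If everything went as planned, you should only see frequencies in the cells that pair each old value with its new value. For a reverse-scored item, these run along the diagonal from the top right to the bottom left of your crosstabulation table (1 with 5, 2 with 4, and so on). Another check is to look at the correlations with the other items: they should no longer be negative.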

7.2.5.1 Using RECODE INTO to change variable type

RECODE INTO is also useful if you want to change a character variable into a numeric, or vice-versa. Let’s say you have a variable diet that has the following values: “Vegan”, “Vegetarian”, “Allergy”, “Gluten-free”, “No restrictions”. We can use RECODE INTO to represent these with numbers rather than strings:

RECODE diet ('Vegan' = 1)('Vegetarian' = 2)('Allergy' = 3)('Gluten-free' = 4)('No restrictions' = 5) INTO ndiet.
EXECUTE.

And of course, we would use CROSSTABS to make sure our work was correct:

CROSSTABS TABLES = diet by ndiet.

As a reminder, if we were going the opposite direction, we would need to make sure we used a STRING statement first to indicate we were creating a string variable.

7.3 IF Statements

Another way to recode variables involves the use of an IF statement. This is, of course, not its only use, but it is a good illustration. If we had data about activities seniors engaged in, and we wanted to assign an activity level based on this data, we could use IF statements to accomplish this. Let’s look at this simple dataset:

ID Activity
1 Golf
2 Bingo
3 Cards
4 Bowling
5 Golf

We could use the following IF statements to sort their activity:

IF (Activity = "Golf") or (Activity = "Bowling") level = 1.
IF (Activity = "Cards") or (Activity = "Bingo") level = 2.
EXECUTE.

And of course we would check our work:

CROSSTABS TABLES = Activity by level.

After running the IF statements, the dataset itself would look like this:

ID Activity level
1 Golf 1
2 Bingo 2
3 Cards 2
4 Bowling 1
5 Golf 1

The way the IF statements are interpreted is: IF (this condition is satisfied) then do this. For our IF statements above, we are saying that if the condition of (Activity = “Golf”) or (Activity = “Bowling”) is satisfied (either one, since they are separated by “or”), then make the value in level = 1.

Another handy term to use with IF statements is ANY. We could modify our IF statements above to the following:

IF ANY(Activity, "Golf", "Bowling") level = 1.
IF ANY(Activity, "Cards", "Bingo") level = 2.
EXECUTE.

ANY is saying “Look at this variable (Activity in our case). Then look at the list of options that follow. If any of them apply, then do the next thing.” You can also use AND within an IF statement to indicate all statements must be satisfied to perform the end task.
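As a sketch of AND, suppose we also had each person’s age (the variable age, the data rows, and the level value of 3 are all assumptions for illustration):

```spss
* Hypothetical data with an added age variable.
DATA LIST LIST / id (F2.0) Activity (A8) age (F3.0).
BEGIN DATA
1 Golf 68
2 Bingo 81
3 Golf 85
END DATA.

* Both conditions must hold, so only golfers aged 80 or older get level 3.
IF (Activity = "Golf" AND age GE 80) level = 3.
EXECUTE.

LIST VARIABLES = id Activity age level.
```

Only case 3 satisfies both conditions. Cases 1 and 2 are left as system missing on level, because no other IF statement assigns them a value.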

Some mathematical operators that may be useful in combination with IF statements are in the table below.

Operation Symbol
Not Equal NE or ~=
Greater than or equal to GE or >=
Less than or equal to LE or <=
Greater than GT or >
Less than LT or <

IF is also useful when working with system missing values. Going from a value to system missing can be done with RECODE; going the other way (system missing into a value) can be done with IF. Let’s use the variable dep1 as an example, recoding a current value of 0 to system missing:

RECODE dep1 (0 = sysmis).
EXECUTE.

If we wanted to go the other way, we would do the following:

IF (sysmis(dep1)) dep1 = 0.
EXECUTE.

Keep in mind that sysmis cannot be used with string variables.
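For numeric variables, RECODE can also go in this direction by using the SYSMIS keyword on the left-hand side. The sketch below writes the result INTO a new variable so the original is preserved; the name dep1_filled and the three data rows are assumptions for illustration.

```spss
* Hypothetical data where the second case is system missing.
DATA LIST LIST / dep1.
BEGIN DATA
3
.
5
END DATA.

* Replace system missing with 0 and copy every other value unchanged.
RECODE dep1 (SYSMIS = 0) (ELSE = COPY) INTO dep1_filled.
EXECUTE.

LIST VARIABLES = dep1 dep1_filled.
```

Without (ELSE = COPY), the values 3 and 5 would not be carried over to the new variable.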

7.3.1 DO IF

The basic IF command lets you perform a single transformation for the conditions specified in parentheses. DO IF allows multiple transformations for conditions specified within parentheses. Take the following for example (map is a subscale score):

DO IF (sex = "M").
 COMPUTE Rdep7 = 6 - dep7.
 COMPUTE map = SUM(dep5, dep6, Rdep7).
END IF.
EXECUTE.

This is saying “First, check if sex = Male. If so, do the following. If not, then do nothing.” Notice how a DO IF statement must end in END IF.

We would want to check our work on the above as well by running a MEANS TABLES command:

MEANS TABLES = map by sex.

Seeing as only males should have that subscale score, we should see that properly reflected in the means table.

7.3.2 ELSE IF

An extension of DO IF is ELSE IF. Think of this as extending the DO IF statement to include another condition.

DO IF (sex = "M").
 COMPUTE Rdep7 = 6 - dep7.
 COMPUTE map = SUM(dep5, dep6, Rdep7).
ELSE IF (sex = "F").
 COMPUTE mav = SUM(dep1, dep2, dep3).
END IF.
EXECUTE.

Here, we are keeping the same conditions for the males. Then we are adding ELSE IF to give another condition (sex = “F”), and then another set of commands to execute if that condition is met. SPSS will first evaluate if a case meets the DO IF condition. If not, it will move to the ELSE IF condition. If none are met, then it will do nothing. Importantly, ELSE IF requires a condition to be supplied afterwards, as we did above. There is another option to use if we did not want to do that: ELSE.

7.3.3 ELSE

If ELSE is used by itself after a DO IF statement, then any case that does not satisfy the DO IF or ELSE IF (if present) conditions will be processed by the lines following the ELSE statement.

DO IF (sex = "M").
 COMPUTE Rdep7 = 6 - dep7.
 COMPUTE map = SUM(dep5, dep6, Rdep7).
ELSE IF (sex = "F").
 COMPUTE mav = SUM(dep1, dep2, dep3).
ELSE.
 COMPUTE maq = SUM(dep8, dep9, dep10).
END IF.
EXECUTE.

If we had more than binary gender reported, we could use the code above: do one thing for males, another for females, and something else for all other reported genders. The same approach works for any categorical variable: we could do different calculations for each level, or certain calculations for certain levels and different calculations for the rest.

7.4 Conditional Execution

Sometimes you may wish to perform an analysis on only a subgroup of cases within your dataset. As an example, you might want to look at descriptive statistics only for those who were noted as having used illicit drugs within the past 30 days. The following commands allow this conditional execution of analyses.

7.4.1 SELECT IF

The SELECT IF command is a potentially dangerous command - it permanently removes the cases that do not meet the condition from your active dataset, so it stays in effect for all procedures that follow. That said, that exact action may be what you want. An example of this may be if you have a ‘consent granted’ column: you only want to examine the data for those subjects that gave consent for their data to be used. You might accomplish this by the following:

SELECT IF (consent = 1).
DESCRIPTIVES VARIABLES = age anx_tot dep_tot.

Running the syntax above would only keep those who granted consent, then get the requested descriptives.
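If you are nervous about losing cases, one cautious pattern is to run SELECT IF on a copy of the active dataset. This is only a sketch: the dataset names original and consented and the three data rows are made up for illustration.

```spss
* Hypothetical data where consent is coded 1 = granted and 0 = not granted.
DATA LIST LIST / id consent age.
BEGIN DATA
1 1 34
2 0 51
3 1 27
END DATA.
EXECUTE.
DATASET NAME original.

* Work on a copy so the full set of cases stays available.
DATASET COPY consented.
DATASET ACTIVATE consented.
SELECT IF (consent = 1).
DESCRIPTIVES VARIABLES = age.

* Switch back to the untouched data when finished.
DATASET ACTIVATE original.
```

The descriptives are based only on the two consenting cases, while the dataset named original still holds all three.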

7.4.2 TEMPORARY

There are also times when you may want to select a certain subset of your data, but not permanently. Say you just want to run a correlation between anxiety and playing games, but only for males, and later you know you’ll want to run other tests on the entire sample. The TEMPORARY command can be used to indicate that you want to select males for just the procedure immediately following it, not for all the procedures in the remainder of the syntax.

TEMPORARY.
SELECT IF (gender = "M").
CORR VAR = anx_tot games.

DESCRIPTIVES VARIABLES = dep_tot.

In the syntax above, the correlation would only be done on males, because SELECT IF and CORR VAR immediately follow TEMPORARY. The temporary selection ends once that first procedure (CORR VAR) has run, so DESCRIPTIVES VARIABLES will be done on the entire sample.

7.4.3 FILTER

An alternative to TEMPORARY and SELECT IF is FILTER. This conditional command can be turned on and off. We will first need to create a new variable on which to filter.

COMPUTE male_filter = (gender = "M").
EXECUTE.

This will create a new variable, male_filter, with values of 1 for males and zero otherwise. Then we can apply the filter.

FILTER BY male_filter.
EXECUTE.
CORR VAR = anx_tot games.
DESCRIPTIVES VARIABLES = dep_tot.

We could keep going for as long as we only wanted to perform procedures on males. When we were done, we would turn the filter off:

FILTER OFF.

After we turn off the filter, any subsequent procedures or commands will be performed on the entire dataset.

7.4.4 SPLIT FILE

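SPLIT FILE works in much the same way as SELECT IF, in that it stays in effect for every procedure that follows unless it is preceded by a TEMPORARY command (or turned off with SPLIT FILE OFF)! However, there is a difference: rather than dropping cases, it repeats each procedure separately for each group. Look at the following syntax: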

SORT cases BY gender.
TEMPORARY.
SPLIT FILE BY gender.
CORR VAR = anx_tot games.

This would provide the correlation between anxiety and games for males and females separately. Notice how I also used a TEMPORARY command so as not to make the split permanent.
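If you would rather keep the split on for several procedures, an alternative sketch is to leave out TEMPORARY and switch the split off explicitly with SPLIT FILE OFF when you are done. The six data rows below are fabricated for illustration.

```spss
* Hypothetical data with three males and three females.
DATA LIST LIST / gender (A1) anx_tot games.
BEGIN DATA
M 10 3
M 14 5
M 12 6
F 9 2
F 15 7
F 11 4
END DATA.

* Cases must be sorted by the split variable first.
SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.

* Both procedures are run separately for each gender.
CORRELATIONS /VARIABLES = anx_tot games.
DESCRIPTIVES VARIABLES = anx_tot games.

* Turn splitting off so later procedures use the whole file again.
SPLIT FILE OFF.
DESCRIPTIVES VARIABLES = anx_tot games.
```

The LAYERED keyword asks for the groups to be shown together in one table; SEPARATE would instead produce a separate table for each group.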