Introduction

SPSS, the Statistical Package for the Social Sciences, is a software package that permits data manipulation and analysis by a variety of statistical techniques.  SPSS is available for both mainframes and microcomputers.  NAU currently has it installed on Unix (Dana & Jan). On Unix, SPSS will run in batch, batch interactive and X Windows modes. In order to run SPSS under X, you must have an X server (X Windows display software) running on your local machine.

This document is intended to be a brief introduction to using SPSS in batch mode on the different systems on campus.  You will undoubtedly need to use the SPSS Reference Manual, which is available in the NAU bookstore, to supplement this information.
 
 

USING SPSS

In order to use SPSS, you must build a file which contains the instructions that tell it what you want to do. These instructions are of two types: those that describe your data and any transformations or manipulations you might want to do, and those that describe the statistical analyses or procedures that you want to perform. Please refer to the Editing Raw Data (ASCII) Files document for information on building data files. Building SPSS command files will be discussed next.

All SPSS commands must begin in column 1 of the line and must not go beyond column 72.  A command can be continued on subsequent lines by indenting one or more spaces. On the Windows version of SPSS every command must end with a period. On Unix the period is optional.
 
 

DATA DESCRIPTION

The only data description command that is required is DATA LIST.  This command is used to inform SPSS what variables you are using and where the data can be found.
 
 

DATA LIST

The DATA LIST command is used to describe how the data is arranged in the file.  The format of the DATA LIST command is:
 

where "fileid" is the name of the file in single quotes, "n" is the number of lines of data that there are for each case (person, observation etc.) in the file, "var1", "var2", etc. are names (up to eight characters each) that you make up for the variables in your file, "colrange" is the columns that the data can be found in, and "format" is an optional description of the format of the data.  The two most common format specifications are (A) for character data, and (n) for data which has decimal places, where "n" is the number of decimal places in the number.
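
The general form described above can be sketched as follows.  This is a sketch that assumes the usual fixed-format keywords FILE and RECORDS; check the SPSS Reference Manual for the exact form your release accepts:

DATA LIST FILE='fileid' RECORDS=n
  / var1 colrange (format) var2 colrange (format) ...

The slash marks the start of the variable definitions for a line of data, and the second line is indented because it continues the command.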

As an example, let's assume that you have some data in a file called STUDY, in the following format:

Columns Item Special Format

An example of what the data file might look like is:

In this example, the person who has ID 001 has a salary of $40.00, and is from Arizona.  ID 002 makes $35.50 and is from Pennsylvania.  ID 003's salary is $57.50, and this person is from Wisconsin.

The DATA LIST command to describe this data file would be:
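
As a sketch, assume a hypothetical layout in which ID occupies columns 1-3, SALARY occupies columns 5-8 with two implied decimal places, and STATE occupies columns 10-11 as a two-letter character code.  These column positions are illustrative assumptions, so substitute the positions used in your own file:

DATA LIST FILE='STUDY' RECORDS=1
  / ID 1-3 SALARY 5-8 (2) STATE 10-11 (A).

Under this assumed layout, the line for the first person would read 001 4000 AZ, and SPSS would interpret 4000 as 40.00.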
 

If there were two lines of data describing each person in the data file, the DATA LIST command would resemble:
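
A sketch of the two-line case, assuming a hypothetical first line with ID, SALARY and STATE and a second line holding two illustrative variables, AGE in columns 1-2 and SEX in column 4 (none of these column positions come from the STUDY file described above):

DATA LIST FILE='STUDY' RECORDS=2
  / ID 1-3 SALARY 5-8 (2) STATE 10-11 (A)
  / AGE 1-2 SEX 4.

Each slash begins the description of the next line of data for a case.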

There is a shorthand way to specify your variables if they all have the same number of columns.  The following DATA LIST statement:
 

causes SPSS to assume you have five variables, OPIN1, OPIN2, OPIN3, OPIN4 and OPIN5, each of which is two columns long.
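
A statement of this kind might look like the following, assuming the five variables occupy columns 1-10 of the STUDY file (the starting column is an assumption):

DATA LIST FILE='STUDY'
  / OPIN1 TO OPIN5 1-10.

SPSS divides the ten columns evenly among the five variables, giving each two columns.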
 
 

OPTIONAL DATA DEFINITION COMMANDS

There are three other commands that are often used to describe data.  These are MISSING VALUES, VARIABLE LABELS and VALUE LABELS.
 
 

MISSING VALUES

There are times when some of the data for a given person or case is not available.  In these cases, you will normally record a special value in the appropriate columns to indicate that the data is missing.  The MISSING VALUES command gives you a way to tell SPSS what those special values are, so that missing data will not be included in your calculations.  The format of the MISSING VALUES command is:

where "varlist" is the list of the variables which have missing values and "value list" is the description of what those missing values are.  For example, suppose you have a variable called SALARY, for which a value of -1 means that the data is missing, and two other variables called EDUC and SEX, for which 0 indicates missing data.  The command to describe this situation would be:
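
The general form, and a command for the situation just described, can be sketched as follows.  EDUC is used here for the education variable because variable names are limited to eight characters; that name is an assumption:

MISSING VALUES varlist (value list)

MISSING VALUES SALARY (-1) / EDUC SEX (0).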

You can have up to three values that indicate missing values.  Suppose that for variable LEVEL a value of 5 means that the data was missing and 9 means that the response given was invalid.  You would want to exclude both of these values from your analysis, so the command would be:
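
A minimal sketch of such a command:

MISSING VALUES LEVEL (5, 9).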

The simplest form of the MISSING VALUES command is used when one value represents missing data for all variables.  If 0 means, for all variables, that the data is missing, the command would be:
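
A sketch using the keyword ALL, which assumes every variable in the file is numeric:

MISSING VALUES ALL (0).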

The value you select to indicate that the data is missing must always be a value that would not normally occur in the data.  If you are recording temperatures, for example, 0 would not be a good missing value indicator, because 0 can be a valid temperature.  In this case, you would use a value such as -99, which is not likely to occur.
 
 

VARIABLE LABELS

Because you are limited to eight characters when you are making up variable names to describe your data, these names are sometimes not very descriptive.  The VARIABLE LABELS command allows you to enter descriptions of what the variable names mean.  SPSS will print these descriptions on the various reports it creates.  The format of the VARIABLE LABELS command is:

For example, suppose you had two variables, SALARY and STATE, which represent the respondent's weekly salary and state of birth.  You could clarify these names by using VARIABLE LABELS of up to 40 characters, as follows:
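
The general form and a command for this example might look like the following.  The label wording is illustrative, and the slash is the conventional separator between variables:

VARIABLE LABELS var1 'label1' / var2 'label2'

VARIABLE LABELS SALARY 'Weekly salary of respondent'
  / STATE 'State of birth'.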

Note that these two labels would not have to be on separate lines; this was only done for clarity.
 

VALUE LABELS

In some procedures, such as frequencies, each value that a variable can take on is printed out.  Often, these values are intrinsically meaningless.  For example, for a variable such as SEX, the values will normally be recorded as codes such as 1 or 2 rather than as MALE or FEMALE.  The VALUE LABELS command allows you to specify what the values for a variable mean.  The format of this command is:

where "varlist" is the names of the variables to which the following "values"  and "labels" apply.  For example, assume that you have five variables, OPIN1, OPIN2, OPIN3, OPIN4 and OPIN5, for which the values are '1' -- Not important at all, '2' -- Somewhat important and '3' -- Very important.  The VALUE LABELS command to describe this would be:
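
The general form and a command for this example can be sketched as follows, assuming OPIN1 through OPIN5 were defined consecutively on DATA LIST so that the TO keyword can be used:

VALUE LABELS varlist value 'label' value 'label' ...

VALUE LABELS OPIN1 TO OPIN5 1 'Not important at all'
  2 'Somewhat important' 3 'Very important'.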

The MISSING VALUES, VARIABLE LABELS and VALUE LABELS commands, then, although not required, allow you to describe your data more fully.
 
 

DATA MANIPULATION COMMANDS

There are times when you will need to change your data around or only consider a portion of it in order to do your analysis.  There are several data manipulation commands which are designed to allow you to do just that.
 
 

RECODE

The RECODE command allows you to change the value of variables in your file. This change is only in effect for the duration of your run; no permanent changes are made to your data.  The format of the RECODE command is:

For example, suppose that you have recorded your subject's age in years.  You then decide that it would be more valuable to look at age by decade.  The command to do this transformation would be:
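
The general form and one possible command are sketched below.  The decade boundaries and the codes assigned to them are assumptions chosen for illustration, and ages are assumed to be whole numbers:

RECODE varlist (value list = new value) (value list = new value) ...

RECODE AGE (0 THRU 9=0) (10 THRU 19=1) (20 THRU 29=2)
  (30 THRU 39=3) (40 THRU 49=4) (50 THRU HI=5).

The keyword HI stands for the highest value found in the data, so everyone aged 50 or over falls in the last group.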

When the RECODE has been done, all the values for variable AGE will have been recoded as 1, 2, 3, 4, etc.  Again, this transformation only remains in effect until you are done with your processing.
 
 

COMPUTE

The COMPUTE command allows you to create new variables by doing calculations.  The format of this command is:

For example, let's assume that you have recorded a person's test results for last year and this year, and you want to do some analysis on the net gain (or loss) between the two tests.  If the variables for the two tests are CTEST (for the current year) and LTEST (for last year), the command to calculate the difference would be:
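
The general form and the command for this example can be sketched as follows, where GAIN is the name chosen for the new variable:

COMPUTE target variable = expression

COMPUTE GAIN = CTEST - LTEST.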

After this command is executed, the variable GAIN will contain the difference between this year's and last year's test scores.  You can do very complex calculations using the COMPUTE statement, and also incorporate some mathematical and statistical functions into your calculations if you desire.  See the SPSS Reference Manual for more information.
 
 

CONDITIONAL PROCESSING

There are times when you need to do different calculations in different situations.  In this case, you need to use the IF command to condition the calculations.  The format of this command is:

where "logical expression" is the condition that is controlling your calculations.  For example, let's assume that in the previous example, we are only interested in the absolute value of the difference between the two test scores.  There are several ways to handle this situation.  One of them is:
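
The general form, followed by one way to obtain the absolute difference, is sketched below.  GAIN is first computed for every case, and the IF command then reverses the subtraction for cases where last year's score was higher:

IF (logical expression) target variable = expression

COMPUTE GAIN = CTEST - LTEST.
IF (CTEST LT LTEST) GAIN = LTEST - CTEST.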

You can use the relational operators EQ, LT, GT, NE, LE, and GE to make comparisons between the two items in your statement.  You can also use the logical operators AND, OR or NOT to make a compound conditional so that you can check for several conditions in your IF statement:
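
A sketch of a compound condition, using the test score variables from the earlier example:

IF (CTEST GE LTEST AND LTEST GT 0) GAIN = CTEST - LTEST.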

In this case, GAIN will only be calculated as CTEST - LTEST if CTEST is greater than or equal to LTEST, and LTEST is greater than 0.

For complex conditions, you can use a more formal version of the IF statement, the DO IF.  The format of the DO IF is:

We could use the DO IF to do a much cleaner calculation of GAIN than that which was illustrated above:
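
The general structure, followed by a sketch of the GAIN calculation, might look like this.  In a batch command file each command inside the structure still begins in column 1:

DO IF (logical expression)
commands
ELSE
commands
END IF

DO IF (CTEST GE LTEST).
COMPUTE GAIN = CTEST - LTEST.
ELSE.
COMPUTE GAIN = LTEST - CTEST.
END IF.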

The SELECT IF command allows you to exclude certain of your cases from the analysis procedures which follow.  The format of the SELECT IF is:

For example, suppose you had a variable called SEX, for which 1 represented male subjects and 2 represented females.  If you wanted to do some type of analysis just for male subjects, you would use the SELECT IF command:
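
A minimal sketch of the general form and of the command for this example:

SELECT IF (logical expression)

SELECT IF (SEX EQ 1).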

The logical expression portion of this command is the same as that in the IF command.
 
 

TEMPORARY ALTERATIONS

There are times when you want certain calculations to be done for just a portion of your run.  The TEMPORARY command, if inserted before any of these data manipulation commands, will cause them to be in effect only for the next procedure you do.  The format of this command is:

For example, if you wish to exclude females from your analysis for only one procedure, and if the code for female was 2, you would put a TEMPORARY command in before your SELECT IF:
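
A sketch of this sequence follows.  FREQUENCIES on INCOME82 is only an illustrative choice for the procedure that comes next:

TEMPORARY.
SELECT IF (SEX NE 2).
FREQUENCIES VARIABLES=INCOME82.

Procedures that come after the FREQUENCIES command would again use all cases.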

Any commands between the TEMPORARY command and the description of the next procedure to be run will be treated as TEMPORARY.  Thus, you can do several transformations and/or recalculations that will have no permanent effect on the data or the procedures that follow.
 
 

PROCEDURE COMMANDS

Once you have put together all your data description and data manipulation commands, you then must put in the instructions that tell SPSS what type of data analysis you want to do.  SPSS provides a variety of statistical procedures such as FREQUENCIES, CROSSTABS, ANOVA and REGRESSION to do common types of data analyses.  This handout covers three simple procedures:  DESCRIPTIVES, FREQUENCIES and CROSSTABS.
 
 

DESCRIPTIVES

The DESCRIPTIVES procedure produces simple descriptive statistics about selected variables.  The format of this command is:

Like all SPSS procedures, DESCRIPTIVES has certain options that you can use to modify the way the procedure is done, and allows you to request certain statistics beyond those that are provided by default.  These selections are made with the options which follow the command line.  For example, suppose you want to produce all statistics for variables SEX, RACE and INCOME82.  The commands required to do this are:
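
A sketch of these commands, assuming the subcommand form of STATISTICS used by current SPSS releases (older releases may expect the statistics request on a separate line, so check the Reference Manual):

DESCRIPTIVES VARIABLES=varlist

DESCRIPTIVES VARIABLES=SEX RACE INCOME82
  /STATISTICS=ALL.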

The SPSS Reference Manual contains a list of all the statistics that are available.  The mean, standard deviation, minimum and maximum values are the default statistics produced by the DESCRIPTIVES procedure.  If these are the only statistics desired, no STATISTICS option is required.
 

FREQUENCIES

The FREQUENCIES procedure produces a list of how many responses there are for each value in the specified variables.  FREQUENCIES is often used to do some data verification.  The format of the FREQUENCIES command is:

The default statistics are mean, standard deviation, minimum and maximum.  See the SPSS Reference Manual for a list of other statistics and options.

In order to obtain a FREQUENCIES printout of the variables SEX and INCOME82, along with the mean, standard deviation and variance, the commands would be:
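
A sketch of these commands, assuming the subcommand form of STATISTICS and the keywords MEAN, STDDEV and VARIANCE:

FREQUENCIES VARIABLES=varlist

FREQUENCIES VARIABLES=SEX INCOME82
  /STATISTICS=MEAN STDDEV VARIANCE.
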

CROSSTABS

The CROSSTABS procedure is used to create tables of responses that answer questions such as "I wonder how many women in my survey are college graduates?"  CROSSTABS lets you look at the responses to variables in conjunction with other variables.  The format is:

There are many options and statistics that control the information that is printed in your table and on the report.  See the SPSS Reference Manual for a complete list.

Let's say that you would like to see how the variable SEX affects income in the work place, which you have recorded as variable INCOME82.  The command to get a table of these responses is:
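
A sketch of the general form and of this request, assuming the subcommand form of STATISTICS and the keyword CHISQ:

CROSSTABS TABLES=varlist BY varlist

CROSSTABS TABLES=SEX BY INCOME82
  /STATISTICS=CHISQ.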

In this case, a chi-square statistic is requested.


 

SYSTEM FILES AND WRITING NEW RAW DATA FILES

There are a number of procedures available in SPSS that will allow you to list out your variables for each case and also save a permanent data set.  The following sections will show you how to do that.
 
 

SAVE

The SAVE command is used to create a permanent SPSS system file.  A system file contains all of the variables, missing values, variable labels, value labels and any other information entered about the data.  System files are valuable because they allow you to create smaller command files that will execute in less time.  The syntax of the SAVE command is:

where the KEEP option allows you to specify what variables you want to keep in the system file.  If you do not include the KEEP option all of the variables will be saved.  Outfile refers to the file that you want to create.  The following example illustrates this process on the Unix System:
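
A sketch of such a command, in which the variable names on KEEP are illustrative:

SAVE OUTFILE='myfile.sav'
  /KEEP=ID SALARY STATE.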

A Unix file named myfile.sav will be saved on disk. Once you have saved a system file you do not need to use your raw data again, unless changes have to be made to the raw data file.  In order to access this new file you will use the GET command.  Like the SAVE command, you must indicate the name of the save file on the command.  The following details the use of GET:

where filename is the name of the file saved in the SAVE command.
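
A minimal sketch, assuming the system file was saved under the name myfile.sav:

GET FILE='filename'

GET FILE='myfile.sav'.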
 

WRITE

The WRITE command is used to produce a new raw data file that can be read by SPSS or other programs.  The following is the syntax for the WRITE command:

where outfile is the name of your output file.  For more information on these commands, please refer to your manual.
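
A sketch of the WRITE command follows.  The file name newdata.dat and the variable names are illustrative assumptions.  WRITE does its work only when the data are read, so the sketch ends with EXECUTE; a procedure command would serve the same purpose:

WRITE OUTFILE='newdata.dat'
  / ID SALARY STATE.
EXECUTE.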
 
 

RUNNING SPSS

Once you have prepared your command file, the next step is to get SPSS to execute it.  The command that does this is spss.  At the operating system prompt enter:

spss command-file

where "command-file" is the name of the file that contains all your data definition, data manipulation and procedure commands.

When you give this command, the output will come to your terminal so you can see if you have any syntax errors in the commands.  By default on Unix, SPSS does not produce a listing file. You will need to include an option on the command to create one (see below).

 

 


SPSS under Unix

This document addresses some of the issues involved in running SPSS under the Unix operating system.  It is not a replacement for your SPSS manual, but should be used as a supplement to it.  Since SPSS runs interactively and in batch mode under Unix, each mode will be discussed separately.
 
 

BATCH INTERACTIVE EXECUTION

The standard syntax for starting SPSS in batch interactive mode is:

spss commandfile.sps

where commandfile.sps is a file consisting of the SPSS commands that you wish to execute (if a command file name is not included, you will be prompted for commands).  The commands included in the file must be valid SPSS commands and must begin in column 1.  Lines not starting in column 1 are considered to be a continuation of the previous line.

By default, SPSS on Unix does not create an output file. All output is sent to the screen. If you need to create an output file for later printing, you must include an output option on the command line:

spss -t outputfilename.lis commandfile.sps

where outputfilename.lis is the name of the file that will contain the output from the SPSS run.
 

BATCH EXECUTION

Batch execution requires more steps than running a batch interactive job.  The main advantage of batch execution is that you can submit large jobs for processing while you do something else, which makes it especially useful whenever you have a lot of processing to do.

In order to run an SPSS job in batch mode, you simply append an ampersand (&) to the end of your command line. For example, to run the command file myfile.sps and create an output file in batch mode, you would issue the following command:

spss -t myfile.lis myfile.sps &

The file myfile.lis will contain all the output from the job once the job completes.



 
SPSS Portable Files
 

Moving SPSS system files from one machine to another requires that each system file be converted to a 'portable' format. This is because differences between computer architectures make system files incompatible. The SPSS commands for porting these files will retain all data and formatting (i.e. missing values, value and variable labels, etc.) on the new system.
 
 

Making Portable Files

Once SPSS system files have been identified, they must be converted to portable files before moving them to the new system. First, the file must be retrieved with the SPSS GET command, then it must be exported with the EXPORT command. The following script demonstrates the use of these commands:

where spssx_system_file is the name of your SPSS system file, and portfile is the name of the portable file.

Note: The name following OUTFILE cannot be longer than eight characters. Also, certain characters are illegal, including the commonly used colon (:). It is safest to use only alphabetic characters to name portable files.

The following example shows the process of converting an SPSS system file named myspssx to a portable file named myportx:
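
A sketch of these two commands, using the file names from the example:

GET FILE='myspssx'.
EXPORT OUTFILE='myportx'.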

These commands can be put into a command file which then can be executed with SPSS. If the two commands above were put into a file named spssxp, they could be executed as follows:

spss spssxp

The resulting file, myportx, may now be moved to another computer system.
 
 

Importing an SPSS Portable file

Importing the system file is just as simple as making the portable version. Once the portable file has been moved to the new computer system (by tape or file transfer), it may be processed by SPSS. The IMPORT command is used to read a portable file. The data may then be saved to a system file with the SAVE command. The commands are used as follows:

where port_file_name is the name of the portable file that has been moved to the new system, and spssx_system_file is the name of the SPSS system file to be created from it. Note that filenames are surrounded by single-quotes.
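
A sketch of the two commands, with the placeholder names described above:

IMPORT FILE='port_file_name'.
SAVE OUTFILE='spssx_system_file'.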
 

If you have any questions about working with SPSS portable files, call Academic and Computing Services at x1511.