Introduction

SAS is available at NAU for both Unix and Windows. The two versions are similar, but they differ in some important ways; these differences are outlined throughout this document and in the machine-specific documents. This document is intended as a brief introduction to using SAS on the different systems on campus. To use SAS to its full potential, you should acquire and read the SAS Reference Manuals, which are available on the main NAU Stats Web Page. Manuals may also be ordered through the NAU Bookstore.

SAS runs in two modes: batch (non-interactive) and interactive (under Windows, or under X on Unix). Batch mode allows you to issue SAS commands from a file. This file may contain your data and should contain all of the statistical commands that you want to run on your data. To start a SAS batch job, type the SAS command followed by the name of your command file (on Unix this is typically: sas command-file),

where command-file is a file consisting of the SAS commands that you want to execute.

This document only covers execution in batch mode. You will need to know how to use the Unix operating system to use SAS in batch mode.
 

The Data Step

The data step is used to describe your data to the SAS system. It is also used to create new variables from your raw data. The following is a sample data step:

The header line for the data step begins with the keyword DATA. The next item after DATA is the name of the data set that you will be creating. This is a temporary data set and will be deleted after the run completes. Options are supplied in parentheses at the end of the DATA statement. In this example, a label for the run is included.
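A data step matching this description might look like the sketch below. The data set name (gss84), the label text, and the variable names are assumptions chosen for illustration; gss is a fileref that must already point at your raw data file.

```sas
/* Sketch of a data step; the names and label text are illustrative assumptions. */
data gss84 (label='General Social Survey sample');
  infile gss;                            /* gss: fileref for the raw data file */
  input id sex race age educ income82;   /* list input: values separated by spaces */
run;
```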

In the data step, the data file is specified by the INFILE statement. In the example given above, gss should be replaced by the name of the file that contains the data. An example of this on Unix is as follows:
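As a minimal sketch, an INFILE statement that names a Unix file directly could be written as follows. The path is hypothetical; substitute the location of your own data file.

```sas
data gss84;
  infile '/home/username/gss.dat';       /* hypothetical path to the raw data */
  input id sex race age educ income82;
run;
```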

Note that each statement (command, not line) in SAS must be terminated with a semicolon (;). Please refer to the machine-specific documentation for a description of how to use the INFILE statement.

The INPUT statement specifies the variables on the input record and controls how SAS reads the data. In the example above no special controls are given: the data values are separated by spaces, so they are read in one variable at a time. If your data does not have spaces between the data values (variables), then special formatting options need to be included. The standard format for input is as follows (anything in brackets is optional):

where variable is the name of the variable to be input, $ indicates that the variable is character rather than numeric (the default), start column is the column where the variable begins, end column is the column where the variable ends, and .decimals is the default number of decimal places for the variable. Data values may be either numeric or character. Examples of character data include a person's name or the name of a state or country. In our example, since the data values are separated by blanks, we do not need any special format controls.
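The general form in the comment below is reconstructed from the description above, so treat its exact layout as an assumption. It is followed by a concrete column-input statement; the file name, variable names, and column positions are hypothetical.

```sas
/* General form (brackets mark optional parts):
   INPUT variable [$] [startcolumn[-endcolumn]] [.decimals] ... ;   */
data survey;
  infile 'survey.dat';        /* hypothetical file name */
  input id      1-4           /* numeric, columns 1 to 4 */
        name  $ 6-20          /* character, columns 6 to 20 */
        income  22-29 .2;     /* numeric; 2 decimals assumed if none appear in the data */
run;
```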

It is also possible to read in multiple records per case. The forward slash "/" tells SAS to advance to the next line. For example, if the raw data has two records per case, with ID, SEX and AGE on record 1 and questions one through five on record 2, we could use the following command to read the data correctly:
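A sketch of such an INPUT statement is shown here. It assumes list input, numeric coding for SEX, and the hypothetical names q1 through q5 for the five questions.

```sas
data survey;
  infile 'survey.dat';        /* hypothetical file name */
  input id sex age            /* record 1 */
      / q1 q2 q3 q4 q5;       /* "/" advances to record 2 */
run;
```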


 

Missing Data

Missing data in SAS is normally represented by either a space or a period in your raw data set. If you are using list input (no column or special input control), you will need to use a period to indicate a missing value. If your missing values are coded as a special number (e.g. 9), you can use an IF statement to tell SAS that the code represents a missing value:

You may also use the MISSING statement to indicate special missing values. This allows you to use special character codes to represent specific types of missing values. For example, if you are performing a survey and you want to track how many people refused to answer a question versus how many were not home, you could use the following:

R and X are now special missing values in the raw data set.
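Both techniques can be sketched in one short data step. The variable names are hypothetical, and the meanings attached to R and X (refused, not home) are an assumption made for illustration.

```sas
data survey;
  missing R X;                 /* R = refused, X = not home (assumed meanings) */
  input id q1 q2;
  if q1 = 9 then q1 = .;       /* treat the numeric code 9 as missing */
cards;
1 3 2
2 9 R
3 4 X
4 . 1
;
run;
```

In this sketch the R and X in the data lines are stored as the special missing values .R and .X, so procedures can count them separately from the ordinary missing value.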
 

Creating and recoding variables

Creating new variables in SAS is a very simple procedure. SAS allows standard algebraic equations to be used in the data step. For example, if you have the radius of an object in your input and want to calculate the circumference as a new variable, you could use the following:

All of the standard arithmetic operators are included, as are many comparison operators (please refer to your SAS manual for a full explanation). Comparison statements may also be used to create new variables. If your data set has the day of the week as a character string and you wish to create a numeric code for each day, you could try the following:

An IF statement would be necessary for each day of the week. You may also use the IF statement to collapse or recode your data. Here is an example for producing an ordinal range from an interval scale:
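The three techniques described in this section can be sketched together in a single data step. All variable names, day abbreviations, and cut points below are assumptions chosen for illustration.

```sas
data example;
  input radius day $ age;

  /* New variable from an algebraic expression */
  circum = 2 * 3.14159 * radius;

  /* Numeric code from a character value: one IF statement per day */
  if day = 'Mon' then daynum = 1;
  if day = 'Tue' then daynum = 2;
  /* ... repeat for the remaining days of the week ... */

  /* Collapse an interval scale (age) into an ordinal range */
  if age = . then agegrp = .;          /* keep missing ages missing */
  else if age < 30 then agegrp = 1;
  else if age < 50 then agegrp = 2;
  else agegrp = 3;
cards;
1.5 Mon 24
2.0 Tue 47
3.2 Mon 63
;
run;
```

The explicit test for a missing age matters because SAS treats a missing numeric value as smaller than any number, so without it a missing age would fall into the first group.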
 

Writing Raw Data Files

Creating a new raw data file from input data will be slightly different on each of the SAS systems. In general you will need to use two statements to create the new data file. The first is the FILE statement, which indicates the file reference (fileref) for the output file. The second is the PUT statement, which "puts" an output record to the file referenced in the FILE statement. If you want to write out only a subset of your data set, an IF statement may be used in conjunction with PUT. The following is an example:

This will create a new file containing the variables id, race, age, educ and income82 for all cases where sex equals 2.
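A sketch of the FILE and PUT combination described above follows. The FILENAME statement, the fileref out, the output file name, and the order of the input variables are assumptions; DATA _NULL_ is used here so that the step writes the raw file without also creating a SAS data set.

```sas
filename out 'subset.dat';               /* hypothetical output file */

data _null_;
  infile gss;                            /* gss: fileref for the raw input data */
  input id sex race age educ income82;
  file out;                              /* send PUT output to the fileref out */
  if sex = 2 then put id race age educ income82;
run;
```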
 

Creating SAS Data Sets

A SAS data set is a special data file that SAS creates in its own format. The file contains all of your data along with information about your data (variable names, missing values, labels, etc.). A SAS data set is created whenever you run a DATA step. The name given after the DATA keyword is the name of the data set that you will create. By default this data set is temporary and will be deleted when the program completes. To create a permanent data set you must use a two-level name, that is, two names separated by a period (e.g. food.prices). For machine-specific information on creating permanent SAS data sets, please refer to your SAS manuals. The following commands will create a new SAS data set:

Once a permanent SAS data set has been saved, the SET statement is used to access it. For example, if you have a SAS data set named GSS.D84, to access that data you would use the following:
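As a sketch, creating and later reading the data set GSS.D84 could look like this. The LIBNAME statement, which ties the first-level name to a directory, is an assumption about how your system is set up, and the directory, file name, and variable names are hypothetical.

```sas
libname gss '/home/username/sasdata';    /* hypothetical directory for permanent data sets */

/* Create the permanent data set GSS.D84 */
data gss.d84;
  infile 'gss.dat';                      /* hypothetical raw data file */
  input id sex race age educ income82;
run;

/* In a later run: read the saved data set into a temporary one */
data work84;
  set gss.d84;
run;
```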
 

Producing Frequency Tables


PROC FREQ is used to produce one-way to n-way frequency and crosstabulation tables. It will produce frequency counts and percentages, chi-square tests, Fisher's exact test, and the phi and Cramer's V statistics. The procedure has the following format:

where varlist is the list of variables to use. The BY statement allows you to group your output by the variables listed. One-way tables are produced by listing the variable names separated by spaces. Figure 3 shows sample output for a one-way table.
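The general form in the comment below is a sketch based on the description above, and the small data set exists only to make the example self-contained. The data set and variable names are assumptions.

```sas
/* General form (a sketch):
   PROC FREQ DATA=dataset;
     TABLES varlist;
     BY varlist;        (the data must already be sorted by these variables)   */
data survey;
  input sex race;
cards;
1 1
2 1
2 2
1 2
;
run;

proc freq data=survey;
  tables sex race;      /* two one-way tables, one per variable */
run;
```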

To obtain a multiway table, use an asterisk (*) to separate the variable names. This produces a crosstabulation. Figure 4 shows output from the following command:
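A crosstabulation request can be sketched as follows. The data set and variable names are assumptions, so this is not necessarily the command that produced Figure 4; the CHISQ option is included only to show how the chi-square statistics are requested.

```sas
data survey;
  input sex race;
cards;
1 1
2 1
2 2
1 2
;
run;

proc freq data=survey;
  tables sex*race / chisq;   /* rows are sex, columns are race */
run;
```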


 

Producing Correlations

Correlations may be produced using PROC CORR. PROC CORR is also used to produce certain univariate statistics such as means, standard deviations and sums. The format for the CORR procedure is:

where varlist is the list of variables to include in the correlation. You may also use the BY statement to group your output. Figure 5 shows output from the following command:
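A minimal PROC CORR run might look like the sketch below. The data set and variable names are assumptions, and this is not necessarily the command behind Figure 5.

```sas
/* General form (a sketch):
   PROC CORR DATA=dataset;
     VAR varlist;
     BY varlist;                */
data survey;
  input age educ income82;
cards;
24 12 18000
47 16 42000
35 14 27500
63 10 21000
;
run;

proc corr data=survey;
  var age educ income82;     /* correlation matrix plus simple statistics */
run;
```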

Descriptive and Univariate Statistics

PROC UNIVARIATE produces the following statistics: mean, sum, standard deviation, variance, skewness, kurtosis, sum of the weights, maximum, minimum, range, quartiles, percentiles, median, mode, signed ranks, and tests for normality. The format for this procedure is:

where varlist is the list of variables to test. You may also group the output with the BY statement. Figure 6 shows output for the following command:
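The sketch below shows PROC UNIVARIATE with a BY statement. The data set and variable names are assumptions, so this is not necessarily the command behind Figure 6. The NORMAL option requests the tests for normality, and the data are sorted first because BY-group processing expects sorted input.

```sas
data survey;
  input sex age;
cards;
1 24
2 47
1 35
2 63
1 51
2 29
;
run;

proc sort data=survey;       /* BY processing requires sorted data */
  by sex;
run;

proc univariate data=survey normal;
  var age;
  by sex;                    /* one set of statistics per value of sex */
run;
```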
 

Labeling Output

The following SAS command file shows an example of labeling your output.
 
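/* Define value labels: abnum for numeric codes, $abchar for character codes */
proc format;
 value abnum 1 = 'yes' 2 = 'no';
 value $abchar 'h' = 'hi' 'g' = 'goodbye';
run;
/* Read four cases; the values 3, 4 and i have no label defined */
data a1;
 input a b $;
cards;
1 h
2 g
3 h
4 i
;
run;
/* Crosstabulate a by b, printing the labels in place of the raw values */
proc freq;
 tables a*b;
 title 'test of numeric and character value transformations';
 format a abnum.;
 format b $abchar.;
run;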


The following shows sample output from the above commands:

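test of numeric and character value transformations

The FREQ Procedure

Table of a by b

a          b

Frequency |
Percent   |
Row Pct   |
Col Pct   | goodbye | hi      | i       |  Total
----------+---------+---------+---------+--------
yes       |       0 |       1 |       0 |      1
          |    0.00 |   25.00 |    0.00 |  25.00
          |    0.00 |  100.00 |    0.00 |
          |    0.00 |   50.00 |    0.00 |
----------+---------+---------+---------+--------
no        |       1 |       0 |       0 |      1
          |   25.00 |    0.00 |    0.00 |  25.00
          |  100.00 |    0.00 |    0.00 |
          |  100.00 |    0.00 |    0.00 |
----------+---------+---------+---------+--------
3         |       0 |       1 |       0 |      1
          |    0.00 |   25.00 |    0.00 |  25.00
          |    0.00 |  100.00 |    0.00 |
          |    0.00 |   50.00 |    0.00 |
----------+---------+---------+---------+--------
4         |       0 |       0 |       1 |      1
          |    0.00 |    0.00 |   25.00 |  25.00
          |    0.00 |    0.00 |  100.00 |
          |    0.00 |    0.00 |  100.00 |
----------+---------+---------+---------+--------
Total     |       1 |       2 |       1 |      4
          |   25.00 |   50.00 |   25.00 | 100.00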


Conclusion

Each machine that SAS runs on has specific commands and file access methods that are unique to that machine. Documentation is available for the systems on campus. If you have any questions, please call Academic and Computing Services at x1511.