BMDP

BMDP is a robust statistical package written to work with less-than-perfect data. It handles missing data, outliers and non-normally distributed data. The package is available on a number of platforms, including Unix systems, VAXes and personal computers. At NAU, BMDP is available on the Unix systems Dana and Jan.

To run a BMDP job you need two files: a data file and a command file containing the instructions that tell BMDP what to do with your data. Refer to the Editing ASCII Data document for information on building data files. The following information explains how to get started building command files for BMDP.
 
 

BMDP instructions are written as easy-to-use, English-like commands. These instructions are grouped into paragraphs and sentences. Each paragraph consists of a leading slash ("/") followed by the paragraph name and one or more sentences. A sentence is a qualifier within a paragraph that clarifies the action to be taken. Every sentence ends with a period. A basic statistical run can be performed with the following set of paragraphs(1):

INPUT
this paragraph describes how data is formatted in your input file.
VARIABLE
this paragraph allows you to name the variables in your data set.
GROUP
this paragraph allows you to group your variables on specific criteria.
END
this paragraph instructs BMDP to complete the statistical run.
All BMDP statistical runs use the INPUT, VARIABLE and END paragraphs; the GROUP paragraph is optional. Figure 1 shows a sample instruction set. Certain rules should be followed when writing BMDP instruction sets. These include:

To start BMDP, please refer to the documentation for your machine. The general syntax for starting BMDP is:

where 1d is the statistical routine (program) to run, myprog.bmd is your instruction set and myprog.lst is the name of the file to which the output is written.
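The paragraph and sentence rules described above can be made concrete with a small checker. The Python sketch below is not part of BMDP and the function names are hypothetical. It splits an instruction set into paragraphs at each "/" and into sentences at each ".", skipping any "/" or "." that appears inside single quotes (such as a file name), and confirms that the INPUT, VARIABLE and END paragraphs are present with END last. The sample sentences are illustrative assumptions; check the exact wording, including any abbreviated paragraph names, against your BMDP manual.

```python
def split_outside_quotes(text, sep):
    """Split text at every sep character that is not inside single quotes."""
    parts, current, in_quote = [], [], False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        if ch == sep and not in_quote:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_instructions(text):
    """Return a list of (PARAGRAPH, [sentences]) pairs or raise ValueError."""
    chunks = split_outside_quotes(text, "/")
    if chunks[0].strip():
        raise ValueError("text found before the first '/' paragraph")
    parsed = []
    for chunk in chunks[1:]:
        words = chunk.split(None, 1)
        if not words:
            raise ValueError("empty paragraph")
        name = words[0].rstrip(".").upper()
        body = words[1] if len(words) > 1 else ""
        pieces = split_outside_quotes(body, ".")
        # Anything left after the last period is an unterminated sentence.
        if pieces[-1].strip():
            raise ValueError(f"/{name}: sentence does not end with a period")
        sentences = [" ".join(p.split()) for p in pieces[:-1] if p.strip()]
        parsed.append((name, sentences))
    names = [name for name, _ in parsed]
    for required in ("INPUT", "VARIABLE", "END"):
        if required not in names:
            raise ValueError(f"missing /{required} paragraph")
    if names[-1] != "END":
        raise ValueError("/END must be the last paragraph")
    return parsed


# Hypothetical instruction set for a data file with three variables per case.
SAMPLE = """
/INPUT     VARIABLES ARE 3.
           FORMAT IS '(3F4.0)'.
           FILE IS 'mydata.dat'.
/VARIABLE  NAMES ARE id, sex, income.
/END
"""

for paragraph, sentences in parse_instructions(SAMPLE):
    print(paragraph, sentences)
```

Treating quoted text as opaque matters because file names and format strings routinely contain periods; a naive split on "." would break FILE IS 'mydata.dat'. into two sentences.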

The following is a list of some of the BMDP programs and the type of statistical analysis that they perform:

The INPUT Paragraph

The INPUT paragraph is used to describe your data to BMDP. Three input commands are essential for a working run:
Variables
used to specify the number of variables for each case
Format
to specify the layout of the data values on each record
File or Unit
used to identify the location of the data file on disk
For example, if you have a file on disk called mydata.dat and you have three variables per case in the file, you may use the instructions shown in Figure 2 to access the file. The following optional commands may also be used:
Title
specifies a title for the problem
Case
specifies the number of cases to read in
Multiple
allows the program to read in multiple records per case
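The FORMAT sentence tells BMDP where each variable sits on a record. As a rough analogy only, the Python sketch below reads one record either by fixed column widths (roughly what a FORTRAN-style format such as (3F4.0) describes: three numeric fields, each four columns wide) or as whitespace-separated values. The function names, widths and sample records are hypothetical, and implied decimal places are not handled.

```python
def read_fixed(record, widths):
    """Cut one record into numeric fields of the given column widths."""
    values, start = [], 0
    for width in widths:
        field = record[start:start + width]
        if len(field) < width:
            raise ValueError(f"record too short for a field of width {width}")
        values.append(float(field))  # float() ignores surrounding blanks
        start += width
    return values


def read_free(record):
    """Read whitespace-separated numeric values from one record."""
    return [float(token) for token in record.split()]


# Three variables per case, four columns each.
print(read_fixed("   1   2 400", [4, 4, 4]))  # [1.0, 2.0, 400.0]
print(read_free("1 2 400"))                   # [1.0, 2.0, 400.0]
```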

 
 

The VARIABLE Paragraph

This paragraph is used to describe your variables to BMDP. There are a number of commands that will be useful in doing so. You should familiarize yourself with the following:
Name
used to assign names to your variables.
Use
used to select a subset of your variables for analysis.
Missing
used to assign missing value codes to variables in your data set.
Max, Min
used to restrict the range of data used in an analysis.
Label
allows the labeling of cases in a data set.
Figure 3 shows a sample VARIABLE paragraph. It lists three variables: id, sex and income. An example of setting up a missing value code for sex is also given. Some procedures require that your data be grouped or sorted before the run can be performed; the group command in the VARIABLE paragraph may be used to accomplish this.
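To make the effect of the Missing, Min and Max commands concrete, the Python sketch below drops values that equal a missing-value code or fall outside a stated range before any statistics are computed. It assumes that BMDP excludes such values from an analysis in the same way; the function name and the code 9 for sex are hypothetical.

```python
def usable_values(values, missing=None, minimum=None, maximum=None):
    """Keep only the values that survive a missing-value code and min/max limits.

    Assumption: a value equal to the missing-value code, or outside the
    [minimum, maximum] range, is treated as missing and left out.
    """
    kept = []
    for value in values:
        if missing is not None and value == missing:
            continue
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        kept.append(value)
    return kept


# Hypothetical sex codes: 1 and 2 are valid, 9 marks a missing answer.
print(usable_values([1, 2, 9, 2, 1], missing=9))              # [1, 2, 2, 1]
print(usable_values([3, 15, 98, 99], minimum=1, maximum=97))  # [3, 15]
```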
 
 

Transforming your Data

The TRANSFORM paragraph is used to create new variables or to transform existing ones. Arithmetic operators, BMDP arithmetic functions, summary functions and conditional statements allow you to manipulate your data. A transform sentence takes the form of an algebraic equation: the variable named on the left side of the equal sign is assigned the value specified or computed on the right-hand side. For example, if you are working with student test scores and want to produce an overall score from a verbal and a math test score, you could use the following:

For a full explanation of the use of algebraic equations in BMDP, please refer to your manual.
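A transform equation is evaluated for each case in turn, and the result is stored in the variable on the left of the equal sign. The Python analogy below assumes that behavior; the variable names verbal, math and total and the function name are hypothetical.

```python
def add_total(cases):
    """Compute total = verbal + math for every case (hypothetical names)."""
    for case in cases:
        case["total"] = case["verbal"] + case["math"]
    return cases


scores = [{"verbal": 480, "math": 520}, {"verbal": 610, "math": 570}]
for case in add_total(scores):
    print(case["total"])  # 1000, then 1180
```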

You may also process variables conditionally in the TRANSFORM paragraph by using the if-then statement. For example, to transform an interval variable into an ordinal variable you could do the following:

In this example, whenever the value 4000 appears in the variable income, it is changed to 4. You may also use the if statement to recode non-numeric variables as numeric. For example, to recode sex from non-numeric to numeric you could use the following:

This will change all occurrences of M to 1 and F to 5 in the variable sex.
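The two conditional recodes described above can be sketched in Python, applied to one case at a time. This mirrors the logic only; it is not BMDP syntax, and the function name and dictionary layout are hypothetical.

```python
def recode(case):
    """Apply the two if-then recodes described above to one case (a dict)."""
    # Interval to ordinal: a value of 4000 in income becomes 4.
    if case["income"] == 4000:
        case["income"] = 4
    # Non-numeric to numeric: M becomes 1 and F becomes 5, as in the text.
    if case["sex"] == "M":
        case["sex"] = 1
    elif case["sex"] == "F":
        case["sex"] = 5
    return case


print(recode({"income": 4000, "sex": "F"}))  # {'income': 4, 'sex': 5}
print(recode({"income": 2500, "sex": "M"}))  # {'income': 2500, 'sex': 1}
```

Values that match no condition, such as an income of 2500, pass through unchanged.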

To work with temporary variables, you must use the add statement in your VARIABLE paragraph. You may then calculate new variables in the TRANSFORM paragraph.
 
 

Producing Descriptive Statistics

The 1D program provides standard descriptive statistics. The necessary paragraphs are INPUT and VARIABLE. In the VARIABLE paragraph you should include the use command to list the variables you wish to include in the statistical run. For each variable, 1D prints the total frequency, mean, standard deviation, standard error of the mean, coefficient of variation, the smallest and largest values with their z-scores, and the range. Figure 4 shows sample 1D output.

VARIABLE      TOTAL                STANDARD    ST.ERR   COEFF. OF    S M A L L E S T     L A R G E S T
NO. NAME      FREQUENCY      MEAN  DEVIATION   OF MEAN  VARIATION    VALUE   Z-SCORE    VALUE   Z-SCORE    RANGE

  2 race       1473          1.186     0.471    0.0123   0.39742     1.000    -0.39      3.000    3.85      2.000
  4 sex        1473          1.594     0.491    0.0128   0.30818     1.000    -1.21      2.000    0.83      1.000
 15 income82   1473         16.510    20.016    0.5215   0.21239     1.000    -0.77     99.000    4.12     98.000

Figure 4.
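As a cross-check on what the 1D statistics mean, the Python sketch below computes them for a single variable. It is not BMDP code: the function and key names are hypothetical, and it assumes the standard deviation uses the n - 1 divisor. The sample data are rebuilt from the income82 value counts printed in the 2D frequency table of Figure 5.

```python
import math


def describe(values):
    """Return 1D-style summary statistics for one variable."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with divisor n - 1.
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    smallest, largest = min(values), max(values)
    return {
        "total_frequency": n,
        "mean": mean,
        "std_deviation": sd,
        "std_error_of_mean": sd / math.sqrt(n),
        "coeff_of_variation": sd / mean,
        "smallest": smallest,
        "z_smallest": (smallest - mean) / sd,
        "largest": largest,
        "z_largest": (largest - mean) / sd,
        "range": largest - smallest,
    }


# income82 value -> count, copied from the 2D frequency table in Figure 5.
INCOME82_COUNTS = {
    1: 20, 2: 32, 3: 44, 4: 31, 5: 47, 6: 30, 7: 41, 8: 55, 9: 104, 10: 97,
    11: 89, 12: 65, 13: 108, 14: 85, 15: 206, 16: 168, 17: 120, 18: 51,
    98: 78, 99: 2,
}
income82 = [value for value, count in INCOME82_COUNTS.items() for _ in range(count)]

for name, stat in describe(income82).items():
    print(f"{name:>20}: {stat:.4f}")
```

With these counts the sketch gives a mean of about 16.5098, a standard deviation of about 20.0163, a standard error of about 0.5215 and z-scores of about -0.77 and 4.12 for the smallest and largest values, in line with the income82 line of Figure 4. The n - 1 divisor is what reproduces the VARIANCE of 400.6536560 shown in Figure 5.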
 
 

Frequency Tabulations

It is sometimes useful to produce a frequency distribution for a variable. In BMDP this is done by the 2D program. As with the 1D program, the INPUT and VARIABLE paragraphs are required for the run. Figure 5 shows output from the 2D program.

PAGE   4  BMDP2D     23-AUG-89           08:24:59

  ************
  * income82 *                                MAXIMUM        99.0000000
  ************                                MINIMUM         1.0000000
                                              RANGE          98.0000000
VARIABLE NUMBER . . . . . .       15          VARIANCE      400.6536560
NUMBER OF DISTINCT VALUES .       20          ST.DEV.        20.0163345
NUMBER OF VALUES COUNTED. .     1473          (Q3-Q1)/2       3.5000000
NUMBER OF VALUES NOT COUNTED       0          MX.ST.SC.       4.12
                                              MN.ST.SC.      -0.77

                                    95% CONFIDENCE
                ESTIMATE       ST.ERROR      LOWER          UPPER
MEAN            16.5098438      0.5215347     15.4868135     17.5328751
MEDIAN          13.0000000      0.2886753
MODE            15.0000000

                VALUE   VALUE/S.E.
SKEWNESS         3.62       56.66
KURTOSIS        11.97       93.77

CASE NO. OF MIN. VAL. =  146        Q1=    9.0000000      S-=   -3.5064907
CASE NO. OF MAX. VAL. =  304        Q3=   16.0000000      S+=   36.5261803

[Histogram and dot plot of income82: each 'H' represents 56 count(s); each '-' above = 5.0000 (L = 0.0000, U = 155.0000); each '.' below = 1.0000.]

                PERCENTS                          PERCENTS                          PERCENTS                          PERCENTS
 VALUE  COUNT   CELL    CUM       VALUE  COUNT   CELL    CUM       VALUE  COUNT   CELL    CUM       VALUE  COUNT   CELL    CUM
    1.     20    1.4    1.4          6.     30    2.0   13.8         11.     89    6.0   40.1         16.    168   11.4   83.0
    2.     32    2.2    3.5          7.     41    2.8   16.6         12.     65    4.4   44.5         17.    120    8.1   91.1
    3.     44    3.0    6.5          8.     55    3.7   20.4         13.    108    7.3   51.8         18.     51    3.5   94.6
    4.     31    2.1    8.6          9.    104    7.1   27.4         14.     85    5.8   57.6         98.     78    5.3   99.9
    5.     47    3.2   11.8         10.     97    6.6   34.0         15.    206   14.0   71.6         99.      2    0.1  100.0

Figure 5.
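The frequency table at the bottom of the 2D output lists each distinct value with its count, its percentage of all cases (CELL) and the running percentage (CUM). The Python sketch below builds the same kind of table; the function name is hypothetical and it is an illustration of the arithmetic, not BMDP's own code.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, cell %, cumulative %) rows in ascending value order."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows


for value, count, cell, cum in frequency_table([3, 1, 3, 2, 3, 1]):
    print(f"{value:>6}. {count:>5} {cell:>5.1f} {cum:>5.1f}")
```

Applied to the income82 values, this reproduces the CELL and CUM columns of Figure 5 after rounding to one decimal place (for example 14.0 and 71.6 for the value 15).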
 
 
 

Crosstabulations

Program 4F produces multiway frequency tables that help summarize categorical data. Statistics including the mean, standard deviation, frequency of values and percent of missing values are printed for each variable in the table. There are, however, a number of limitations and options that you will need to learn to use this procedure effectively. By default, 4F can handle only 10 distinct values per variable; the codes and cutpoints commands of the CATEGORY paragraph can be used to overcome this limitation. Please refer to your manual for a full explanation of these commands.

In addition to the standard INPUT and VARIABLE paragraphs, you will need to include the TABLE paragraph and, optionally, the CATEGORY paragraph. The TABLE paragraph is used to set up the columns and rows of your tables. For example, if you have two variables, sex and degree, and wish to produce a crosstabulation of those two variables, you could use the following TABLE paragraph:

This command will produce the output shown in Figure 6.


PAGE   4  BMDP4F     23-AUG-89           08:36:37       


   VARIABLE        STATED VALUES FOR            GROUP CATEGORY    INTERVALS
  NO.   NAME    MINIMUM MAXIMUM MISSING   CODE  INDEX   NAME     .GT.   .LE.
 ---- --------  ------- ------- -------  ------ ----- --------  ------ -------

   4  sex                                 1.000    1  *1
                                          2.000    2  *2

  11  degree                              0.000    1  *0
                                          1.000    2  *1
                                          2.000    3  *2
                                          3.000    4  *3
                                          4.000    5  *4
                                          9.000    6  *9

NOTE: CATEGORY NAMES BEGINNING WITH * WERE CREATED BY THE PROGRAM.
------------------------------------------------------------------------------

    ************************
    * TABLE PARAGRAPH   1  *
    ************************

 *****  OBSERVED FREQUENCY TABLE  1

 degree       sex       
 ------       ------    
                1        2    TOTAL     
 ------------------------------------   
 0            162      238 |    400     
 1            290      474 |    764     
 2             15       39 |     54     
 3             80       95 |    175     
 4             49       28 |     77     
 9              2        1 |      3     
 --------------------------|---------   
 TOTAL        598      875 |   1473     

         ALL CASES HAD COMPLETE DATA FOR THIS TABLE. 



  MINIMUM ESTIMATED EXPECTED VALUE IS      1.22

  STATISTIC                     VALUE    D.F.   PROB.
  PEARSON CHISQUARE            25.581       5  0.0001

NUMBER OF INTEGER WORDS OF STORAGE USED IN PRECEDING    PROBLEM    2006
CPU TIME USED      4.770 SECONDS

Figure 6.
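The Pearson chi-square statistic at the bottom of Figure 6 compares each observed cell count with the count expected if the row and column variables were independent (row total times column total, divided by the grand total). The Python sketch below is a minimal illustration of that standard formula, not BMDP's own code; the function name is hypothetical, and it does not compute the PROB. column.

```python
def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, min_expected


# Observed counts from Figure 6: rows are degree 0, 1, 2, 3, 4, 9; columns are sex 1, 2.
observed = [[162, 238], [290, 474], [15, 39], [80, 95], [49, 28], [2, 1]]
chi2, df, min_expected = pearson_chi_square(observed)
print(f"chi-square {chi2:.2f}, df {df}, minimum expected {min_expected:.2f}")
```

For the degree-by-sex counts it returns a statistic of about 25.58 with 5 degrees of freedom and a minimum expected value of about 1.22, in line with Figure 6. If SciPy is available, a p-value could be obtained with scipy.stats.chi2.sf(chi2, df).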
 
 

If you have any questions about using BMDP at NAU, please call Academic and Personal Computing at x1511.
 
 

Using BMDP on Unix

Use the following syntax to start a BMDP program:

where program_name is the name of the BMDP program (module) you wish to use (e.g., 1D), input_file is the file containing your BMDP instructions (the default is terminal input) and output_file is an optional name for the file BMDP writes your output to.

Example:

The input file 7d.bmd is used to control the run and the output is written to the file 7d.lst.

Bimed may also be run interactively. To do so, type bmdp at the Unix prompt followed by the name of the statistical module you wish to use. For example, to start Bimed with the 1D module, you would use the following command:

When you are asked for the instruction language file and the output file name, simply press the Enter or Return key in response to each question.


Notes:
Where one of the above examples shows text enclosed in square brackets ("[ ]"), this is an optional item.  If it is enclosed in angle brackets ("< >"), you must supply the necessary information.
1. Not all paragraphs and sentences available for your use will be discussed in this document. You should refer to your BMDP manual for a full description of any commands that are not fully described in the ITS documentation.