SAS E-Learning
SAS E-Learning and SAS Online Courses are available; contact absas121@gmail.com
Wednesday, 28 November 2012
Friday, 23 November 2012
SAS Programming
SAS PROGRAMMING
Many businesses and individuals need to analyze data in order to make better decisions. As businesses become more complex, they generate more information that needs to be examined. Students, too, sometimes carry out research projects in which they must collect information and then analyze it.
The amount of data is increasing daily. A newer development is the Internet and e-commerce. Companies that do business online want to collect information to see what types of people use their Web site. They can also look at how people use the site in order to make its information easier to find. They also have to provide round-the-clock service. They need software to help them analyze their data.
In 1976, SAS Institute Inc., a privately held corporation, was formed. Its product at that time was known as the "Statistical Analysis System." It grew in popularity and capability and was widely used by academic groups. These people needed a software package that would do statistical calculations easily; they were not necessarily programmers. SAS can be used without knowing much about programming, but it is also a very sophisticated language, and more can be done with it.
It has grown into the world's largest privately held software company. Continual product line expansion and diversification of clientele have resulted in SAS products being used at over 40,000 customer sites in 50 countries. There are 3.5 million users of SAS products. Part of the reason for the continual growth is that the SAS Institute works with the end user to improve its product. It offers solutions for data warehousing, data mining, data visualization, and applications development. SAS now stands for the SAS System. (1)
SAS is used in many different types of businesses, including banking, manufacturing, government, insurance, telecommunications/utilities, sales and services, and healthcare. (1)
SAS is located in Cary, North Carolina. It is a worldwide company with business in Asia Pacific and Latin America, and in Europe, the Middle East, and Africa.
SAS also has a good employee retention rate of 96%. It is also a family-oriented company and is friendly to working women. (1)
1. This information was obtained from the SAS web site at http://www.sas.com.
The SAS System is an applications system that can be used as
1). a statistical package,
2). a database management system, and
3). a high-level programming language.
When people want some kind of information, they usually start with data that must be put to use. An applications system is software that gives you the tools you need to make the data useful and meaningful.
In order to be useful, an applications system should
1). give you total control of your data,
2). facilitate applications that run in more than one computing environment, and
3). accommodate varying skill levels of potential users.
SAS can do all of these.
Some types of data that may be collected are:
* Payroll and employment data
* Student data and class data
* Research data
* Medical data
* Inventory data and sales data
* Web data on customers
* Data from areas such as physical science, social science, business, and agriculture
With any body of data, you must perform four basic tasks to make it useful and meaningful:
ACCESS -- First, you access the data through the SAS System.
MANAGE -- Update, rearrange, combine, edit, or subset the data before analyzing it.
ANALYZE -- Analysis ranges from simple descriptive statistics to more advanced or specialized analyses for econometrics and forecasting, statistical design, computer performance evaluation, and operations research.
PRESENT -- Presentation capabilities range from simple lists and tables to multidimensional plots to elaborate full-color graphics, both on paper and on your display.
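The four tasks can be sketched in one short program. This is only an illustration: the table names (work.sales, work.east), the column names, and the data values are made up for the example and do not come from these notes.

```sas
/* ACCESS: read raw data into a table (values are invented for illustration) */
data work.sales;
   input region $ units price;
   datalines;
East 10 2.50
West 4 3.00
East 7 2.50
;
run;

/* MANAGE: subset the rows and add a computed column */
data work.east;
   set work.sales;
   where region = 'East';
   revenue = units * price;
run;

/* ANALYZE: simple descriptive statistics */
proc means data=work.east n mean sum;
   var units revenue;
run;

/* PRESENT: list the table with a title */
proc print data=work.east;
   title 'East region sales';
run;
```

Each step reads the table produced by the step before it, so one program covers all four tasks.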
SAS is also portable across computing environments. A computing environment is determined by the HARDWARE and the host OPERATING SYSTEM running on it. SAS can be used on IBM mainframes, on UNIX-based machines, and on personal computers running Windows.
PORTABILITY means that SAS applications:
* Function the same
* Look the same
* Produce the same results
on mainframes, minicomputers, or microcomputers.
You can develop SAS applications in one environment and run them in other environments without rewriting the programs.
SAS is a powerful programming language that includes a collection of ready-to-use programs called procedures. It supports an almost unlimited variety of applications--from general-purpose data processing to highly specialized analysis in diverse application areas.
INTRODUCTION TO PROGRAMMING USING BASE SAS SOFTWARE
The SAS System is a software system composed of computer programs that work together to perform specific tasks. The system reads data, such as letters or numbers, in various forms and organizes them in a SAS data set or Table. Today the term Table is often used instead of data set because it is more consistent with the relational databases in use today. The two names are used interchangeably. In these notes, we will try to use Table instead of data set.
A Table stores data in a form the
system can identify and manage as a unit.
Once data is organized into a Table, you
can access, analyze, revise, and display the data using one computer
program. You do not need to prepare
separate programs for different tasks.
These are the parts of a Table:
DATA VALUE -- A single unit of information, such as a person's height. Each of the items recorded is a data value.
COLUMN or Variable -- A set of data values that describes a specific characteristic, for example the heights of all individuals in a group. The age values make up the AGE column. Columns used to be called Variables; the term Column will be used in these notes.
SAS data types are classified as CHARACTER or NUMERIC.
CHARACTER columns contain data values consisting of:
* Combinations of letters of the alphabet
* Numbers that are not used in calculations (such as an ID number or a zip code)
* Special characters or symbols
NUMERIC columns contain:
* Numbers and related symbols, such as decimal points, plus signs, and minus signs
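A small sketch of the two types follows. The table name work.types_demo, the column names, and the values are assumptions made for illustration. A dollar sign after a name in the INPUT statement marks a CHARACTER column, so the zip code keeps its leading zero instead of being treated as a number.

```sas
data work.types_demo;
   /* $ marks a CHARACTER column; a name without $ is NUMERIC */
   input id $ zip $ age weight;
   datalines;
A101 02139 35 152.5
B202 75080 27 -3
;
run;

/* PROC CONTENTS lists each column with its type (Char or Num) */
proc contents data=work.types_demo;
run;

proc print data=work.types_demo;
run;
```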
A TABLE
| NAME               | SEX | AGE | STWGT | ENDWGT | HEIGHT | TEAM   |
| Charlene Armstrong | F   | 35  | 152   | 139    | 66     | Yellow |
| David Shaw         | M   | 27  | 189   | 165    | 68     | Red    |
| Amelia Serrano     | F   | 50  | 145   | 124    | 65     | Yellow |
| Ann Nance          | F   | 31  | 210   | 192    | 72     | Red    |
|                    | M   | 48  | 194   | 177    | 67     | Yellow |
| Ashley McKnight    | F   | 26  | 127   | 118    | 62     | Red    |
| Jim Brown          | M   | 41  | 220   |        | 73     | Yellow |
| Susan Stewart      | F   | 29  | 135   | 126    | 63     | Red    |
| Rose Collins       | F   | 37  | 155   | 141    | 67     | Blue   |
| Jason Schock       | M   | 28  | 187   | 172    | 77     | Green  |
| Kanoko Nagasaka    | F   | 46  | 135   | 126    | 63     | Blue   |
| Richard Rose       | M   | 33  | 181   | 166    | 72     | Green  |
| David Sims         | M   | 50  | 280   | 300    | 70     | Blue   |
| Elizabeth Sims     | F   | 48  | 300   | 200    | 65     | Green  |
| Tim Jones          | M   | 35  | 280   | 168    | 70     | Blue   |
| Larry Goss         | M   | 21  | 188   | 174    | 73     | Green  |
| Asha Garg          | M   | 56  | 148   | 132    | 61     | Yellow |
| Jennifer Brooks    | F   | 42  | 208   | 165    | 72     | Red    |
Each vertical set of values, such as AGE, is a Column (or Variable). Each horizontal line is a Row (or Observation). Each individual entry is a Data Value.
In older versions of SAS, column names could only be 1 to 8 characters long. With Version 8 of SAS (which is what you will be using in class), the rules have changed.
COLUMN NAMES:
* Can be 32 or fewer characters in length
* MUST begin with a letter or underscore (_)
* Can contain only letters, numbers, or underscores after the first character (do not use %$!*&#@)
* CANNOT contain BLANKS
* Should be descriptive and reflect the contents of each set of data values
* Can contain upper and lowercase letters
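The naming rules above can be tried out with a short sketch like the one below. The table name and the column names are hypothetical, chosen only to exercise the rules; the invalid names appear only inside a comment so that the step still runs.

```sas
data work.names_demo;
   /* Valid names: begin with a letter or underscore,
      then use only letters, digits, or underscores */
   input _id starting_weight_lbs Height2;
   /* Invalid names, shown only as comments:
        2ndweight   (begins with a digit)
        start wgt   (contains a blank)
        wgt#lbs     (contains a special character) */
   datalines;
1 152 66
2 189 68
;
run;

proc print data=work.names_demo;
run;
```

The name starting_weight_lbs is longer than 8 characters, so it would not have been accepted under the older 8-character limit.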
ROW or Observation -- A set of data values for the SAME ITEM, for example all physical measurements for one person. There are 18 rows in our table above. Each row contains the name, sex, age, stwgt, endwgt, height, and team for one person.
MISSING VALUES -- Represent missing or unavailable data values to the SAS System. When data is printed, missing values are shown as periods (for numeric data) and blanks (for character data).
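A minimal sketch of both kinds of missing value, assuming a made-up table work.missing_demo with two columns read from fixed positions. The first data line leaves the numeric field blank and the third leaves the character field blank.

```sas
data work.missing_demo;
   input name $ 1-10 endwgt 12-14;
   datalines;
Jim Brown
Ann Nance  192
           177
;
run;

/* In the listing, the missing ENDWGT should appear as a period
   and the missing NAME as a blank */
proc print data=work.missing_demo;
run;
```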
ENTERING DATA INTO THE COMPUTER
A computer program without data is of no value. One of the first steps is to know how to enter data in a form that the computer can read.
You may want to conduct a study analyzing specific physical data on a series of people who belong to a health club. The first step is to figure out what information you will need. You would then collect it from the people who are in the study, for example by having each member fill out a form with the information the health club wants to analyze.
Next, someone would enter the data in a form the computer can read. The SAS System allows you to enter data using different methods.
One method is to put each data value in specified columns. This method is called the COLUMN INPUT FORMAT. This is the most common method.
In the following example, data is entered in these columns:
NAME 1-18, SEX 20, AGE 22-23, STWGT 25-27, ENDWGT 29-31, HEIGHT 33-34, TEAM 36-41
---------1---------2---------3---------4------
Charlene Armstrong F 35 152 139 66 Yellow
David Shaw         M 27 189 165 68 Red
Amelia Serrano     F 50 145 124 65 Yellow
Ann Nance          F 31 210 192 72 Red
Ashley McKnight    F 26 127 118 62 Red
Jim Brown          M 41 220     73 Yellow
Susan Stewart      F 29 135 126 63 Red
Rose Collins       F 37 155 141 67 Blue
Jason Schock       M 28 187 172 77 Green
Kanoko Nagasaka    F 46 135 122 66 Blue
Richard Rose       M 33 181 166 72 Green
David Sims         M 50 280 300 70 Blue
Elizabeth Sims     F 48 300 200 65 Green
Tim Jones          M 35 280 168 70 Blue
Larry Goss         M 21 188 174 73 Green
Asha Garg          M 56 148 132 61 Yellow
Jennifer Brooks    F 42 208 165 72 Red
You can also enter data by separating
each value with a space. This method is
referred to as LIST INPUT FORMAT.
Charlene Armstrong F 35 152 139 66 Yellow
David Shaw M 27 189 165 68 Red
Amelia Serrano F 50 145 124 65 Yellow
Ann Nance F 31 210 192 72 Red
Ashley McKnight F 26 127 118 62 Red
Jim Brown M 41 220 . 73 Yellow
Susan Stewart F 29 135 126 63 Red
Rose Collins F 37 155 141 67 Blue
Jason Schock M 28 187 172 77 Green
Kanoko Nagasaka F 46 135 122 66 Blue
Richard Rose M 33 181 166 72 Green
David Sims M 50 280 300 70 Blue
Elizabeth Sims F 48 300 200 65 Green
Tim Jones M 35 280 168 70 Blue
Larry Goss M 21 188 174 73 Green
Asha Garg M 56 148 132 61 Yellow
Jennifer Brooks F 42 208 165 72 Red
You will learn more about the differences between the two types of input later.
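As a preview, here is a hedged sketch of reading space-separated lines with simple list input. Because a blank separates values in list input, a name containing a blank is read here as two columns; FIRST and LAST are hypothetical names introduced for this sketch, as is the table name work.htwt_list. Simple list input also gives character columns a default length of 8, so a LENGTH statement reserves more room. A period stands for the missing numeric value.

```sas
data work.htwt_list;
   /* Reserve room for text longer than the 8-character default */
   length first $ 12 last $ 12 team $ 6;
   /* Values are read in order, separated by blanks */
   input first $ last $ sex $ age stwgt endwgt height team $;
   datalines;
Charlene Armstrong F 35 152 139 66 Yellow
David Shaw M 27 189 165 68 Red
Jim Brown M 41 220 . 73 Yellow
;
run;

proc print data=work.htwt_list;
run;
```

With column input the blank ENDWGT field can simply be left empty, while list input needs the period as a placeholder so the remaining values stay in the right columns.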
SELECTING TASKS FOR PROGRAMS
Before you write a program, you need to
determine what tasks you want the SAS system to perform. For example, you may want to print out the
data set, you may want to produce a graph, or a plot, or add other information
to the table.
SAS PROGRAMS
A SAS program is a group of step-by-step instructions, also known as SAS statements, that instruct the computer to perform specific tasks.
PARTS OF A SAS PROGRAM
/* SAS statements */
data htwt;
   input name $ 1-18 sex $ 20 age 22-23 stwgt 25-27 endwgt 29-31
         height 33-34 team $ 36-41;
   /* Data lines follow the DATALINES statement */
   datalines;
Charlene Armstrong F 35 152 139 66 Yellow
David Shaw         M 27 189 165 68 Red
Amelia Serrano     F 50 145 124 65 Yellow
Ann Nance          F 31 210 192 72 Red
Ashley McKnight    F 26 127 118 62 Red
Jim Brown          M 41 220     73 Yellow
Susan Stewart      F 29 135 126 63 Red
Rose Collins       F 37 155 141 67 Blue
Jason Schock       M 28 187 172 77 Green
Kanoko Nagasaka    F 46 135 122 66 Blue
Richard Rose       M 33 181 166 72 Green
David Sims         M 50 280 300 70 Blue
Elizabeth Sims     F 48 300 200 65 Green
Tim Jones          M 35 280 168 70 Blue
Larry Goss         M 21 188 174 73 Green
Asha Garg          M 56 148 132 61 Yellow
Jennifer Brooks    F 42 208 165 72 Red
;
run;
/* SAS statements */
proc print data=htwt;
run;
proc plot data=htwt;
   plot height*stwgt;
run;
SAS statements usually begin with a SAS
keyword that identifies the type of statement being used. Common SAS keywords are DATA, INPUT and
PROC. The remainder of the
statement contains additional information required for the system to perform
the task.
Note: All SAS statements end with a semicolon (;). SAS statements can also begin in any column on a line.
Individual statements can occupy one line or can extend across several lines. Several statements can also share a line. However, it is easier to read and follow the program when each statement starts on its own line. Examples are below:
data htwt;
input name $ 1-18 sex $ 20 age 22-23
stwgt 25-27 endwgt 29-31
height 33-34 team $ 36-41;
datalines;
or
data htwt; input name $ 1-18 sex $ 20 age 22-23 stwgt 25-27 endwgt 29-31
height 33-34 team $ 36-41; datalines;
STATEMENTS IN SAS PROGRAMS
LIBNAME
libname cs2331 'C:\CS2331';
This statement is used to create a SAS library of saved tables for use in future programs. The library name (libref), here cs2331, can be at most 8 characters long, and the path must be enclosed in plain straight quotes. Using a library allows tables created from past inputs to be used in new analyses and programs without reprocessing the original data with DATA statements in the new program. An example would be:
libname cs2331 'C:\CS2331';
data cs2331.htwt;
input name $ 1-18 sex $ 20 age 22-23
stwgt 25-27 endwgt 29-31
height 33-34 team $ 36-41;
The above example would save a copy of the table htwt in the folder CS2331 on the C: drive of the PC. The table could be used in a future program merely by including the libname statement and a reference to the table, such as:
libname cs2331 'C:\CS2331';
proc print data=cs2331.htwt;
run;
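The save-then-reuse pattern can be sketched end to end as below. Several things here are assumptions: the folder C:\CS2331 must already exist on the PC, the libref cs2331 is a name chosen for this sketch (a libref can be at most 8 characters long), and the table htwt_small with its two rows is invented to keep the example short.

```sas
/* Assumption: the folder C:\CS2331 already exists on this PC */
libname cs2331 'C:\CS2331';

/* First program: save a small table permanently in the library */
data cs2331.htwt_small;
   input name $ 1-18 height 20-21;
   datalines;
Charlene Armstrong 66
Jason Schock       77
;
run;

/* A later program: repeat only the LIBNAME statement,
   then refer to the saved table by its two-level name */
proc print data=cs2331.htwt_small;
run;
```

The later program needs no DATA step or data lines, because the table is read back from the folder.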
DATA
data htwt;
The first statement is a DATA statement. A DATA statement instructs the SAS System to read data and organize them into a SAS Table or data set. A DATA statement consists of the keyword DATA and a user-supplied data set name. This name should describe the data the table holds or what the DATA step does with it. In other words, make it meaningful to help you and others who will have to look at the program later.
The names of data sets can be in uppercase, lowercase, or mixed case; they are case insensitive. The names HTWT, htwt, and HtWt are all the same to SAS.
INPUT
input name $ 6-23 team $ 25-30 stwgt 32-34
endwgt 36-38 sex $ 40
age 42-43 height 45-46;
The second statement is an INPUT statement. It provides the information the SAS System requires to organize data into a SAS data set. The INPUT statement begins with the keyword INPUT and contains a user-supplied list of column names, types, and, if necessary, column locations. In this case, there are seven column names.
Notice that NAME, SEX, and TEAM are each followed by a dollar sign ($). This symbol indicates that NAME, SEX, and TEAM are character columns, whose values can contain alphabetic characters. The other columns are numeric.
NOTE: Column names in the INPUT statement can be in either lowercase or uppercase. However, when the information is printed out, the names above the columns appear exactly as they were entered in the INPUT statement. Example: with the input statement above, the output would look like the following:
Obs  name                team    stwgt  endwgt  sex  age  height
  1  CHARLENE ARMSTRONG  YELLOW    152     139   F    35      66
  2  DAVID SHAW          RED       189     165   M    27      68
  3  AMELIA SERRANO      YELLOW    145     124   F    50      65
  4  ANN NANCE           RED       210     192   F    31      72
However, if you use uppercase for some of the column names in the input statement, those names will be printed in capital letters on the output. This is the input statement:
input NAME $ 6-23 team $ 25-30 STWGT 32-34 ENDWGT 36-38
sex $ 40 age 42-43 height 45-46;
This is an example of how the output would look:
Obs  NAME                team    STWGT  ENDWGT  sex  age  height
  1  CHARLENE ARMSTRONG  YELLOW    152     139   F    35      66
  2  DAVID SHAW          RED       189     165   M    27      68
  3  AMELIA SERRANO      YELLOW    145     124   F    50      65
  4  ANN NANCE           RED       210     192   F    31      72
DATALINES
The DATALINES statement indicates that the data lines follow in the program. A single semicolon marks the end of the data lines. There are other ways to insert data in a program; they will be discussed later.
RUN
The RUN
statement instructs the system to execute the previous statements. Although the SAS system does not always
require a RUN statement after the datalines and semicolon, it is recommended
that you include a RUN statement in this section of your programs. When you use the PC versions of SAS, you need
to use the RUN statement.
PROC PRINT
proc print;
or
proc print data=htwt;
The PROC PRINT statement instructs the SAS System to print data. PRINT is a procedure, a prewritten computer program that analyzes and processes data.
A PROC statement consists of the keyword PROC and the procedure name, such as PRINT. You can also supply an option such as DATA=. The DATA= option specifies the table name.
NOTE: By default, a procedure reads the most recently created SAS table. The DATA= option enables you to override the system default and specify a data set of your choice.
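The default and the override can be seen side by side in a short sketch. The two tables work.first and work.second and their values are invented for illustration.

```sas
data work.first;
   input x;
   datalines;
1
2
;
run;

data work.second;
   input y;
   datalines;
10
20
;
run;

/* No DATA= option: the most recently created table (work.second) is printed */
proc print;
run;

/* DATA= names the table explicitly, overriding the default */
proc print data=work.first;
run;
```

Writing DATA= every time is a reasonable habit, since the default changes whenever a new table is created earlier in the program.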
PROC PLOT
proc plot data=htwt;
plot height*stwgt;
The PROC PLOT statement requests a plot of the data. The PLOT statement provides the details required to produce the plot you want. The column HEIGHT will be on the vertical axis and the column STWGT will be on the horizontal axis.
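The vertical*horizontal order can be tried on a small table, as in the sketch below. The table work.htwt_small holds only four of the weight and height pairs from the data lines, and the plotting symbol '*' is an optional choice made for this example.

```sas
data work.htwt_small;
   input stwgt height;
   datalines;
152 66
189 68
145 65
210 72
;
run;

proc plot data=work.htwt_small;
   /* vertical*horizontal: HEIGHT on the vertical axis, STWGT on the horizontal */
   plot height*stwgt='*';
   /* Swapping the names swaps the axes */
   plot stwgt*height='*';
run;
quit;
```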
SAS OUTPUT
You will receive output from the program. The first part will be the SAS Log. It displays the SAS statements you submitted and contains SAS System messages about the execution of the program.
The PROC PRINT statement produces the first page of output. It automatically displays the observation number of each row of the SAS data set in the first column of output. The column names are also supplied by the program.
The output of PROC PLOT will be on a separate page. It will show the height-weight point for each row of input.
