SAS Data Guru: May 2011

Saturday, May 21, 2011

SAS DOE Articles

An Introduction to Experimental Design Using SAS

Application of Experimental Design in Consumer Direct-Mail Solicitations

Introduction to Design and Analysis of Experiments with the SAS System by Asheber Abebe

Simulate a Normal Distribution

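SAS offers a function called RANNOR, which returns a draw from the standard normal distribution. Multiplying the draw by a standard deviation and adding a mean gives a sample from any normal distribution. A seed of 0 tells SAS to seed the generator from the system clock, so each run produces a different sample.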

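data temp(keep=x);
  /* mean 50, standard deviation 20; seed 0 = use the system clock */
  retain mu 50 std 20 seed 0;
  do i=1 to 1000;
    x = mu + std*rannor(seed);  /* scale and shift a standard normal draw */
    output;
  end;
run;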

proc chart data=temp;
  vbar x;
run;
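As a hedged variation on the step above, the sketch below uses the RAND function with CALL STREAMINIT so that the sample is reproducible, then checks the sample mean and standard deviation with PROC MEANS. The data set name temp2 and the seed value 12345 are arbitrary choices for illustration, not part of the original example.

```sas
/* Sketch: reproducible normal sample (temp2 and 12345 are illustrative choices) */
data temp2(keep=x);
  call streaminit(12345);          /* fixed seed so reruns give the same sample */
  mu = 50;
  std = 20;
  do i = 1 to 1000;
    x = rand('normal', mu, std);   /* draw from a normal with mean mu, sd std */
    output;
  end;
run;

/* The sample mean and standard deviation should be close to 50 and 20 */
proc means data=temp2 n mean std min max;
  var x;
run;
```

With 1000 draws the estimates will not match the parameters exactly, but they should be close enough to confirm that the scaling is right.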

Tuesday, May 17, 2011

Survival Analysis Example Using LIFETEST

Survival data consist of a response (event time, failure time, or survival time) variable that measures the duration of time until a specified event occurs and possibly a set of independent variables thought to be associated with the failure time variable. These independent variables (concomitant variables, covariates, or prognostic factors) can be either discrete, such as sex or race, or continuous, such as age or temperature. The system that gives rise to the event of interest can be biological, as for most medical data, or physical, as for engineering data. The purpose of survival analysis is to model the underlying distribution of the failure time variable and to assess the dependence of the failure time variable on the independent variables.

The example data are from Prentice, R.L., "Exponential survivals with censoring and explanatory variables," Biometrika 60, 1973, 279-288.

The LIFETEST procedure computes nonparametric estimates of the survival distribution function. You can request either the product-limit (Kaplan-Meier) or the life-table (actuarial) estimate of the distribution. PROC LIFETEST also computes nonparametric tests that compare the survival (Kaplan-Meier) curves of two or more groups. These comparisons do not adjust for covariates; when covariates are involved, use a Cox proportional hazards model (PROC PHREG) instead.

H0: S1(t) = S2(t)
HA: S1(t) ^= S2(t)
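For reference, the product-limit (Kaplan-Meier) estimate mentioned above is usually written as follows. The symbols are standard notation rather than anything defined in the original post: t_i are the distinct observed event times, d_i is the number of events at t_i, and n_i is the number of subjects still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored observations contribute no factor of their own; they only reduce n_i at later event times.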

PROC LIFETEST PLOTS=(S) LINEPRINTER DATA=DSV;
  TIME WKS*CENS(1);
  STRATA VAC;
run;

proc phreg data=hsv;
model wks*cens(1) = trt /ties=exact;
run;
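The two procedure calls above can be tried end to end with a small data set. The sketch below is only an illustration: the data set toy and every value in it are invented for this example and are not the Prentice data. It reuses the variable names wks, cens, and trt, with cens=1 marking a censored observation, as in the TIME and MODEL statements above.

```sas
/* Hypothetical toy data: wks = time in weeks, cens = 1 if censored, trt = group */
data toy;
  input wks cens trt;
  datalines;
3 0 0
5 0 0
8 1 0
9 0 0
12 0 0
6 0 1
10 1 1
14 0 1
17 0 1
20 1 1
;
run;

ods graphics on;

/* Kaplan-Meier curves by group, with tests of equality across strata */
proc lifetest data=toy plots=survival;
  time wks*cens(1);
  strata trt;
run;

/* Cox proportional hazards model with trt as the only covariate */
proc phreg data=toy;
  model wks*cens(1) = trt / ties=exact;
run;
```

PROC LIFETEST reports the tests of equality over strata, while PROC PHREG reports a hazard ratio for trt. With so few observations the results are only useful for checking that the syntax runs.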

Variable Type Definition

UCLA WhatStat offers very good definitions of the variable types used in statistical analysis. I expand on those definitions and summarize them below:

Categorical variable (also called nominal variable): has two or more categories, but there is no intrinsic ordering to the categories.

Ordinal variable: is similar to a categorical variable. The difference between the two is that there is a clear ordering of the categories.

Interval variable: is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced.

Dummy variable (indicator variable): A categorical variable that has been dummy coded. Dummy coding (also called indicator coding) is usually used in regression models, but not ANOVA. A dummy variable can have only two values: 0 and 1. When a categorical variable has more than two values, it is recoded into multiple dummy variables, typically one fewer than the number of categories when the model includes an intercept.
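A minimal sketch of dummy coding in a DATA step follows. The data set regions, the variable region, and its three levels are hypothetical and chosen only for illustration. West is treated as the reference level, so a three-level variable becomes two dummy variables.

```sas
/* Hypothetical data: a three-level categorical variable named region */
data regions;
  input id region $;
  datalines;
1 East
2 West
3 South
4 East
;
run;

data regions_dummy;
  set regions;
  /* A comparison evaluates to 1 when true and 0 when false */
  d_east  = (region = 'East');
  d_south = (region = 'South');
  /* West is the reference level: both dummies are 0 */
run;

proc print data=regions_dummy noobs;
run;
```

Procedures that have a CLASS statement can build this coding internally, so the manual step is mainly useful for procedures that accept only numeric regressors.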

Nominal variable: