Tuesday, August 16, 2011

PROC SQL Like Statement Case Problem


proc sql;
  create table a1 as
  select *
 from a
where customer like '%Jane%'
; quit;

proc sql;
  create table a2 as
select *
from a
where customer like '%Jane%'
and visit_dte between '01jan2001:00:00:00'dt and '31jan2001:00:00:00'
; quit;

A String with Quote - Like Statement




proc sql;
  create table a1 as
  select *
  from retail
  where store_name like '%MAC''Y%'
; quit;

SQL Query Trick

Table A: customer_id
Table B: customer_id, visit_date, purchase_amt

proc sql;
  create table ab as
  select a.customer_id, b.visit_date, b.purchase_amt
  from a left join b on a.customer_id = b.customer_id
  where b.visit_date between '01jan2011' and '31jan2011'
; quit;

Correct way:
-------------------------
data b1;
  set b;
  if visit_date >= '01jan2011'd and '31jan2011'd;
run;


proc sql;
  create table ab as
  select a.customer_id, b.visit_date, b.purchase_amt
  from a left join b on a.customer_id = b.customer_id
; quit;




Saturday, August 13, 2011

The Magic of Proc Transpose

SAS Data Step offers last, first, lag, retain function/statement to allow users to process the data vertically. Many users tend to get stuck in this programming paradigm when using SAS to process the data. They normally ignore most time they can simplify the processing using horizontally. The key to use this paradigm is to use PROC TRANSPOSE to transform the data and then use a Data Step to process it. Here is an example.We have a list of customers visiting the stores and we want to know the highest purchase amount in last three visits.

proc sort data=customers;
by customer_id descending visit_date;
run;

data customers;
  set customers;
by customer;
retain visit_cnt max_amount;
if first.customer then do;
  max_amount = 0;
  visit_cnt = 1;
end;
if visit_cnt <= 3 & amount > max_amount then max_amt = amount;
visit_cnt = visti_cnt + 1;
run;

proc sort data=customers;
by customer_id descending visit_date;
run;

proc transpose data=customer prefix=amt;
by customer;
var amount;
run;

data customers;
 set customers;
 max_amt = max(of amt1 - amt3);
run;

Here is another post which demonstrates the use of this programming paradigm.

How to Know a SAS Step Is Not Progressing

For each SAS dataset being created or modified by a SAS step, a .lck file will be created in work directory. This .lck file has the file name like .sas7bdat.lck. SAS will write the output for this dataset to this .lck file. Once the step is completed, the .lck file will be renamed as .sas7bdat. If the step progresses well, this .lck file should increase its file size continuously. If not, it indicated something wrong with this step. For example, if you run a PROC SQL to retrieve data from a database and it never finishes, then you can look at the .lck file. If its size stays at 1KB, then definitely there is something wrong with this PROC SQL step and worth investigating.

Thursday, August 11, 2011

The Magic of Colon (:) in SAS


Colon in SAS is very useful and handy in many ways. In this posts, several examples are offered to illustrate this.

1.

array avar(*) a1-a10 => array avar(*) a:

Wednesday, August 10, 2011

Cohort Study, Case-Control Study, RCT

These three terms are very confusing to many people.

Experiment
RCT


Observational Study:
Cohort: from exposure to outcome
Case-Control: from outcome to exposure

Tuesday, August 9, 2011

Use Geodist Function to Compute the Distance between Two Latitude and Longitude Coordinates

Medication Persistence Ratio Example

proc transpose;
run;

data a1;
array ff{*} fill:;
run;

Four Special Words Used in SAS Arrays

There are four special words used in SAS arrays: 

_ALL_
_CHARACTER_
_NUMERIC_
_TEMPORARY_


Definition of Baseline

1. Information gathered at the beginning of a study from which variations found in the study are measured.
2. A known value or quantity with which an unknown is compared when measured or assessed.
3. The initial time point in a clinical trial, just before a participant starts to receive the experimental treatment which is being tested. At this reference point, measurable values such as CD4 count are recorded. Safety and efficacy of a drug are often determined by monitoring changes from the baseline values.

Actuarial Pricing Example

Propensity Score Matching

SAS Programmer Career

Have you ever heard of data plumming? That is right. That is the terrible term for describing a SAS programmer only know

What does cohort mean?

Cohort is the term frequently used/seen in statistical analysis. Cohort in general can be translated as Group.