Career fields for Analytics:

Marketing Analytics

Clinical Analytics

Web Analytics

Risk Analytics

Posted by sasdataguru@gmail.com at
1:52 PM

We are pleased to announce that SAS Data Guru has started to offer Google Analytics Services to the public.

Services include:

1. Set up Google Analytics

2. Tutorial on Report Utilization

3. Search Engine Optimization

4. Channel Marketing Campaign

5. HTML Customization for Advanced Features

Please contact us at sasdataguru@gmail.com for more details.

Posted by sasdataguru@gmail.com at
6:20 PM

1. Build the Model Using R-Square or Forward Selection

2. Check Residual Plot

3. Check Multicollinearity
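The three steps above can be sketched in PROC REG. This is a minimal sketch, assuming the SASHELP.CLASS sample table that ships with SAS (Weight modeled on Height and Age); the entry level SLENTRY=0.15 and the output names DIAG, PRED and RESID are illustrative choices, not part of the original checklist.

```sas
/* Step 1a: forward selection; SLENTRY= is the p-value a variable needs to enter */
proc reg data=sashelp.class;
   model weight = height age / selection=forward slentry=0.15;
run; quit;

/* Step 1b: R-square selection lists the best subsets of each size */
proc reg data=sashelp.class;
   model weight = height age / selection=rsquare best=2;
run; quit;

/* Steps 2-3: refit the chosen model, request VIFs, save predictions and residuals */
proc reg data=sashelp.class;
   model weight = height age / vif;
   output out=diag p=pred r=resid;
run; quit;

/* Residual-vs-predicted plot: look for curvature or a funnel shape */
proc sgplot data=diag;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;
```

A common rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity.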

Posted by sasdataguru@gmail.com at
2:24 PM

Dear readers, if you are learning SAS because you want to make more money, please pause and take a look at this book:

Busting Loose From the Money Game: Mind-Blowing Strategies for Changing the Rules of a Game You Can't Win by Robert Scheinfeld

Posted by sasdataguru@gmail.com at
11:06 PM

"Longitudinal studies are defined as studies in which the outcome variable is repeatedly measured; i.e. the outcome variable is measured in the same individual on several different occasions. In longitudinal studies the observations of one individual over time are not independent of each other, and therefore it is necessary to apply special statistical techniques, which take into account the fact that the repeated observations of each individual are correlated. The definition of longitudinal studies(used in this book) implicates that statistical techniques like survival analyses are beyond the scope of this book. Those techniques basically are not longitudinal data analysis techniques because (in general) the outcome variable is an irreversible endpoint and therefore strictly speaking is only measured at one occasion. After the occurrence of an event no more observations are carried out on that particular subject."

Excerpt from Applied Longitudinal Data Analysis for Epidemiology, A Practical Guide by Jos W. R. Twisk
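To make the "special statistical techniques" concrete, here is a minimal sketch of one common option: a linear mixed model with a random intercept per subject, fitted with PROC MIXED. The data are simulated, and the dataset LONG, its variables ID, TIME and Y, and the seed 2011 are hypothetical choices. The subject effect U is shared by all visits of the same person, which is exactly what makes the repeated observations correlated.

```sas
/* Simulated longitudinal data: 50 subjects, each measured on 4 occasions */
data long;
   call streaminit(2011);
   do id = 1 to 50;
      u = rand('NORMAL', 0, 2);          /* subject effect shared by all visits */
      do time = 1 to 4;
         y = 10 + 0.5*time + u + rand('NORMAL', 0, 1);
         output;
      end;
   end;
   drop u;
run;

/* The random intercept accounts for the within-subject correlation */
ods output SolutionF=fixed;
proc mixed data=long;
   class id;
   model y = time / solution;
   random intercept / subject=id;
run;
```

Fitting the same model with PROC REG would treat the 200 rows as independent and understate the uncertainty of some estimates.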

Posted by sasdataguru@gmail.com at
11:32 PM

Interesting websites and pages related to the R language:

Sentiment Mining Video

Text Mining Infrastructure in R

tm - Text Mining Package

Posted by sasdataguru@gmail.com at
3:31 PM

When it comes to testing hypotheses, nothing is more important than the p-value. If you have a hard time understanding or memorizing how the p-value is used, here is a simple way to remember it. **The p-value is the probability, calculated assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. It is not the probability that the null hypothesis is true. When this value is small, below a pre-set threshold, we reject the null hypothesis.** This threshold is called the significance level. It is set by the investigator in advance and is normally 0.05 or 0.01.
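To see the definition in numbers, here is a minimal sketch for a two-sided z-test, assuming a hypothetical observed statistic z = 2.5. PROBNORM returns the standard normal CDF, so the p-value below is the probability, under H0, of a statistic at least as extreme as the one observed.

```sas
data pval;
   alpha = 0.05;                        /* significance level, fixed in advance */
   z     = 2.5;                         /* hypothetical observed test statistic */
   p     = 2 * (1 - probnorm(abs(z)));  /* P(|Z| >= |z|) assuming H0 is true */
   reject_h0 = (p < alpha);             /* 1 = reject H0, 0 = do not reject */
   put z= p= reject_h0=;
run;
```

Here p is about 0.012: below 0.05, so H0 is rejected at the 5% level, but above 0.01, so it would not be rejected at the 1% level.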

Posted by sasdataguru@gmail.com at
6:32 PM

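proc sql;
create table a1 as
select *
from a
where customer like '%Jane%'   /* % matches any sequence of characters */
; quit;

proc sql;
create table a2 as
select *
from a
where customer like '%Jane%'
/* datetime literals need the dt suffix; 23:59:59 covers all of January 31 */
and visit_dte between '01jan2001:00:00:00'dt and '31jan2001:23:59:59'dt
; quit;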

Posted by sasdataguru@gmail.com at
11:18 AM

proc sql;
create table a1 as
select *
from retail
where store_name like '%MAC''Y%'   /* two single quotes in a row = one literal quote */
; quit;

Posted by sasdataguru@gmail.com at
11:16 AM

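Table A: customer_id

Table B: customer_id, visit_date, purchase_amt

proc sql;
create table ab as
select a.customer_id, b.visit_date, b.purchase_amt
from a left join b on a.customer_id = b.customer_id
where b.visit_date between '01jan2011'd and '31jan2011'd
; quit;

The WHERE clause is applied after the join. Any customer in A without a January 2011 visit is removed: either the left join gave them a missing visit_date, or all of their matched rows fall outside January. The result is effectively an inner join.

Correct way (filter B first, then join):

-------------------------

data b1;
set b;
if '01jan2011'd <= visit_date <= '31jan2011'd;
run;

proc sql;
create table ab as
select a.customer_id, b.visit_date, b.purchase_amt
from a left join b1 as b on a.customer_id = b.customer_id
; quit;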
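Another common fix, sketched here under the same table layout, is to move the date condition into the ON clause so it restricts B during the join rather than after it. The tiny datasets A and B below are hypothetical sample data made up for illustration.

```sas
/* Hypothetical sample data */
data a;
   input customer_id;
   datalines;
1
2
;
run;

data b;
   input customer_id visit_date :date9. purchase_amt;
   format visit_date date9.;
   datalines;
1 15JAN2011 20
2 10FEB2011 35
;
run;

/* The date condition in ON filters B before rows of A are kept or padded */
proc sql;
   create table ab as
   select a.customer_id, b.visit_date, b.purchase_amt
   from a left join b
      on a.customer_id = b.customer_id
     and b.visit_date between '01jan2011'd and '31jan2011'd;
quit;
```

Customer 1 keeps the January visit; customer 2, whose only visit is in February, still appears with missing visit_date and purchase_amt, which is what a left join is supposed to do.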

Posted by sasdataguru@gmail.com at
11:13 AM

The SAS DATA step offers FIRST. and LAST. variables, the LAG function, and the RETAIN statement to let users process data vertically, that is, across observations. Many users get stuck in this programming paradigm and overlook that the processing can often be simplified by working horizontally, across variables within one observation. The key to this paradigm is to reshape the data with PROC TRANSPOSE and then process it in a DATA step. Here is an example. We have a list of customer visits to stores, and we want to know each customer's highest purchase amount over their last three visits.

proc sort data=customers;
by customer_id descending visit_date;
run;

data max_vertical;
set customers;
by customer_id;
retain visit_cnt max_amount;
if first.customer_id then do;
max_amount = 0;
visit_cnt = 1;
end;
/* only the three most recent visits count */
if visit_cnt <= 3 and amount > max_amount then max_amount = amount;
visit_cnt = visit_cnt + 1;
if last.customer_id;   /* keep one row per customer */
keep customer_id max_amount;
run;

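proc sort data=customers;
by customer_id descending visit_date;
run;

/* one row per customer: amt1 = most recent visit, amt2 = the one before, ... */
proc transpose data=customers out=amt_wide prefix=amt;
by customer_id;
var amount;
run;

data max_horizontal;
set amt_wide;
max_amt = max(of amt1-amt3);   /* MAX ignores missing values */
run;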

Here is another post which demonstrates the use of this programming paradigm.

Posted by sasdataguru@gmail.com at
11:20 PM

For each SAS dataset being created or modified by a SAS step, SAS creates a .lck file in the directory of the library being written to (the WORK directory for temporary datasets). The file is named like dataset.sas7bdat.lck, and SAS writes the step's output for that dataset to it. Once the step completes, the .lck file is renamed to .sas7bdat. If the step is progressing well, the .lck file should grow continuously; if it does not, something may be wrong with the step. For example, if you run a PROC SQL step to retrieve data from a database and it never finishes, look at the .lck file. If its size stays at 1 KB, something is likely wrong with that PROC SQL step, and it is worth investigating.
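To look at the .lck file you first need the physical location of the WORK library. A minimal sketch using the PATHNAME function, which returns the path behind a libref:

```sas
/* Write the WORK library's physical path to the log */
%put NOTE: WORK directory is %sysfunc(pathname(work));
```

Open that directory in your file browser or shell while the step runs and watch the size of the .sas7bdat.lck file.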

Posted by sasdataguru@gmail.com at
5:49 PM

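Colons are very useful and handy in SAS in many ways. In this post, several examples are offered to illustrate this.

1. Variable name prefix lists

array avar(*) a1-a10; => array avar(*) a:;

Here a: refers to every variable whose name begins with "a", so the two statements are equivalent only when a1-a10 are the only such variables.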
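A minimal sketch of two more colon uses: a name prefix list inside a function call, and the =: operator, which compares strings only up to the length of the shorter one (a "begins with" test). The variables a1-a3 and their values are hypothetical; SASHELP.CLASS ships with SAS.

```sas
/* Prefix list: a: means every variable whose name starts with "a" */
data colon_demo;
   array amt{3} a1-a3 (10 20 30);   /* hypothetical variables with initial values */
   total = sum(of a:);              /* same as sum(of a1-a3) here */
run;

/* =: keeps the students whose names begin with "J" */
data j_names;
   set sashelp.class;
   if name =: 'J';
run;
```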

Posted by sasdataguru@gmail.com at
9:53 AM

These three terms are very confusing to many people.

Experiment

RCT

Observational Study:

Cohort: from exposure to outcome

Case-Control: from outcome to exposure

Posted by sasdataguru@gmail.com at
12:15 AM

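proc transpose data=have out=wide prefix=fill;   /* have, id, value: hypothetical names */
by id;
var value;
run;

data a1;
set wide;
array ff{*} fill:;   /* fill: = every variable whose name starts with fill (fill1, fill2, ...) */
run;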
Posted by sasdataguru@gmail.com at
9:02 PM

There are four special words used in SAS arrays:

_ALL_

_CHARACTER_

_NUMERIC_

_TEMPORARY_
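A minimal sketch of _NUMERIC_ in an ARRAY statement, here used to replace missing numeric values with 0 (the dataset CLEANED and its values are hypothetical). _NUMERIC_ takes every numeric variable defined up to that point in the step, which is why the loop counter I, created later, is not part of the array.

```sas
data cleaned;
   input id x y z;
   array nums{*} _numeric_;      /* id, x, y, z: the numeric variables defined so far */
   do i = 1 to dim(nums);
      if missing(nums{i}) then nums{i} = 0;
   end;
   drop i;
   datalines;
1 . 5 .
2 3 . 4
;
run;
```

_CHARACTER_ works the same way for character variables, and _ALL_ requires every listed variable to be of the same type as the array.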

Posted by sasdataguru@gmail.com at
2:32 PM

Posted by sasdataguru@gmail.com at
11:55 AM

Have you ever heard of "data plumbing"? That's right. It is a terrible term for describing a SAS programmer who only knows

Posted by sasdataguru@gmail.com at
11:22 AM

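Cohort is a term frequently used in statistical analysis. In general, a cohort can be understood as a group: a set of subjects who share a common characteristic, such as year of birth or an exposure, and are followed over time.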

Posted by sasdataguru@gmail.com at
11:20 AM

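data sales;
retain mu 300 std 1000 seed 0;   /* seed 0: RANNOR seeds from the clock, so each run differs */
format sls_dt mmddyy10.;
do i=1 to 100;
sls_dt = '01jan2011'd + i - 1;             /* one observation per day from Jan 1, 2011 */
sls_amt = 50*i + mu + std*rannor(seed);    /* linear trend plus normal noise */
if sls_amt < 0 then sls_amt = 0;
output;
end;
run;

proc gplot data = sales;
plot sls_amt * sls_dt;
run; quit;

/* OUTEST= saves the coefficients: Intercept and the slope in a variable named sls_dt */
proc reg data=sales outest=est;
model sls_amt = sls_dt;
run; quit;

/* Extrapolate the fitted line to Dec 31, 2011 */
data _null_;
set est;
prediction = intercept + sls_dt * '31dec2011'd;
put prediction;
run;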
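If you prefer to let PROC REG do the arithmetic, a common trick is to append the forecast date with a missing response: that row is excluded from estimation but still receives a predicted value from the OUTPUT statement. This is a sketch under stated assumptions: it regenerates comparable data with RAND and an arbitrary fixed seed so the run is reproducible, and the dataset names SALES2, SALES_PLUS and PRED are hypothetical.

```sas
/* Reproducible variant of the simulated sales data */
data sales2;
   call streaminit(2011);
   format sls_dt mmddyy10.;
   do i = 1 to 100;
      sls_dt  = '01jan2011'd + i - 1;
      sls_amt = max(0, 50*i + 300 + rand('NORMAL', 0, 1000));
      output;
   end;
   drop i;
run;

/* Append the date to forecast with a missing response */
data sales_plus;
   set sales2 end=last;
   output;
   if last then do;
      sls_dt  = '31dec2011'd;
      sls_amt = .;            /* missing Y: excluded from the fit, still scored */
      output;
   end;
run;

proc reg data=sales_plus noprint;
   model sls_amt = sls_dt;
   output out=pred p=sls_pred;   /* predicted value for every row */
run; quit;

data _null_;
   set pred;
   where sls_dt = '31dec2011'd;
   put sls_pred=;
run;
```

The same OUTPUT statement can also return confidence limits (for example LCLM= and UCLM=), which the hand-computed prediction does not give you.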

Posted by sasdataguru@gmail.com at
5:27 PM

An Introduction to Experimental Design Using SAS

Application of Experimental Design in Consumer Direct-Mail Solicitations

Introduction to Design and Analysis of Experiments with the SAS System by Asheber Abebe

Posted by sasdataguru@gmail.com at
1:59 PM

SAS offers a function called RANNOR, which lets you easily generate a sample from a normal distribution.

```
data temp(keep=x);
retain mu 50 std 20 seed 0;
do i=1 to 1000;
x = mu + std*rannor(seed);
output;
end;
run;

proc chart data=temp;
vbar x;
run;
```
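In current SAS releases the RAND function is generally preferred over RANNOR. A sketch of the same sample, assuming an arbitrary seed of 123 set with CALL STREAMINIT so the sample can be reproduced, followed by a PROC MEANS check that the sample mean and standard deviation land near 50 and 20 (TEMP2 and STATS are hypothetical names):

```sas
/* RAND('NORMAL', mean, sd) draws from a normal distribution */
data temp2(keep=x);
   call streaminit(123);
   do i = 1 to 1000;
      x = rand('NORMAL', 50, 20);
      output;
   end;
run;

proc means data=temp2 n mean std;
   var x;
   output out=stats mean=xbar std=xsd;
run;
```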

Posted by sasdataguru@gmail.com at
1:47 PM

Survival data consist of a response (event time, failure time, or survival time) variable that measures the duration of time until a specified event occurs and possibly a set of independent variables thought to be associated with the failure time variable. These independent variables (concomitant variables, covariates, or prognostic factors) can be either discrete, such as sex or race, or continuous, such as age or temperature. The system that gives rise to the event of interest can be biological, as for most medical data, or physical, as for engineering data. The purpose of survival analysis is to model the underlying distribution of the failure time variable and to assess the dependence of the failure time variable on the independent variables.

The following data is from Prentice, R.L. "Exponential survivals with censoring and explanatory variables.", Biometrika 60, 1973, 279-288.

The LIFETEST procedure computes nonparametric estimates of the survival distribution function. You can request either the product-limit (Kaplan-Meier) or the life-table (actuarial) estimate of the distribution. **PROC LIFETEST also computes nonparametric tests, such as the log-rank test, to compare the survival (Kaplan-Meier) curves of two or more groups. These tests do not adjust for covariates; if covariates are involved, use a Cox proportional hazards model (PROC PHREG).**

H0: S1(t) = S2(t)

HA: S1(t) ^= S2(t)

PROC LIFETEST PLOTS=(S) LINEPRINTER DATA=HSV;
TIME WKS*CENS(1);   /* CENS=1 marks a censored observation */
STRATA VAC;         /* compare the survival curves across levels of VAC */
run;

proc phreg data=hsv;
model wks*cens(1) = trt / ties=exact;
run;
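The data set used above is not included in this post, so here is a runnable sketch of the same two steps on SASHELP.BMT, a bone marrow transplant sample table that ships with recent SAS releases (assumed to be available in your installation). T is the event time, STATUS = 0 marks a censored observation, and GROUP holds three disease groups; COX_EST is a hypothetical output name.

```sas
ods graphics on;

/* Kaplan-Meier curves by group plus rank tests of equality across groups */
proc lifetest data=sashelp.bmt plots=survival;
   time t*status(0);
   strata group;
run;

/* Cox model: compares groups and allows covariates to be added to MODEL */
proc phreg data=sashelp.bmt outest=cox_est;
   class group;
   model t*status(0) = group / ties=exact;
run;
```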

Posted by sasdataguru@gmail.com at
10:35 PM

UCLA WhatStat offers very good definitions of the variable types used in statistical analysis. I expand on them and summarize below:

Categorical variable (also called nominal variable): has two or more categories, but there is no intrinsic ordering to the categories.

Ordinal variable: similar to a categorical variable. The difference between the two is that there is a clear ordering of the categories.

Interval variable: similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced.

Dummy variable (indicator variable): a categorical variable that has been dummy coded. Dummy coding (also called indicator coding) is usually applied explicitly in regression models; ANOVA procedures such as PROC GLM handle the categories through the CLASS statement instead. A dummy variable can have only two values: 0 and 1. When a categorical variable has k > 2 categories, it is recoded into k - 1 dummy variables, with the remaining category serving as the reference.
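A minimal sketch of dummy coding in a DATA step, using the SASHELP.CARS sample table, where ORIGIN has three categories (Asia, Europe, USA). With k = 3 categories we create k - 1 = 2 dummies and treat Asia as the reference category; the names CARS_DUMMY, D_EUROPE and D_USA are hypothetical.

```sas
data cars_dummy;
   set sashelp.cars(keep=make origin);
   d_europe = (origin = 'Europe');   /* a comparison returns 1 (true) or 0 (false) */
   d_usa    = (origin = 'USA');
   /* Asia, the reference level, has d_europe = 0 and d_usa = 0 */
run;
```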

Nominal variable:

Posted by sasdataguru@gmail.com at
9:27 PM

What statistical analysis should I use? (summary)

Probability and Statistics

What is a p-value?

SAS Cheat Sheet

Choice of Statistical Methods

Choosing the Correct Statistical Test

Intuitive Biostatistics: Choosing a statistical test

If you would like to share your own cheat sheet with readers, or you know of any good ones, please send them to me.

Posted by sasdataguru@gmail.com at
12:39 AM

What is a p-value anyway?

Medical Statistics from Scratch

Medical Statistics Made Easy

Medical and Health Science Statistics Made Easy

Your Statistical Consultant: Answers to Your Data Analysis Questions

Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking

Common Errors in Statistics

Statistical Rules of Thumb

Posted by sasdataguru@gmail.com at
11:44 PM

When it comes to statistical analysis, SAS is, without doubt, the tool of choice. I really can't think of any other tool that offers the same level of flexibility for analyzing data. Below is an excerpt from a book that echoes the same opinion:

While consulting for dozens of companies over 25 years of statistical application to clinical investigation, I have never seen a successful clinical program that did not use SAS.

--Preface, Common Statistical Methods for Clinical Research with SAS Examples, Glenn A. Walker

Posted by sasdataguru@gmail.com at
7:03 PM
