Monday, April 12, 2010

Finding The Closest String

When you mistype a keyword in Google, it returns results along with the closest match to what you typed. You may wonder whether you can do the same thing in SAS. Fortunately, SAS offers a handy function called SPEDIS that lets you perform a closest-string search. It returns a value measuring the SPElling DIStance between two strings: the lower the value, the closer the two strings are, and 0 means they are identical. Here is an example. Suppose you search for LIPITER in your drug database and cannot find it. You realize you may have misspelled the brand name and wonder what the correct one is. The SAS code below returns the closest brand name:

/* Build a small lookup table of brand names */
data drug_db;
  input brand_name $;
  cards;
LIPEX
LIPSORB
LIPSTART
LIPITOR
LIPKOTE
LIPMAGIK
LIPMAX
;
run;

/* Sort by spelling distance to the search term; OUTOBS=1 keeps only the closest row */
proc sql outobs = 1;
  select brand_name
  from drug_db
  order by spedis(brand_name, 'LIPITER');
quit;
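
If you want to see how every candidate scores rather than only the winner, a query along these lines may help. This is a minimal sketch, assuming the drug_db table created above; the column alias distance is just a name picked for illustration. Keep in mind that SPEDIS is asymmetric, so swapping the two arguments can change the scores.

```sas
/* List every brand name with its spelling distance to the search term. */
/* "distance" is an illustrative alias, not part of the original query. */
proc sql;
  select brand_name,
         spedis(brand_name, 'LIPITER') as distance
  from drug_db
  order by distance;
quit;
```

The row at the top of this listing should be the same brand name that the OUTOBS=1 query returns, and the remaining rows show how far behind the other candidates are.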