Where not sas?
Dare to experiment - error messages are free!
Editor's note: consolidating some other helpful comments in this solution...
@bhthomas99 asks about the NOTIN operator (no space). @abbess replied:
And @Doc_Duke points out that order-of-operations matters:
You can use a WHERE statement in SAS to subset the data if you are reading data from a SAS dataset.
Example:
You may use a WHERE or a subsetting IF statement in this example. There are advantages to using a WHERE statement in SAS instead of a subsetting IF statement.
You have a larger alternative of operators that can be used with a WHERE statement, and if the input data set is indexed, the WHERE statement in SAS is likely more efficient.
You may also use a WHERE statement in a SAS procedure to subset the data being processed.
Below is the list of operators that you can use with the WHERE statement in SAS.
The IS MISSING and IS NULL operators are used with WHERE, ON, and HAVING expressions and can handle character or numeric variables.
It results in true if the expression results are missing and false if it is not missing.
They also work with the NOT operator.
The BETWEEN AND expression matches all the values greater than or equal to the first value and less than or equal to the second value. This works with characters in addition to numeric variables.
The NOT operator can be used with the WHERE statement in many other ways:
Let’s say you want to have like below and the where statements below are self-explanatory,
Multiple WHERE statements in SAS cannot be used in a DATA step like the IF statements. When SAS encounters a second WHERE statement in a DATA step, the first is replaced by the second.
The WHERE AND also known as WHERE ALSO will let you use multiple where statements in SAS to add more restrictions.
The IN operator is a comparison operator that searches for character and numeric values equal to one of the values from a given list of values.
The list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank.
For example, suppose you want to filter for SUVs, sedans or Wagon cars. You could specify the values as:
Note the use of strip and Upcase function in the type variable. This ensures that any leading or trailing spaces are removed, and the values will be converted to Uppercase before comparison.
Also, read the article on The Ultimate Guide To SAS Character Functions
In addition to this, you can use the logical operator NOT to exclude values that are on the list.
You can also use a shorthand notation to specify a range of numbers to search. The range is specified by using the syntax M: N as a value in the list to search, where M is the lower bound and N is the upper bound.
You can replace the below statement in the form of WHERE IN shorthand notation.
You can also read our article on How To Specify List Of Variables In SAS and How To Use The IN Operator In SAS Macro?
The CONTAINS expression matches any character value containing the given string.
The “contains” operator (?) and the “not contains” operator (^?) match a substring that appears anywhere in the target character variable.
LIKE operator is frequently used for pattern matching, that is, evaluating whether a variable value equals, begins with a specified character, or sounds like a specified value or pattern.
The LIKE expression uses two wildcard operators. When using the LIKE operator, the underscore(_) wildcard takes the place of a single character, whereas the % sign might be substituted for a string of any length (including a null string).
Double quotes (“) are used here to include the apostrophe in “Women’s Dress”. Single quotes can be used when the text string does not contain an apostrophe or other single quotes.
The percentage sign (%) wildcard searches for character values that begin with, end with or even contain certain character strings. Below are some of the examples using wildcards.
The above expression returns all names that begin with “a” and are followed by any characters of any lengths
_ indicates that any character can contain at the beginning of the string followed by “a” as the second character and ending with any character of any length.
Two underscores at the beginning indicate that the search should filter for any two characters at the beginning followed by “n”.
The above expression would return character values with two characters at the beginning followed by n and any one character at the end.
The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for identifying words that sound alike. The SOUNDEX algorithm is English-biased, so it’s not useful for languages other than English.
You can download the dataset employees.sas7bdat used in the program.
Also, read 4 Functions for fuzzy matching in SAS.
Using the WHERE statement in SAS and a WHERE= dataset option in a DATA step is not a good practice. SAS ignores the WHERE statement when both are used in the same dataset.
Additionally, a WARNING appears in the SAS log indicating that the WHERE statement cannot be applied.