The Two Faces of the substr Function in SAS

When the substr function is put on the right-hand side of the equation, it extracts a given number of characters starting from a given position.

When the substr function is put on the left-hand side of the equation, it replaces part of a string with another string. With the substr function, the starting position of the part to replace must be known, whereas this is not needed with the translate function.

1. Syntax

The function takes 3 parameters:

  • the original string, given either in quotes or as a variable
  • the position of the first character of the string subset
  • the total number of characters in the string subset (optional).

If the total number of characters in the string subset is not given in the third parameter, then all the characters from the start position onwards are returned.
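As a minimal sketch of the three parameters described above, the step below calls substr once on a variable with an explicit length and once on a quoted string without the third parameter. The variable names code, part and rest are made up for this illustration.

```sas
data _null_;
   length code part rest $6;
   code = 'tc0101';
   /* variable as first parameter: 2 characters starting at position 3 */
   part = substr(code, 3, 2);
   /* quoted string as first parameter, third parameter omitted:
      everything from position 3 to the end of the string */
   rest = substr('tc0101', 3);
   /* write both values to the log */
   put part= rest=;
run;
```

The log should show part=01 and rest=0101.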

2. Example with the substr function on the right-hand side of the equation

In the code below we have a variable called testcase which has three values: tc0101 and tc0102 belong to the same group, whereas tc0201 belongs to a different group. We create a variable called grp, whose value is 01 for the first two observations and 02 for the third one. In other words, the values of the grp variable are the third and fourth characters of the testcase variable.

data example;
   length testcase $6;
   testcase='tc0101';
   output;
   testcase='tc0102';
   output;
   testcase='tc0201';
   output;
run;
data example;
   set example;
   length grp $2;
   grp=substr(testcase,3,2);
run;

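If the number of characters to retain is not given in the third parameter, the whole string from the starting position of the subset is kept: 0101, 0102 and 0201. Note that grp must then be long enough to hold four characters (for example, length grp $4); with length grp $2 the value is truncated to two characters.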

grp=substr(testcase,3);
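For a complete step around this statement, the sketch below stores the result in a variable wide enough for four characters. The data set name example_rest and the variable name grp_rest are hypothetical, chosen only for this illustration.

```sas
data example_rest;
   set example;
   /* wide enough for the four characters after 'tc' */
   length grp_rest $4;
   /* third parameter omitted: keep everything from position 3 onwards */
   grp_rest = substr(testcase, 3);
run;
```

Since SAS truncates character values to the length of the receiving variable, a $2 variable would keep only the first two characters here.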

3. Example with the substr function on the left-hand side of the equation

Let’s use the same example once more.

data example;
   length testcase $6;
   testcase='tc0101';
   output;
   testcase='tc0102';
   output;
   testcase='tc0201';
   output;
run;

The string 'tc' starts at the first position of testcase and is two characters long.

We are now replacing 'tc' with the value 'AB'.

data example;
   set example;
   substr(testcase,1,2)='AB';
run;

Please note that the new string has the same length as the part of the original string it replaces. If the new string were longer, the additional characters would be ignored. If the new string were shorter, blanks would replace the missing characters.
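The two length mismatches described above can be tried out with a small sketch such as the one below. The variables a and b are made up for this illustration, and the expected values simply follow the rule stated above.

```sas
data _null_;
   length a b $6;
   a = 'tc0101';
   b = 'tc0101';
   /* new string longer than the 2 characters replaced:
      only 'AB' should be used, 'CD' is ignored */
   substr(a, 1, 2) = 'ABCD';
   /* new string shorter than the 2 characters replaced:
      the second character should become a blank */
   substr(b, 1, 2) = 'X';
   /* write both values to the log */
   put a= b=;
run;
```

Under that rule, a should become AB0101 and b should become X followed by a blank and 0101.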
