How to Detect Special Characters and Remove them in SAS Language

1. Where Do Special Characters Come From? Here Are a Few Examples.

You may like the carriage return in an Excel® cell to improve the readability of your text: it is the Alt+Enter key combination (in Windows). You may also like tabs, bullets and other formatting features. Each time, what you’re actually using are special characters, most of them non-printable.

Microsoft Office® and iOS® can also create special characters for you. Type in a single quote and by default it will be converted into another kind of apostrophe: a special-character one. To go back to the plain character, you have to use Ctrl+Z (on Windows®) or Cmd+Z (on iOS).

Therefore, when importing files, we can fairly easily import special characters.

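Single-byte character sets built on ASCII (American Standard Code for Information Interchange) contain 256 characters: the 128 standard ASCII characters (codes 0 to 127) plus 128 extended characters that depend on the encoding. Among them, there are the letters of the alphabet, the digits, the letters specific to some languages like é in French (in the extended range), and these famous non-printable special characters which are not displayed with a proc print.

Each standard ASCII character fits in 7 bits; in a single-byte encoding, each character is stored in one byte (8 bits).

To cover the characters specific to every language, the Unicode™ standard was developed.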

2. What are those 34 non-printable characters?

ASCII characters are numbered:

  • from 0 to 255 (decimal values)
  • from 00 to FF (hexadecimal values)

The first 33 characters (0 to 32) and character 127 are the non-printable characters.

DEC HEX    CHAR Description
  0  00     NUL Null
  1  01     SOH Start of Header
  2  02     STX Start of Text
  3  03     ETX End of Text
  4  04     EOT End of Transmission
  5  05     ENQ Enquiry
  6  06     ACK Acknowledge
  7  07     BEL Bell
  8  08      BS BackSpace
  9  09      HT Horizontal Tabulation
 10  0A      LF Line Feed
 11  0B      VT Vertical Tabulation
 12  0C      FF Form Feed
 13  0D      CR Carriage Return
 14  0E      SO Shift Out
 15  0F      SI Shift In
 16  10     DLE Data Link Escape
 17  11     DC1 Device Control 1 (XON)
 18  12     DC2 Device Control 2
 19  13     DC3 Device Control 3 (XOFF)
 20  14     DC4 Device Control 4
 21  15     NAK Negative acknowledge
 22  16     SYN Synchronous Idle
 23  17     ETB End of Transmission Block
 24  18     CAN Cancel
 25  19      EM End of Medium
 26  1A     SUB Substitute
 27  1B     ESC Escape
 28  1C      FS File Separator
 29  1D      GS Group Separator
 30  1E      RS Record Separator
 31  1F      US Unit Separator
 32  20 [Space] Space
 33  21         !
 34  22         "
 35  23         #
 36  24         $
 37  25         %
 38  26         &
 39  27         '
 40  28         (
 41  29         )
 42  2A         *
 43  2B         +
 44  2C         ,
 45  2D         -
 46  2E         .
 47  2F         /
 48  30         0
 49  31         1
 50  32         2
 51  33         3
 52  34         4
 53  35         5
 54  36         6
 55  37         7
 56  38         8
 57  39         9
 58  3A         :
 59  3B         ;
 60  3C         <
 61  3D         =
 62  3E         >
 63  3F         ?
 64  40         @
 65  41         A
 66  42         B
 67  43         C
 68  44         D
 69  45         E
 70  46         F
 71  47         G
 72  48         H
 73  49         I
 74  4A         J
 75  4B         K
 76  4C         L
 77  4D         M
 78  4E         N
 79  4F         O
 80  50         P
 81  51         Q
 82  52         R
 83  53         S
 84  54         T
 85  55         U
 86  56         V
 87  57         W
 88  58         X
 89  59         Y
 90  5A         Z
 91  5B         [
 92  5C         \
 93  5D         ]
 94  5E         ^
 95  5F         _
 96  60         `
 97  61         a
 98  62         b
 99  63         c
100  64         d
101  65         e
102  66         f
103  67         g
104  68         h
105  69         i
106  6A         j
107  6B         k
108  6C         l
109  6D         m
110  6E         n
111  6F         o
112  70         p
113  71         q
114  72         r
115  73         s
116  74         t
117  75         u
118  76         v
119  77         w
120  78         x
121  79         y
122  7A         z
123  7B         {
124  7C         |
125  7D         }
126  7E         ~
127  7F     DEL Delete

Source: ascii-table.com
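
The codes in this table can also be listed from within SAS. The sketch below is only an illustration: it assumes an ASCII-based session encoding and uses the byte() function, which returns the character for a given decimal code. The variable names dec and ch are made up for the example.

```sas
data _null_;
   /* Loop over the non-printable codes: 0 to 32, then 127 (DEL) */
   do dec = 0 to 32, 127;
      ch = byte(dec);       /* character that has this decimal code */
      put dec= ch= $hex2.;  /* write the decimal code and its hexadecimal value to the log */
   end;
run;
```

Each line of the log should pair a decimal value with the two-digit hexadecimal value shown in the table.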

3. How to create a value using hexadecimal in SAS?

Let’s see how to create a character from its hexadecimal value.

The hexadecimal value is put into quotes and followed by the x suffix.

data example;
   length f1 $20;
   f1 = cat('Part1','0A'x,'Part2');
run;
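
As a cross-check, the same character can be built from its decimal code. This is a minimal sketch, assuming an ASCII-based session encoding; byte() and rank() are standard SAS functions, while the variable names are hypothetical.

```sas
data _null_;
   lf_hex  = '0A'x;      /* line feed written as a hexadecimal constant */
   lf_byte = byte(10);   /* same character built from its decimal code */
   dec     = rank(lf_hex); /* decimal code of the hexadecimal constant */
   if lf_hex = lf_byte then put 'Both expressions give the same character.';
   put dec=;
run;
```

The hexadecimal notation is usually the more convenient one because it matches what the hex formats display.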

4. How to Identify Special Characters using hex Format

The program below displays each character of the f1 variable in hexadecimal form in the log. It’s a quick solution to identify a special character hidden in a string.

Each pair of hexadecimal digits represents one character. Here 0A is the hexadecimal value for a line feed. Trailing blanks show up as 20.

data _null_;
   set example;
   /* $hex40. writes two hexadecimal digits for each of the 20 characters of f1 */
   put f1 $hex40.;
run;

You can also create a new variable containing the string in hexadecimal form.

The new variable will have twice as many characters. Therefore, its length has to be twice the length of the original variable.

data example;
   length f1 $20 f4 $40;
   f1 = cat('Part1','0A'x,'Part2');
   f4 = put(f1,$hex40.);
run;
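
Reading hexadecimal strings by eye becomes tedious on long values. As a possible complement, the sketch below uses the ANYCNTRL function, which is not part of the programs above: it returns the position of the first control character in a string, or 0 when there is none. The variable names pos and code are illustrative.

```sas
data _null_;
   length f1 $20;
   f1 = cat('Part1','0A'x,'Part2');
   /* Position of the first control character, 0 if none is found */
   pos = anycntrl(f1);
   if pos > 0 then do;
      /* Hexadecimal value of the character found at that position */
      code = put(substr(f1,pos,1),$hex2.);
      put 'Control character found at position ' pos 'with hexadecimal code ' code;
   end;
run;
```

With the value used here, the line feed sits in sixth position, right after Part1.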

5. Which function(s) can be used to remove special characters?

The translate() and tranwrd() functions can convert special characters into blanks. The combination of the transtrn() and trimn() functions can be used to remove them without leaving an unnecessary blank behind.

The translate() function takes 3 parameters: the original string, the new characters and the characters to be replaced.

Here is an example: the first assignment creates a variable called f1. This variable has a special non-printable character in the middle of the string. The next two statements create the f2_translate and f3_tranwrd variables, and the last one creates f4_transtrn.

data example;
   length f1 f2_translate f3_tranwrd f4_transtrn $20;
   f1           = cat('Part1','0A'x,'Part2');
   f2_translate = translate(f1,' ','0A'x);
   f3_tranwrd   = tranwrd(f1,'0A'x,' ');
   f4_transtrn  = transtrn(f1,'0A'x,trimn(''));
run;

The tranwrd() function also has 3 parameters: the original string, the group of letters to be replaced and the new text. Be careful: the last two parameters come in the reverse order of the translate() function.

The transtrn() function works the same way as the tranwrd() function but allows a zero-length string as the third parameter.
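
To make the difference between replacing and removing visible, one option is to compare the lengths of the results. This is a small sketch with hypothetical variable names, relying on the fact that length() ignores trailing blanks.

```sas
data _null_;
   length f1 $20;
   f1 = cat('Part1','0A'x,'Part2');
   /* translate(): the line feed becomes a blank, so the text keeps 11 characters */
   len_blank   = length(translate(f1,' ','0A'x));
   /* transtrn() with trimn(''): the line feed is removed, leaving 10 characters */
   len_removed = length(transtrn(f1,'0A'x,trimn('')));
   put len_blank= len_removed=;
run;
```

The log should report 11 for the first variable and 10 for the second.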

Replace more than one character at a time: you can list as many hexadecimal characters as you want in the third parameter of the translate() function.

In this program, the first 11 ASCII characters, with hexadecimal values 00, 01, 02, … 09 and 0A, if any, are replaced by a blank.

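data example;
   /* Declare the length before the set statement to avoid a warning if the variable already exists */
   length f2_translate $20;
   set example;
   f2_translate=translate(f1,' ','000102030405060708090A'x);
run;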
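
When the goal is to remove every control character rather than replace a chosen few, the compress() function may be a shorter route. It is not one of the functions discussed above, so take this as an alternative sketch: the 'c' modifier adds all control characters to the list of characters to remove. The variable cleaned is a made-up name.

```sas
data _null_;
   length f1 cleaned $20;
   /* Test value containing a line feed and a horizontal tab */
   f1 = cat('Part1','0A'x,'09'x,'Part2');
   /* Second argument left empty: the 'c' modifier supplies the characters to remove */
   cleaned = compress(f1, , 'c');
   put cleaned=;
run;
```

Unlike translate(), this leaves no blank in place of the removed characters, so it suits cases where the special characters carry no separating role.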
