Going from Character to Numeric and Reversly

Going from Character to Numeric and Reversly

By : -

Going from Character to Numeric and Reversly

atasets contains both character and numeric variables. We can have a character variable which contains only numbers. We may want to get a numeric variable out of it. Reversely, we can have a numeric variable and decide to store its content in a character variable. It is a very common conversion. For that we use the input and put functions.

1. From character to numeric (and the other way round)

The input function needs two things: the text (also called a character string) and the rule to read/interpret the values correctly. Therefore, the function has two parameters. The first parameter contains either the name of the character variable or directly the character string into quotes. The second parameter contains the name of the informat.

In the example below, we say that the original variable (x_char) should be read like text variable having up to 11 characters.

data char_to_num;
   x_char = '12345.12345';
   x_num  = input(x_char,$11.);
  *x_num  = input('12345.123456';,$11.)
  *x_num  = 12345.12345;
run;

2. From numeric to character

Now, we are gonna work with a numeric variable which has 11 digits/delimiter/minus signe including 5 for the decimal part (on the right of the delimiter).

We create a character variable with lenght 11 using the put function.

This put function has two parameters: the name of the numeric variable (or a numeric value) and the format.

In the example, the format is 11.5. It means that the value in the created character variable can have up to 11 characters including 5 for the decimal part.

If the numeric value contains more than 11 components, the system searches for the worth scenario. It will first reduce the number of digits. If it is not enough then it reduces the number digits for the integer part.

An alternative to 11.5 is in this case best11. format. If no number is specified in the best. format, it means best8. i.e. the system searches for the best possible answer within 8 digits/delimiter.

data num_to_char;
   y_num = 123.123;
   output;
   y_num = 12345.12345;
   output;
   y_num = .;
   output;
run;

data num_to_char;
   set num_to_char;
   length y_char1 y_char2 $11;
   y_char1 = put(y_num,11.6);
  *y_char1 = put(y_num,best11.);
  *y_char1 = put(12345.12345,11.6);
  *y_char1 = '12345.12345';
   
   y_char2 = put(y_num,best11.);

   if not missing(y_num) then y_char3=strip(put(y_num,11.6));
run;

Have a look at the slight difference between 11.6 and best11.. In both cases, blanks are added at the front of the string. So, think about using the strip function to remove them.

Furthermore, the dot of missing numeric variables is converted into a dot in the character and not a blank. To avoid the issue, you can apply the put function using conditions (if for example).

Don’t forget to define the length of any new character variable to avoid having unexpected truncated values.

How do you remember that the input function uses an informat? Well both input and informat words start with in. The put function does not start with in. Therefore it doesn’t use informat but format.

Bonus : how to add zeros at both ends of the text generated with the put function? You can use the z. format to replace spaces with zeros.

In the example below, the new variable will be 8 characters long. The last two digits will be for the decimal part. As there is only a single decimal places, a zero will be added at the end. It remains two blanks at the front which will be replaced by zeros.

data num_char_zero;
    z_num = 123.1;
    length z_char $8;
    z_char = put(z_num,z8.2);
   *z_char = '00123.10';
run;

3. Going from a character variable to another character variable using the put function.

The original character value can also be a string and not a number. In this case, the new variable takes the value of a character format frequently defined by the programmer.

proc format;
   value $cntry
   FR = 'France'
   LU = 'Luxembourg'
   CH = 'Suisse';
run;

data char_to_char;
   a1_char='LU';
   a2_char=put(a1_char,$cntry.);
   *a2_char='Luxembourg';
 run;

4. The Question

How do you deal with variables where all the values don’t match the same informat?

data example;
   length x $10;
   x='21JAN2015'; output;
   x='JAN2015';   output;
   x='January';   output;
   x='20150121';  output;
run;

In this example, data of the x variable are displayed differently. Some of them are full dates (day, month and year) and some are not. If I assume that the only valid date is of the form DDMMMYYYY then I can use the following code to create the SAS date:

data example_error;*(where=(input(x,date9.));
   set example;
   x_num=input(x,date9.);
run;

However, a NOTE is displayed in the log. The NOTE contains _ERROR_=1. If the input function in used in the where option of the SAS dataset then an ERROR occurs.

NOTE: Invalid argument to function INPUT at line ... column ... 
... _ERROR_=1 _N_=...
ERROR: INPUT function reported 'ERROR: Invalid ... value' while processing WHERE clause.

To avoid this, you simply need to add two question mark at the front of the informat name.

data example_noerror;*(where=(input(x,??date9.));
   set example;
   x_num=input(x,??date9.);
run;

The double question mark at the format of the informat name stops having the NOTE in the log. Of course, this is only of interest if you have good reasons to ignore this message. Otherwise, don’t use than and check your data.

Now, to expend the septum of valid values, you can use other kind of informats like anydtdte or create your own one.

Avec the anydtdte informat, it isn’t just 21JAN2015 which is converted into SAS date but also JAN2015 and 20150121. With respect to JAN2015, the stored SAS date refers to the 1st of January 2015. With anydtdte, no _ERROR_ appears int he log.

data exemple_sanserreur;
   set exemple;
   x_num=input(x,anydate.);
   format x_num date9.;
run;

5. Expert Corner: putn and putc functions

Let’s look into a scenarios used rarely but which still remains very powerful.

Here are the data:

data one;
   length test $3 fmt $6 result $10;
   test='AAA'; measurement=0;  fmt='NY';       result='No';     output;
   test='AAA'; measurement =1;  fmt='NY';       result='Yes';    output;
   test='BBB'; measurement =-1; fmt='LESSMORE'; result='Less';   output;
   test='BBB'; measurement =0;  fmt='LESSMORE'; result='Normal'; output;
   test='BBB'; measurement =1;  fmt='LESSMORE'; result='More';   output;
run;

The objective is to get the same thing has what is currently available in the result variable.

For that, we need two numeric formats: NY and LESSOR referenced in the column fmt fo the dataset one.

proc format;
   value ny        0='No'
                   1='Yes';
   value lessor   -1='Less'
                   0='Normal'
                   1='More';
run;

Now, we use the putn fonction to create the result2 variable. Instead of writing the format name in the function, we are gonna put the name of the variable which contains those format names.

data one;
   set one;
   length result22 $10;
   result2=putn(measurement,fmt);
run;

The same principle applied to the putc function. The only difference is that the formats are character formats ($).

data two;
   length test $3 measurement $1 fmt $3 result $3;
   test='CC'; measurement='N'; fmt='$NY'; result='No'; output;
run;

proc format;
   value $ ny 'N'='No'
              'Y'='Yes';
run;

data two;
   set two;
   length result2 $10;
   result2=putc(measurement,fmt);
run;

6. Comment

In a dataset, we cannot have two variables with the same name. It is possible to correct an exiting variable (e.g. divide by ten a numeric variable, keep a substring of a character variable), but it is not possible to change the variable type or the length of the variable once those attributes are defined.

To get around this problem, you can rename the original variable in order to use the name for the new variable and delete the original variable at the end.

In a sql procedure, the same name can be used.

data one;
   x=1;
run;

data one (drop=x);
   set one (rename=(x=_x));
   length x $3;
   x='ABC';
run;

Leave a Reply

Your email address will not be published.

12 − 2 =