Going from Character to Numeric and Reversly
By : Admin_programmer -
atasets contains both character and numeric variables. We can have a character variable which contains only numbers. We may want to get a numeric variable out of it. Reversely, we can have a numeric variable and decide to store its content in a character variable. It is a very common conversion. For that we use the
1. From character to numeric (and the other way round)
input function needs two things: the text (also called a character string) and the rule to read/interpret the values correctly. Therefore, the function has two parameters. The first parameter contains either the name of the character variable or directly the character string into quotes. The second parameter contains the name of the informat.
In the example below, we say that the original variable (
x_char) should be read like text variable having up to 11 characters.
data char_to_num; x_char = '12345.12345'; x_num = input(x_char,$11.); *x_num = input('12345.123456';,$11.) *x_num = 12345.12345; run;
2. From numeric to character
Now, we are gonna work with a numeric variable which has 11 digits/delimiter/minus signe including 5 for the decimal part (on the right of the delimiter).
We create a character variable with lenght 11 using the
put function has two parameters: the name of the numeric variable (or a numeric value) and the format.
In the example, the format is
11.5. It means that the value in the created character variable can have up to 11 characters including 5 for the decimal part.
If the numeric value contains more than 11 components, the system searches for the worth scenario. It will first reduce the number of digits. If it is not enough then it reduces the number digits for the integer part.
An alternative to
11.5 is in this case
best11. format. If no number is specified in the
best. format, it means
best8. i.e. the system searches for the best possible answer within 8 digits/delimiter.
data num_to_char; y_num = 123.123; output; y_num = 12345.12345; output; y_num = .; output; run; data num_to_char; set num_to_char; length y_char1 y_char2 $11; y_char1 = put(y_num,11.6); *y_char1 = put(y_num,best11.); *y_char1 = put(12345.12345,11.6); *y_char1 = '12345.12345'; y_char2 = put(y_num,best11.); if not missing(y_num) then y_char3=strip(put(y_num,11.6)); run;
Have a look at the slight difference between
best11.. In both cases, blanks are added at the front of the string. So, think about using the
strip function to remove them.
Furthermore, the dot of missing numeric variables is converted into a dot in the character and not a blank. To avoid the issue, you can apply the put function using conditions (
if for example).
Don’t forget to define the length of any new character variable to avoid having unexpected truncated values.
How do you remember that the
input function uses an informat? Well both input and informat words start with in. The
put function does not start with in. Therefore it doesn’t use informat but format.
Bonus : how to add zeros at both ends of the text generated with the put function? You can use the z. format to replace spaces with zeros.
In the example below, the new variable will be 8 characters long. The last two digits will be for the decimal part. As there is only a single decimal places, a zero will be added at the end. It remains two blanks at the front which will be replaced by zeros.
data num_char_zero; z_num = 123.1; length z_char $8; z_char = put(z_num,z8.2); *z_char = '00123.10'; run;
3. Going from a character variable to another character variable using the
The original character value can also be a string and not a number. In this case, the new variable takes the value of a character format frequently defined by the programmer.
proc format; value $cntry FR = 'France' LU = 'Luxembourg' CH = 'Suisse'; run; data char_to_char; a1_char='LU'; a2_char=put(a1_char,$cntry.); *a2_char='Luxembourg'; run;
4. The Question
How do you deal with variables where all the values don’t match the same informat?
data example; length x $10; x='21JAN2015'; output; x='JAN2015'; output; x='January'; output; x='20150121'; output; run;
In this example, data of the
x variable are displayed differently. Some of them are full dates (day, month and year) and some are not. If I assume that the only valid date is of the form DDMMMYYYY then I can use the following code to create the SAS date:
data example_error;*(where=(input(x,date9.)); set example; x_num=input(x,date9.); run;
NOTE is displayed in the log. The
_ERROR_=1. If the
input function in used in the
where option of the SAS dataset then an
NOTE: Invalid argument to function INPUT at line ... column ... ... _ERROR_=1 _N_=... ERROR: INPUT function reported 'ERROR: Invalid ... value' while processing WHERE clause.
To avoid this, you simply need to add two question mark at the front of the informat name.
data example_noerror;*(where=(input(x,??date9.)); set example; x_num=input(x,??date9.); run;
The double question mark at the format of the informat name stops having the NOTE in the log. Of course, this is only of interest if you have good reasons to ignore this message. Otherwise, don’t use than and check your data.
Now, to expend the septum of valid values, you can use other kind of informats like
anydtdte or create your own one.
anydtdte informat, it isn’t just 21JAN2015 which is converted into SAS date but also JAN2015 and 20150121. With respect to JAN2015, the stored SAS date refers to the 1st of January 2015. With
_ERROR_ appears int he log.
data exemple_sanserreur; set exemple; x_num=input(x,anydate.); format x_num date9.; run;
5. Expert Corner:
Let’s look into a scenarios used rarely but which still remains very powerful.
Here are the data:
data one; length test $3 fmt $6 result $10; test='AAA'; measurement=0; fmt='NY'; result='No'; output; test='AAA'; measurement =1; fmt='NY'; result='Yes'; output; test='BBB'; measurement =-1; fmt='LESSMORE'; result='Less'; output; test='BBB'; measurement =0; fmt='LESSMORE'; result='Normal'; output; test='BBB'; measurement =1; fmt='LESSMORE'; result='More'; output; run;
The objective is to get the same thing has what is currently available in the
For that, we need two numeric formats: NY and LESSOR referenced in the column
fmt fo the dataset
proc format; value ny 0='No' 1='Yes'; value lessor -1='Less' 0='Normal' 1='More'; run;
Now, we use the
putn fonction to create the
result2 variable. Instead of writing the format name in the function, we are gonna put the name of the variable which contains those format names.
data one; set one; length result22 $10; result2=putn(measurement,fmt); run;
The same principle applied to the
putc function. The only difference is that the formats are character formats ($).
data two; length test $3 measurement $1 fmt $3 result $3; test='CC'; measurement='N'; fmt='$NY'; result='No'; output; run; proc format; value $ ny 'N'='No' 'Y'='Yes'; run; data two; set two; length result2 $10; result2=putc(measurement,fmt); run;
In a dataset, we cannot have two variables with the same name. It is possible to correct an exiting variable (e.g. divide by ten a numeric variable, keep a substring of a character variable), but it is not possible to change the variable type or the length of the variable once those attributes are defined.
To get around this problem, you can rename the original variable in order to use the name for the new variable and delete the original variable at the end.
sql procedure, the same name can be used.
data one; x=1; run; data one (drop=x); set one (rename=(x=_x)); length x $3; x='ABC'; run;