0

I am new to this forum. But I have read many answers to questions that I had in SAS coding as well on this website. I have run into a problem with SAS coding at work that I hope somebody can help.

I am trying to extract a numeric substring from a text string. The numeric string is always before words like "YR" or "YEAR". Sometimes there is a space between the numeric substring and "YR" or "YEAR". Both the numeric substring and the text string vary in length from obs to obs. Here is an example of what it looks like: Screenshot of SAS dataset

The number right before "YR" or "YEAR" is the numeric string I want to extract. I have tried to use find fn to locate where the "YR" or "YEAR" is and then use substrn to extract the surrounding string. Then compress the characters. But the result is not ideal as sometimes it pulls the number in the first part of the string and sometimes it doesn't pull in the whole number (e.g. 4.75). Here is the code that I have used:

if find(deal_type_oss, "YR","i") ne 0
then term=compress(substrn(deal_type_oss, find(deal_type_oss, "YR","i")-4,6),"","a");
if find(deal_type_oss,"Year","i") ne 0 
then term=compress(substrn(deal_type_oss, find(deal_type_oss, "Year","i")-4,6),"","a"); 

Here is the result of this code: Results of the code

Thank you in advance!

Tao

Tao Li
  • 3
  • 4
  • 1
    Can you provide what the sample input data looks like, and what the desired output data should look like as well? – Robert Penridge Jan 08 '16 at 00:03
  • Sorry for the late reply. But Shenglin below has solved my problem. But thank you so much for leaving a comment. – Tao Li Jan 09 '16 at 02:25

2 Answers2

1

Try to use look ahead regular expression. Here,\s means space, \S+ means any more than one none space character, \s? means possible space, ?= is equal to YR or YEAR behind first regular expression.

data have;
input string & $200.;
year=prxchange('s/.*\s(\S+\s?)(?=YR|YEAR).*/$1/',-1,string);
DATALINES ;
USD2.75BN 4.5YR REV
USD110MM 5YR REV
USD340MM 5YR REV
USE40MM 5YR REVOLVER
USD3.5BN 5YEAR REVOLVER
USD2BN 4YR REV
USD3.5BN 4.75 YEAR REVOLVER
CAD500MM REV 3YR EXP
CAD75MM 5YR REVOLVER
USD1BN 5YR REVOLVER
;
RUN ;
Shenglin Chen
  • 4,504
  • 11
  • 11
  • Hi Shenglin. Thank you so much for the help and sorry for my late reply. I have tried your code and it worked like magic. Thank you again! – Tao Li Jan 09 '16 at 02:26
0

Ugly, but works.

DATA _NULL_ ;
  INPUT ;
  deal_type_oss = _INFILE_ ;

  pastchar = INDEX(deal_type_oss,'YR') ; %* find the first character AFTER the year number ;
  IF NOT pastchar THEN
    pastchar = INDEX(deal_type_oss,'YEAR') ;

  temp_string = SUBSTR(deal_type_oss,1,pastchar-1) ; %* trim down to end with the year number ;

  prechar = ANYALPHA(REVERSE(TRIM(temp_string))) ; %* Find the last character BEFORE the year number ;

  numchar = SUBSTR(TRIM(temp_string),LENGTH(TRIM(temp_string))-prechar+2) ; %* Grab from that character before the year ;
  PUT deal_type_oss= numchar= ;
  DATALINES ;
USD2.75BN 4.5YR REV
USD110MM 5YR REV
USD340MM 5YR REV
USE40MM 5YR REVOLVER
USD3.5BN 5YR REVOLVER
USD2BN 4YR REV
USD3.5BN 4.75 YEAR REVOLVER
CAD500MM REV 3YR EXP
CAD75MM 5YR REVOLVER
USD1BN 5YR REVOLVER
;
RUN ;
Ludwig61
  • 141
  • 1
  • 7
  • Thank you so much for your answer and sorry for my late reply. I haven't had a chance to try your code today but will do. However, I have had a chance to try Shenglin's code above and it worked. I will also try your code. And if it works, I will also mark it as an answer (hopefully that is allowed). Thank you again! – Tao Li Jan 09 '16 at 02:29