2 SOLUTIONS POSTED AT BOTTOM
My code
data test;
extract_string = "<some string here>";
my_result1 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "A1M_PRE");
my_result2 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "AC2_0M");
my_result3 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "GA3_30M");
my_result4 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "DE3_1H30M");
run;
Desired results
Extract the number after _ but preceding M in strings that have M at the end. The result set should be:
my_result1 = ""
my_result2 = "0"
my_result3 = "30"
my_result4 = "30"
The following extract_string values fail:
"\.*(\d*)M\b\"
"\.*(\d*?)M\b\"
"\.*(\d{*})M\b\"
"\.*(\d{*?})M\b\"
"\.*(\d){*}M\b\"
"\.*(\d){*?}M\b\"
"\.*(\d+)M\b\"
"\.*(\d+?)M\b\"
"\.*(\d{+})M\b\"
"\.*(\d{+?})M\b\"
"\.*(\d){+}M\b\"
"\.*(\d){+?}M\b\"
"\.*(\d+\d+)M\b\"
Potential solutions which I would request help with
- Perhaps I just haven't tested the correct extract_string yet. Ideas?
- Perhaps my cat("s/^.*", extract_string, ".*$/$1/") needs to be modified. Ideas?
- Perhaps I need to use prxposn(prxmatch(prxparse())) instead of prxchange. How would that be formulated?
Links I've looked at but have not been able to successfully implement
https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
https://www.pharmasug.org/proceedings/2013/CC/PharmaSUG-2013-CC35.pdf
SAS PRX to extract substring please
extracting substring using regex in sas
Extract substring from a string in SAS
SOLUTIONS
Solution 1
The suffix in the cat function and the extract_string were modified.
data test;
extract_string = "?(?:_[^_\r\n]*?(\d+)M)?$";
my_result1 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "A1M_PRE");
my_result2 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "AC2_0M");
my_result3 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "GA3_30M");
my_result4 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "DE3_1H30M");
run;
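To see what Solution 1 actually passes to prxchange, it can help to write the assembled pattern to the log. The following is only a sketch: the variable names pattern, input_string and result are made up for illustration, and the character class is assumed to read [^_\r\n]. The leading ? of extract_string turns the greedy ^.* prefix into a lazy ^.*?, and the optional non-capturing group leaves $1 empty when no _...<digits>M suffix is present.

```sas
data _null_;
  length pattern $60 input_string $10 result $10;
  extract_string = "?(?:_[^_\r\n]*?(\d+)M)?$";

  /* Same concatenation as Solution 1: the leading ? makes ^.* lazy */
  pattern = cat("s/^.*", extract_string, "/$1/");
  put pattern=;

  /* strip() removes the blank padding of the fixed-length variables,
     so that $ can match directly after the final M */
  do input_string = "A1M_PRE", "AC2_0M", "GA3_30M", "DE3_1H30M";
    result = prxchange(strip(pattern), -1, strip(input_string));
    put input_string= result=;
  end;
run;
```

The log lines can then be compared with the desired results listed in the question. Note that the original test step passes string literals, which carry no trailing blanks; with padded character variables the strip() (or trim()) call matters.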
Solution 2
This solution uses the other prx-family functions: prxparse, prxmatch, and prxposn.
data have;
length string $10;
input string;
datalines;
A1M_PRE
AC2_0M
GA3_30M
DE3_1H30M
;
data want;
set have;
rxid = prxparse ('/_.*?(\d+)M\s*$/');
length digit_string $8;
if prxmatch (rxid, string) then digit_string = prxposn(rxid,1,string);
number_extracted = input (digit_string, ? 12.);
run;
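The pattern in Solution 2 ends in \s*$ rather than $ because string is a $10 variable and shorter values are padded with blanks. For comparison, here is a hedged sketch of the prxchange approach from Solution 1 applied to the same have dataset; the output dataset name want_prxchange is hypothetical, and the character class is assumed to read [^_\r\n]. Here trim() plays the role that \s* plays in Solution 2.

```sas
data want_prxchange;
  set have;
  length digit_string $8;

  /* trim() drops the trailing blanks of the $10 variable;
     without it, $ cannot match right after the final M */
  digit_string = prxchange('s/^.*?(?:_[^_\r\n]*?(\d+)M)?$/$1/', -1, trim(string));

  /* ? suppresses the invalid-data note when digit_string is blank */
  number_extracted = input(digit_string, ? 12.);

  put string= digit_string= number_extracted=;
run;
```

The pattern is written in single quotes so that SAS passes it through unchanged. Rows without a trailing _...<digits>M suffix should come back with a blank digit_string and a missing number_extracted, matching the empty result the question asks for.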