I am trying to use the SAS PRX function to extract a substring from my dataset. But it only returns the exact matches, whereas I need it to be more flexible and extract those that match a variety of conditions.
I have copied my data below. As you can see, one of the variables in my data is "brandmodel" which contains both the brand name and the model# of a particular camera. I need to have a separate column just for the model#s. So I am using the PRX function to extract them as they usually follow one of the following patterns:
For example: JX100 or JX10 or JX1 (i.e., 1-2 alphabets, followed immediately by 1-3 digits. This my program (copied below the data) can handle. But where I run into problems is: how to extract those model#'s where the alphabets are separated from the digits by a space or a hyphen, and how do I extract those into the same column "Model" as those with them altogether? Also, some of the observations do not have model#s, how can I get them to be set to missing instead of being dropped altogether?
Brandmodel|Price
iTwist F124 Digital Camera -red|49.00
Vivitar IF045 Digital Camera -Blue|72.83
Liquid Image Underwater Camera Mask|128.00
Impact Series Video Camera MX Gogglesâ„¢|188.00
Olympus VR 340 Silver|148.00
Olympus TG820 Digital Camera Black|278.00
Olympus VR 340 16MP 10x 3.0 LCD Red|148.00
Vivitar VX137-Pur Digital Camera|39.00
Olympus SZ-12 Digital Camera -Black|198.00
Olympus VG160 Digital Camera Red|98.00
Olympus VR340 Purple|148.00
Olympus TG820 Digital Camera Silver|298.00
Olympus TG820 Digital Camera Blue|278.00
Olympus VG160 Digital Camera Orange|98.00
Olympus TG820 Digital Camera Red|298.00
Fujifilm FinePix AX500 Red|78.63
Canon A2300 Silver|98.63
Canon A810 Red|75.00
Nikon Coolpix S2600 Digital Camera - Red|88.00
Nikon Coolpix L25 Digital Camera - Silver|82.00
Casio Exilim ZS10BK|128.00
Olympus TG-310 14 MP blue Digital Camera|148.00
Hipstreet Kidz Digital Camera - Blue|14.93
Casio Exilim ZS10PK|128.00
Olympus TG-310 14 MP Digital Camera orange|148.00
SAS Program
data walnov21p2;
length brandmodel $ 80;
infile "G:\File2\data\store_nov21\storenv21p2.csv" firstobs=2 dlm="|" dsd;
input brandmodel price;
re= prxparse('/[[:alpha:]]{1,3} \d{1,4}/');
if prxmatch(re, brandmodel) then
do;
model=prxposn(re, 0, brandmodel);
output;
end;
run;