
What's the best way of extracting a date-stamp with the format yyyymmdd from an existing string in SAS?

From what I have read, the easiest way seems to be to use regular expressions within the prxchange function to replace everything except the pattern.

  • Let's assume that the only 8-digit pattern in the string will be my date, so a [0-9]{8} pattern is sufficient.

Here is what I have for replacing 8-digit patterns with the string `empty` (for readability):

data b(keep=have want);

   /* HAVE */
   have   = '&libroot_hr./import/sxh3j900/20190702_SXH3J900_DWH_Adresse.txt';
   regexp = 's/[0-9]{8}/empty/';   /* substitute every 8-digit run with "empty" */
   times  = -1;                    /* -1: replace all occurrences */

   /* WANT */
   want = prxchange(regexp, times, have);

   /* Result:
      '&libroot_hr./import/sxh3j900/empty_SXH3J900_DWH_Adresse.txt'
   */

run;

How can I change this so that it replaces everything *except* the 8-digit pattern with the string `empty`?
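To keep the substitution approach as asked, one option is to capture the 8 digits and use that capture as the replacement, so everything around the date is thrown away. This is a minimal sketch, assuming (as stated above) that the string contains exactly one run of 8 digits; the dataset name `c` is only a placeholder.

```sas
data c(keep=have want);
   have = '&libroot_hr./import/sxh3j900/20190702_SXH3J900_DWH_Adresse.txt';
   /* ^.*?    : lazily skip everything before the first 8-digit run */
   /* (\d{8}) : capture the 8 digits in capture buffer 1            */
   /* .*$     : consume the rest of the string                      */
   /* $1      : replace the whole match with buffer 1 only          */
   want = prxchange('s/^.*?(\d{8}).*$/$1/', 1, have);
run;
```

If the string has no 8-digit run, the pattern does not match and PRXCHANGE returns the string unchanged, so you may want to test with PRXMATCH first.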

Martin Dreher
  • Can the strings have more than one string of 8 digits? Do the strings of 8 digits need to be independent of other digits? Or do you want to match 8 digits that are part of a longer string of digits? – Tom Jul 02 '19 at 13:59
  • Let's assume that the only 8-digit pattern in the string will be my date. The date string will be surrounded by non-digit characters. There will be other numbers (`3` and `900` in my example), but no other runs of 8 or more consecutive digits. Therefore `[0-9]{8}` should be sufficient to locate my string (but I need the inverse: locate everything *but* that string) – Martin Dreher Jul 02 '19 at 14:04
  • Then why not just pull out the date string and write it back to the variable? – Tom Jul 02 '19 at 14:12
  • You mean like the solution from **Aurieli**? I am not quite sure yet when to use which prx-function... will test it! – Martin Dreher Jul 02 '19 at 15:14
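Tom's suggestion of pulling the date out and writing it back to the same variable could look like the sketch below. It uses CALL PRXSUBSTR, which reports where the match starts and how long it is, rather than the PRXPOSN approach used in the answer; the dataset name `d` is a placeholder.

```sas
data d(keep=have);
   have = '&libroot_hr./import/sxh3j900/20190702_SXH3J900_DWH_Adresse.txt';
   re = prxparse('/\d{8}/');             /* pattern for the 8-digit date stamp */
   call prxsubstr(re, have, pos, len);   /* pos = start of match (0 if none), len = its length */
   if pos > 0 then
      have = substr(have, pos, len);     /* overwrite the variable with just the date */
run;
```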

1 Answer

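data b(keep=have want);
   length want $8;                        /* a yyyymmdd stamp is 8 characters */
   have = '&libroot_hr./import/sxh3j900/20190702_SXH3J900_DWH_Adresse.txt';
   re = prxparse("/(\d{8})/");            /* compile the pattern; returns an id */
   if prxmatch(re, have) then do;         /* position of the first match, 0 if none */
      want = prxposn(re, 0, have);        /* buffer 0 = the entire matched text */
   end;
run;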

This extracts the first run of 8 digits, if that is all you need.

Here is the regexp for the yyyymmdd format.
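A regular expression alone cannot tell whether the 8 digits form a real calendar date (`20190231` matches `\d{8}`, for example). If an actual SAS date is needed, one option is to read the extracted string with the `YYMMDD8.` informat; the `??` modifier suppresses the invalid-data note and yields a missing value instead. This is a sketch; `want_date`, `bad` and the dataset name `e` are hypothetical names.

```sas
data e;
   want = '20190702';                      /* e.g. the string extracted above */
   want_date = input(want, ?? yymmdd8.);   /* SAS date value, or . if not a valid date */
   format want_date yymmdd10.;             /* display as 2019-07-02 */
   bad = input('20190231', ?? yymmdd8.);   /* 31 February does not exist: missing */
run;
```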

Llex
  • Thanks for your answer. I will test it; it looks promising on the first test. I will mark an answer as soon as I have tested it :) – Martin Dreher Jul 02 '19 at 15:15
  • @MartinDreher I advise you to use the regexp "/((19\d{2})|(2\d{3}))(\d{2})(\d{2})/"; I think it's better than "/(\d{8})/" :) – Llex Jul 02 '19 at 15:38
  • Naturally; I only used "/(\d{8})/" for simpler code while trying to understand your solution. Works like a charm, by the way. Thanks, mate! – Martin Dreher Jul 03 '19 at 10:51
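The stricter pattern from the comments can also split the stamp into its parts, because each parenthesised group is a separate capture buffer, numbered by the position of its opening parenthesis: buffer 1 is the whole year, buffers 2 and 3 are the `19\d{2}` and `2\d{3}` alternatives, and buffers 4 and 5 are month and day. A sketch; the variables `year`, `month`, `day` and the dataset name `f` are illustrative.

```sas
data f(keep=have year month day);
   length year $4 month day $2;
   have = '&libroot_hr./import/sxh3j900/20190702_SXH3J900_DWH_Adresse.txt';
   re = prxparse("/((19\d{2})|(2\d{3}))(\d{2})(\d{2})/");
   if prxmatch(re, have) then do;
      year  = prxposn(re, 1, have);   /* buffer 1: the whole year alternative */
      month = prxposn(re, 4, have);   /* buffers 2 and 3 are the 19xx / 2xxx branches */
      day   = prxposn(re, 5, have);
   end;
run;
```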