0

My input data:

Input data

Preferred output data:

Preferred output

My best try:

Which is wrong because it only includes idnumber 2 and 4.

My best try

Data:

    DATA WORK.transpose_csv;
LENGTH
    idnumber           8
    start_end        $ 5
    date               8 ;
FORMAT
    idnumber         BEST1.
    start_end        $CHAR5.
    date             YYMMDD10. ;
INFORMAT
    idnumber         BEST1.
    start_end        $CHAR5.
    date             YYMMDD10. ;
INPUT
    idnumber         : ?? BEST1.
    start_end        : $CHAR5.
    date             : ?? YYMMDD10. ;
DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;

      


    

My best try:

    proc transpose data=transpose_csv
                   out =wide;
                   by idnumber;
                   id start_end ;
    run;

As shown by this post it can be easily done in R, but I need to do this in SAS: Spread with duplicate identifiers (using tidyverse and %>%)

Community
  • 1
  • 1
Rasmus Larsen
  • 5,721
  • 8
  • 47
  • 79

1 Answers1

1

The problem with proc transpose here is that you can have multiple events for a particular idnumber. If you are able to change the source data to add an extra id variable, e.g. event_id, then it would make the task much easier.

You can either continue with proc transpose as below, followed by a data step to bring the start / end dates on 1 row, or just do it in a single data step and hard code some values. There are other methods as well, such as a hash solution that would probably work well for this type of problem.

Edit : Added a 3rd method that first creates an event_id, which makes the subsequent proc transpose easy

/* source data */
DATA WORK.transpose_csv;
LENGTH
    idnumber           8
    start_end        $ 5
    date               8 ;
FORMAT
    idnumber         BEST1.
    start_end        $CHAR5.
    date             YYMMDD10. ;
INFORMAT
    idnumber         BEST1.
    start_end        $CHAR5.
    date             YYMMDD10. ;
INPUT
    idnumber         : ?? BEST1.
    start_end        : $CHAR5.
    date             : ?? YYMMDD10. ;
DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;

/* method1 */
proc transpose data=transpose_csv
               out =wide1 (drop=_: start_end);
               by idnumber start_end notsorted;
               id start_end ;
run;

data wide2;
set wide1;
by idnumber;
retain _start;
if not missing(start) then _start=start;
if not missing(end) or last.idnumber then do;
        start=_start;
        output;
        end;
drop _start;
run;


/* method2 */
data wide3;
set transpose_csv;
by idnumber;
retain start;
format start end yymmdd10.;
if start_end='start' then start=date;
if start_end='end' then do;
    end=date;
    output;
    end;
else if last.idnumber then output;
drop start_end date;
run;

/* method3 */
data transpose_csv1;
set transpose_csv;
by idnumber;
if first.idnumber then event_id=0;
event_id+(start_end='start');
run;

proc transpose data=transpose_csv1
                   out =wide4 (drop=_: event_id);
                   by idnumber event_id;
                   id start_end ;   
run;
Longfish
  • 7,582
  • 13
  • 19