0

so this is where I realize the difference between theory and practice. Because while I can theoretically picture how it should be/look I can't for the life of me actually figure out how to actually do it. I have tens of thousands of observations that look like this:

>+--------+-------------------------------+--+
>|   ID   |             CALLS             |  |
>+--------+-------------------------------+--+
>| 162743 | BAD DVR-3|NO PIC-1            |  |
>|  64747 | NO PIC-1|BOX HIT-4|PPV DROP-1 |  |
>+--------+-------------------------------+--+

And the end results should be something like this:

+--------+---------+--------+---------+----------+--+
|   ID   | BAD DVR | NO PIC | BOX HIT | PPV DROP |  |
+--------+---------+--------+---------+----------+--+
| 162743 |       3 |      1 |       0 |        0 |  |
|  64747 |       0 |      1 |       4 |        1 |  |
+--------+---------+--------+---------+----------+--+

I'm using PLSQL passthru in SAS so if I need to do transposing I can also always use proc transpose. But getting to that point is quite honestly beyond me. I know I will probably have to create a function likie the question asked here:T-SQL: Opposite to string concatenation - how to split string into multiple records

Any ideas?

Community
  • 1
  • 1
jswtraveler
  • 345
  • 6
  • 19
  • Tom Kyte has a string to table function that will help here:http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061 – James Feb 07 '14 at 13:23

2 Answers2

0

I can think of at least two ways to achieve this: 1. Read the entire data from SQL into SAS. Then use DATA STEP to manipulate the data i.e., convert data that is in two columns:

>+--------+-------------------------------+--+
>|   ID   |             CALLS             |  |
>+--------+-------------------------------+--+
>| 162743 | BAD DVR-3|NO PIC-1            |  |
>|  64747 | NO PIC-1|BOX HIT-4|PPV DROP-1 |  |
>+--------+-------------------------------+--+

to something that looks like this: result of DATA STEP manipulation:

ID      CALLS       COUNT
162743  BAD_DVR       3
162743  NO_PIC        1
64747   NO_PIC        1
64747   BOX_HIT       4
64747   PPV_DROP      1

From then it would be a simple matter of passing the above dataset to PROC TRANSPOSE to get a table like this:

+--------+---------+--------+---------+----------+--+
|   ID   | BAD DVR | NO PIC | BOX HIT | PPV DROP |  |
+--------+---------+--------+---------+----------+--+
| 162743 |       3 |      1 |       0 |        0 |  |
|  64747 |       0 |      1 |       4 |        1 |  |
+--------+---------+--------+---------+----------+--+
  1. If you want to do everything in pass-through SQL then that to should be easy IF the no. of categories such as {BAD DVR, NO PIC, BOX HIT etc...} are small.

The code will look like:

SELECT
ID
,CASE WHEN SOME_FUNC_TO_FIND_LOCATION_OF_SUBSTRING(CALLS, 'BAD DVR-')>0 THEN <SOME FUNCTION TO EXTRACT EVERYTHING FROM - TO |> ELSE 0 END AS BAD_DVR__COUNT
,CASE WHEN SOME_FUNC_TO_FIND_LOCATION_OF_SUBSTRING(CALLS, 'NO PIC-')>0 THEN <SOME FUNCTION TO EXTRACT EVERYTHING FROM - TO |> ELSE 0 END AS NO_PIC__COUNT
,<and so on>
FROM YOUR_TABLE

You just need to look string manipulation functions available in your database to make everything work.

0

Do you have any reference material that describes all the possible values for those PIPE delimited values in the CALLS column? Or do you already know the particular values you need to keep and can ignore others?

If so, you can just process the entire thing in a data step; here is an example:

data have;
   input @1 ID 6. @9 CALLS $50.;
   datalines;
162743  BAD DVR-3|NO PIC-1            
 64747  NO PIC-1|BOX HIT-4|PPV DROP-1 
run;

data want;
   set have; /* point to your Oracle source here */ 
   length field $50;
   idx = 1;
   BAD_DVR = 0;
   NO_PIC = 0;
   BOX_HIT = 0;
   PPV_DROP = 0;
   do i=1 to 5 while(idx ne 0);
      field = scan(calls,idx,'|');
      if field = ' ' then idx=0;
      else do;
         if      field =: 'BAD DVR'  then BAD_DVR  = input(substr(field,9),8.);
         else if field =: 'NO PIC'   then NO_PIC   = input(substr(field,8),8.);
         else if field =: 'BOX HIT'  then BOX_HIT  = input(substr(field,9),8.);
         else if field =: 'PPV DROP' then PPV_DROP = input(substr(field,10),8.);
         idx + 1;
         end;
      end;
   output;
   keep ID BAD_DVR NO_PIC BOX_HIT PPV_DROP;
run;

The SCAN function steps through the CALLS column by token; The ":=" operator is "begins with", and the SUBSTR function with only two parameters finds the characters following the hyphen to be read by the INPUT function.

Of course, I'm making a few assumptions about your source data but you get the idea.

BellevueBob
  • 9,498
  • 5
  • 29
  • 56
  • The idea is that rather than appending new columns to a table if there are new comments/issues we can simply add them to the end of the character string. So short answer is yes I will know the values I will want to keep. I think this solution will work great! Of course they haven't actually given me access to the early version of the table yet so i have to wait to try it. Thank you so much!!! – jswtraveler Feb 11 '14 at 19:42