0

I'm trying to use CURL and SAS 9.3 to download XML data. I have code that correctly specifies the URL for CURL (thanks again RP) and returns an XML file. Unfortunately, there are a bunch of junk characters which precede the XML data and are causing an error. Normally, I would create a string to parse out the junk, but the ~32K length restriction in SAS will become an issue if I increase volume. Does anyone know how to remove junk data from an infile without using a string variable? Or perhaps doing it with a string variable but circumventing the ~32K restriction? Ultimately, I want to use an XML map to parse the data to get it in a SAS file format. Thanks very much.

Code:

%let query= %sysfunc(urlencode(SELECT * FROM yahoo.finance.option_contract WHERE symbol='YHOO'));
%let env  = %sysfunc(urlencode(store://datatables.org/alltableswithkeys));
%let url  = https://query.yahooapis.com/v1/public/yql?q=&query^%nrstr(&)diagnostics=true^%nrstr(&)env=&env;

filename curl pipe "d:\sasdev\common\bin\curl\curl.exe -k &url 2>&1";

data _null_;
  infile curl lrecl=32767;
  input;
  put _infile_;
run;

Infile:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1092    0
1092    0     0   1591      0 --:--:-- --:--:-- --:--:--  1591<?xml version="1.0" encoding="UTF-
8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2014-07
-21T10:13:07Z" yahoo:lang="en-US"><diagnostics><url execution-start-time="0" execution-stop-time
="1" execution-time="1"><![CDATA[http://www.datatables.org/yahoo/finance/option_contract/yahoo.f
inance.option_contract.xml]]></url><publiclyCallable>true</publiclyCallable><url execution-start
-time="4" execution-stop-time="275" execution-time="271"><![CDATA[http://finance.yahoo.com/q/op?
s=YHOO]]></url><javascript execution-start-time="3" execution-stop-time="285" execution-time="28
1" instructions-used="12243" table-name="yahoo.finance.option_contract"></javascript><user-time>
285</user-time><service-time>247</service-time><build-version>0.2.2635</build-version></diagnost
ics><results><option symbol="YHOO"><contract>2014-07</contract><contract>2014-08</contract><cont
ract>2014-09</contract><contract>2014-10</contract><contract>2015-01</contract><contract>2016-01
</contract></option></results></query><!-- total: 285 -->
<!-- engine4.yql.gq1.yahoo.com -->
user3806824
  • 45
  • 1
  • 11
  • What is the 'junk' you are talking about? Is it the table-looking thing with the download stats? – Joe Jul 21 '14 at 20:26
  • The XML data starts at . I believe that the preceding data is causing an error. Hence the term 'junk'. – user3806824 Jul 21 '14 at 20:57
  • Sounds more like a `curl` issue to me - you should be able to get that to go away. See http://stackoverflow.com/questions/7373752/how-do-i-get-curl-to-not-show-the-progress-bar for example? – Joe Jul 21 '14 at 21:08
  • Joe, thanks very much for your help. You're are correct. This is a CURL issue. I read the thread you referenced as well as some others and -ks (add the s) removes the header (aka junk data). Thanks again. – user3806824 Jul 22 '14 at 06:47

1 Answers1

0

Per Joe's suggestion that this was a CURL issue, I did some quick research and adding an "s" to the CURL command (-ks instead of -k) removes the header (aka junk data).

Revised code that works:

%let query= %sysfunc(urlencode(SELECT * FROM yahoo.finance.option_contract WHERE symbol='YHOO'));
    %let env  = %sysfunc(urlencode(store://datatables.org/alltableswithkeys));
    %let url  = https://query.yahooapis.com/v1/public/yql?q=&query^%nrstr(&)diagnostics=true^%nrstr(&)env=&env;

    filename curl pipe "d:\sasdev\common\bin\curl\curl.exe -ks &url 2>&1";

    data _null_;
      infile curl lrecl=32767;
      input;
      put _infile_;
    run;
user3806824
  • 45
  • 1
  • 11