5

Following on from this question about using Matlab to access a PDF on a web page that was originally buried behind a JavaScript function: I now have a URL that lets me access the page directly. This works fine with the Matlab web browser object (the PDF appears on screen), but to save the PDF for subsequent processing I appear to need Matlab's urlread/urlwrite functions. However, these functions provide no way to supply authentication credentials.

How do I provide username/password for Matlab's urlread/urlwrite functions?

Ian Hopkinson

5 Answers

6

Matlab's urlread() function has a "params" argument, but these are CGI-style parameters that get encoded in the URL. Basic authentication is done with a lower-level HTTP request header instead. urlread() doesn't support setting request headers, but you can code directly against the Java URL class to set them.

You can use Sun's sun.misc.BASE64Encoder class to do the Base64 encoding programmatically. This is a nonstandard class, not part of the standard Java library, but you know that the JVM shipping with Matlab will have it, so you can get away with coding to it.
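For readers on newer Matlab releases, a minimal sketch of the same encoding step without the Sun class. It assumes the documented matlab.net.base64encode function is available (I believe it arrived with R2016b), and the credentials are placeholders for illustration only.

```matlab
% Hypothetical credentials, for illustration only.
user = 'test';
password = 'test';

% Basic authentication sends "user:password", Base64-encoded, in the
% Authorization request header.
token = matlab.net.base64encode([user ':' password]);
authHeader = ['Basic ' token];
```

The resulting authHeader is the value you would pass to conn.setRequestProperty('Authorization', ...) on the Java connection.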

Here's a quick hack showing it in action.

function [s,info] = urlread_auth(url, user, password)
%URLREAD_AUTH Like URLREAD, with basic authentication
%
% [s,info] = urlread_auth(url, user, password)
%
% Returns bytes. Convert to char if you're retrieving text.
%
% Examples:
% sampleUrl = 'http://browserspy.dk/password-ok.php';
% [s,info] = urlread_auth(sampleUrl, 'test', 'test');
% txt = char(s)

% Matlab's urlread() doesn't do HTTP request headers, so work directly with Java
jUrl = java.net.URL(url);
conn = jUrl.openConnection();
conn.setRequestProperty('Authorization', ['Basic ' base64encode([user ':' password])]);
conn.connect();
info.status = conn.getResponseCode();
info.errMsg = char(readstream(conn.getErrorStream()));
s = readstream(conn.getInputStream());

function out = base64encode(str)
% Uses Sun-specific class, but we know that is the JVM Matlab ships with
encoder = sun.misc.BASE64Encoder();
out = char(encoder.encode(java.lang.String(str).getBytes()));

%%
function out = readstream(inStream)
%READSTREAM Read all bytes from stream to uint8
try
    import com.mathworks.mlwidgets.io.InterruptibleStreamCopier;
    byteStream = java.io.ByteArrayOutputStream();
    isc = InterruptibleStreamCopier.getInterruptibleStreamCopier();
    isc.copyStream(inStream, byteStream);
    inStream.close();
    byteStream.close();
    out = typecast(byteStream.toByteArray', 'uint8'); %'
catch err
    out = []; %HACK: quash
end
Andrew Janke
  • Neat - I hadn't realised it was so straightforward to do the Base64 coding as well. Plans on hold since I don't think the sysadmin appreciated my unorthodox access methods - now instead of a pdf file I get a "Don't do that" HTML page! Which is fair enough really, switching to diplomacy mode :oops: – Ian Hopkinson Aug 24 '09 at 18:37
2

As an update to this: another option is the newer webread function, to which you can explicitly pass the username and password for basic authentication.

options = weboptions('Username','user','Password','your password');
data = webread(url, options);

This can also be used for websave or webwrite. More info on weboptions here
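Since the original question was about saving a PDF to disk, here is a minimal sketch of the websave variant. The helper name save_pdf_with_auth and the 30 second timeout are my own assumptions for illustration; only weboptions and websave are Matlab functions.

```matlab
function outFile = save_pdf_with_auth(url, outName, user, password)
%SAVE_PDF_WITH_AUTH Download url to outName using basic authentication.
% Hypothetical helper, not part of Matlab.

% Credentials go into the options object, not into the URL.
options = weboptions('Username', user, 'Password', password, 'Timeout', 30);

% websave writes the response body to disk and returns the full file path.
outFile = websave(outName, url, options);
end
```

It would be called along the lines of pdfPath = save_pdf_with_auth(url, 'report.pdf', 'user', 'your password'), where url is the direct link to the PDF.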

ryan.d.williams
1

urlwrite_auth is the next step, so here it is...

function [output,status] = urlwrite_auth(url, user, password, location, wanted)
%URLWRITE_AUTH Like URLWRITE, with basic authentication
%
% location is the directory where you want the file saved
% wanted is the name of the file you want
% Returns the full path of the output file, which is now saved in location.
% status is 1 on success and 0 on failure.
%
% Examples:
% sampleUrl = 'http://browserspy.dk/password-ok.php';
% [output,status] = urlwrite_auth(sampleUrl, 'user', 'password', location, wanted);

import com.mathworks.mlwidgets.io.InterruptibleStreamCopier;

% As in urlwrite, only suppress download errors if the caller asks for status.
catchErrors = nargout > 1;
output = '';
status = 0;

% Matlab's urlread() doesn't do HTTP request headers, so work directly with Java
jUrl = java.net.URL(url);
conn = jUrl.openConnection();
% Note this calls the base64encode subfunction below
conn.setRequestProperty('Authorization', ['Basic ' base64encode([user ':' password])]);
conn.connect();

% Specify the full path to the file so that getAbsolutePath will work when the
% current directory is not the startup directory and a relative path is given.
file = java.io.File(fullfile(location, wanted));
if ~file.isAbsolute
    file = java.io.File(fullfile(pwd, location, wanted));
end
try
    file = file.getCanonicalFile;
catch
    error('MATLAB:urlwrite:InvalidOutputLocation','Could not resolve file "%s".',char(file.getAbsolutePath));
end

% Open the output file.
try
    fileOutputStream = java.io.FileOutputStream(file);
catch
    error('MATLAB:urlwrite:InvalidOutputLocation','Could not open output file "%s".',char(file.getAbsolutePath));
end

% Read the data from the connection.
try
    inputStream = conn.getInputStream;
    % This StreamCopier is unsupported and may change at any time.
    isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
    isc.copyStream(inputStream,fileOutputStream);
    inputStream.close;
    fileOutputStream.close;
    output = char(file.getAbsolutePath);
catch
    fileOutputStream.close;
    delete(file);
    if catchErrors, return
    else error('MATLAB:urlwrite:ConnectionFailed','Error downloading URL. Your network connection may be down or your proxy settings improperly configured.');
    end
end

status = 1;


function out = base64encode(str)
% Uses Sun-specific class, but we know that is the JVM Matlab ships with.
% The Authorization header built from this value is what makes it connect.
encoder = sun.misc.BASE64Encoder();
out = char(encoder.encode(java.lang.String(str).getBytes()));

Note: this builds on Andrew's answer, extending it to download files from a site protected by an HTTP username and password.

Kerry
0

It turns out the intranet site uses basic authentication, which Matlab doesn't support out of the box, but a workaround described on the Mathworks site here works fine. At first I used Firebug to get the Base64-encoded string I needed for access, and I also calculated it directly using the tool here. I have now saved my PDF report file to disk, so job done. For my next trick I will be converting it into text...

My understanding is that the get and post methods are distinct from the basic authentication method, but that basic authentication is not often used on the open net.

Ian Hopkinson
  • If you end up going with their workaround long term, consider copying urlread() and urlreadwrite() out to a separate directory and renaming them. Modifying your Matlab installation can be risky, even if it's MathWorks telling you to do it, and you'd need to do it to each separate install. – Andrew Janke Aug 24 '09 at 17:04
-2

I don't know matlab, this is just an educated guess.

The function documentation here lists the options as so:

s = urlread('url','method','params')

Depending on what kind of authentication they use, this may or may not work. You will probably want to use the post method.

% params is supposed to be a cell array of name/value pairs (I don't know Matlab)
s = urlread('http://whatever.com','post', {'username','ian','password','awesomepass'})

You will have to look at the actual HTML form, or view the Net tab in Firebug, to see what the actual names and values of the username and password parameters are.

Brian Gianforcaro