1

I am trying to download from this URL:

http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2

using wget from bash.

However, I have to manually click the link to the right of "Download Historical Data Here".

Is there a way to do this in code from the command line?

EDIT 1

A Java solution would be great too.

ManInMoon

3 Answers

1

I think you will need to write some code to accomplish this, using an HTML client library that supports JavaScript, such as PhantomJS, as mentioned by the answers to this question.

Other options include Python's mechanize library, and some of the things mentioned in this answer.

If you're looking for a headless browsing library in Java, I would take a look at HtmlUnit. I have not used it personally though, so I can't vouch for its stability or ease of use.

merlin2011
0

You can't download it directly with wget because the download is triggered through JavaScript. It is easier to download the file on your normal computer and then upload it to another server that gives you direct HTTP access to it. Then you can download it from the command line.

kostja93
  • I am attempting to automate the process. What do you mean by "download it on your normal computer"? Do you mean do it manually? – ManInMoon Mar 06 '14 at 08:56
  • Yes, I mean manually. If you want to automate this process, you should look at the HTTP requests your browser sends. Maybe you can then send equivalent requests. To analyse the HTTP requests I recommend BurpProxy. – kostja93 Mar 06 '14 at 08:59
  • Wow, looks a bit techie for me! Is there a simpler product? – ManInMoon Mar 06 '14 at 09:02
  • If this is too techie for you, I'm sorry to say that my way of fixing this problem is too complex for you. – kostja93 Mar 06 '14 at 09:07
0

Since I wanted to learn PhantomJS myself, I attempted it, but it seems that PhantomJS is not mature enough to support this correctly. Since I had taken the time to understand how the link worked, here's a solution in PHP instead: the page contains a hidden form named file_down, and the download is a POST of that form's fields to get.php, sent with the cookie from the first request. You should be able to copy and paste the script into Download.php and run it from the command line, assuming you have php-cli installed. I hope it will also be useful as a sample to people in the future trying to script this kind of thing.

<?php

/**
  * Usage: php Download.php <URL> <FileName>
  * Example: 
  * php Download.php http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2 Output.zip
  */

// Configuration parameters
if ($argc < 3) {
    die("Usage: php Download.php <URL> <FileName>\n");
}
$post_url = 'http://www.histdata.com/get.php';
$init_url = $argv[1];
$filename = $argv[2];

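// Fetch the download page, keeping the response headers so the cookie can be read.
$ch = curl_init($init_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);

$output = curl_exec($ch);
if ($output === false) {
    die('Could not fetch ' . $init_url . ': ' . curl_error($ch));
}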

// Pull out the first cookie set by the server
$cookies = array();
if (preg_match('/^Set-Cookie:\s*([^;]*)/mi', $output, $m)) {
    parse_str($m[1], $cookies);
}

// Get the POST parameters from the form.
$post_array = getPostArray($output);
$post_data = http_build_query($post_array);

$header = array();
$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: ";
$header[] = "Content-Type: application/x-www-form-urlencoded";

$ch = curl_init ($post_url);
curl_setopt ($ch, CURLOPT_COOKIE, http_build_query($cookies)); 
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); 
// Use the page given on the command line as the referer instead of a hard-coded URL.
curl_setopt($ch, CURLOPT_REFERER, $init_url);

$output = curl_exec($ch);
if ($output === false) {
    die('Download failed: ' . curl_error($ch));
}
$fp = fopen($filename, 'wb') or die('Cannot open file for writing: ' . $filename);
fwrite($fp, $output);
fclose($fp);

function getPostArray($doc) {
    $post_items = array();
    $dom_doc = new DOMDocument;
    // Suppress warnings from malformed markup; loadHTML() returns false on failure.
    if (! @$dom_doc->loadHTML($doc)) {
        die('Could not load html!');
    }
    $xpath = new DOMXPath($dom_doc);

    // Collect the name/value pair of every input in the download form.
    foreach ($xpath->query('//form[@name="file_down"]//input') as $input) {
        $post_items[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $post_items;
}
?>
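
Since the question also asks about Java, the same two-step flow (fetch the page, read the inputs of the file_down form, POST them to get.php) might be sketched roughly as below. This is an untested-against-the-site sketch, not a drop-in replacement: it assumes JDK 17 or later, the class name HistDataDownload and the helpers formFields, attr and formEncode are made up for illustration, and the regex-based form parsing is a shortcut that a real HTML parser would handle more robustly.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class HistDataDownload {
    // Same endpoint the PHP script posts to.
    static final String POST_URL = "http://www.histdata.com/get.php";

    /** Collects the name/value pairs of the <input> tags inside the named form. */
    static Map<String, String> formFields(String html, String formName) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher form = Pattern.compile(
                "<form\\b[^>]*\\sname\\s*=\\s*[\"']" + Pattern.quote(formName)
                        + "[\"'][^>]*>(.*?)</form>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        if (!form.find()) {
            return fields; // form not found: return an empty map
        }
        Matcher input = Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE)
                .matcher(form.group(1));
        while (input.find()) {
            String name = attr(input.group(), "name");
            if (name != null) {
                String value = attr(input.group(), "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    /** Returns the quoted value of one attribute of a tag, or null if it is absent. */
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\s" + name + "\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java HistDataDownload.java <URL> <FileName>");
            System.exit(1);
        }
        // The CookieManager carries cookies from the first response into the POST.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();
        String page = client.send(
                HttpRequest.newBuilder(URI.create(args[0])).build(),
                HttpResponse.BodyHandlers.ofString()).body();

        HttpRequest post = HttpRequest.newBuilder(URI.create(POST_URL))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Referer", args[0])
                .POST(HttpRequest.BodyPublishers.ofString(
                        formEncode(formFields(page, "file_down"))))
                .build();
        // Stream the response body straight into the output file.
        client.send(post, HttpResponse.BodyHandlers.ofFile(Path.of(args[1])));
    }
}
```

It takes the same arguments as the PHP script, for example: java HistDataDownload.java "http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2" Output.zip. Whether the site still accepts this request sequence is something you would need to check yourself.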
merlin2011