
This is my first time posting here. I greatly appreciate any and all guidance on this subject.

I'm trying to make a program that automatically fills in web forms and submits the data, returning the resulting page to the program so it can continue to 'browse' the page, allowing it to recursively submit even more data.

The main problems I'm having are:

  • The 'submit' button is implemented in JavaScript, so I can't tell where the form data is sent when the page request is made.
  • I want to fill in the forms using data from an Excel table, so I need to be able to access data from outside the page.
  • I need to be able to navigate the resulting page to continue to submit more data.

More specifically, I'm trying to first log in to the Practice Mate website, navigate to 'Manage Patients', hit 'Add Patients', fill in the proper forms, and submit. The form data comes from an Excel table thousands of rows long.
Sorry I can't be more clear on this without providing a username and password.

What I've been trying to do is use JavaScript to make page requests from a page that retrieves information from the Excel document using PHP. So far I haven't been able to get anything to work with this method.

I apologize for being a relative novice at this. Thanks in advance.

Baozi
  • Because there's some Javascript involved, you aren't going to be able to do this in PHP (as you've tagged this question). Have you considered writing this as a browser userscript or a browser extension? Also, their site TOS seems to prohibit screen-scraping, so be prepared to be actively blocked by them. – Charles Jan 07 '13 at 10:27
  • Why can't it be done in PHP? – Pastor Bones Jan 07 '13 at 10:28
  • @PastorBones, show me how to process Javascript inside HTML from within PHP and I'll change my statement. – Charles Jan 07 '13 at 10:29
  • That sounds like a lot of work. Why not just use a network sniffer to determine how the form post is sent to the server and send it yourself using cURL? If you need a value from a JavaScript variable, you could always parse the HTML and grab it before sending the form. I've done it plenty of times... – Pastor Bones Jan 07 '13 at 10:31
  • OK, I looked at the login form. It's an aspx page. At a cursory glance it requires that a viewstate value be passed with the form data, which can be scraped from the CDATA in the page. – Pastor Bones Jan 07 '13 at 10:35
  • @PastorBones, can you be a bit more specific in how to do this (and maybe how you found that info)? What kind of "network sniffer" would you recommend? Thanks again for the responses. – Baozi Jan 07 '13 at 18:08
  • I use Chrome, which displays the header information being sent when browsing or submitting forms (wrench -> Tools -> Developer Tools). If you're using Firefox, I think you have to install an extension to view HTTP headers. – Pastor Bones Jan 08 '13 at 06:19
  • @Baozi I updated my answer with an example that should login to your particular website. – Pastor Bones Jan 08 '13 at 06:49

2 Answers


You can use PHP cURL to browse and submit forms on websites, but how well it works depends on how the website is set up. Most sites have security checks in place to block bots, so it can be tricky to get everything working.

I spent a little bit of time and came up with this login script. Without a valid username and password I can't verify that it logs in successfully, but it should do what you need. This short example first requests the login page to set any cookies and scrape the __VIEWSTATE value needed to submit the form. It then submits the form using the username and password you provide.

<?php

// Login information
$username = 'test';
$password = 'mypass';
$utcoffset = '-6';
$cookiefile = '/writable/directory/for/cookies.txt';

$client = new Client($cookiefile);

// Retrieve page first to store cookies 
$page = $client -> get("https://pm.officeally.com/pm/login.aspx");
// scrape __VIEWSTATE value
$start = strpos($page, '__VIEWSTATE" value="') + 20;
$end = strpos($page, '"', $start);
$viewstate = substr($page, $start, $end - $start);

// Do our actual login
$form_data = array(
    '__LASTFOCUS' => '', 
    '__EVENTTARGET' => '',
    '__EVENTARGUMENT' => '',
    '__VIEWSTATE' => $viewstate,
    'hdnUtcOffset' => $utcoffset,
    'Login1$UserName' => $username,
    'Login1$Password' => $password,
    'Login1$LoginButton' => 'Log In'
);
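// The second argument of get() is the referer, so the form data goes third
$login_url = "https://pm.officeally.com/pm/login.aspx";
$page = $client -> get($login_url, $login_url, $form_data);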

// cURL wrapper class (named Client to match the "new Client(...)" call above)
class Client {
    private $_cookiefile;

    public function __construct($cookiefile) {
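        // The cookie file may not exist yet, so a writable parent directory is also accepted
        if (!is_writable($cookiefile) && !is_writable(dirname($cookiefile))) {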
            throw new Exception('Cannot write cookiefile: ' . $cookiefile);
        }
        $this -> _cookiefile = $cookiefile;
    }

    public function get($url, $referer = 'http://www.google.com', $data = false) {
        // Setup cURL
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_REFERER, $referer);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $this -> _cookiefile);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $this -> _cookiefile);
        // Disables certificate verification; convenient for testing, but insecure
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);

        // Is there data to post
        if (!empty($data)) {
            curl_setopt($ch, CURLOPT_POST, 1);
            curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
        }

        return curl_exec($ch);
    }

}
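
The strpos() scraping above only picks up __VIEWSTATE. ASP.NET pages often carry other hidden fields as well (for example __EVENTVALIDATION), and the form may be rejected if they are missing. As a rough sketch, assuming the DOM extension is available, every hidden input could be collected in one pass. The function name extract_hidden_fields and the sample markup are made up for illustration and are not taken from the real login page.

```php
<?php

/**
 * Collect every <input type="hidden"> name/value pair from an HTML page.
 * Hypothetical helper, not part of the original script.
 */
function extract_hidden_fields(string $html): array
{
    $fields = [];
    if ($html === '') {
        return $fields; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            // getAttribute() returns the value with HTML entities already decoded.
            $fields[$name] = $input->getAttribute('value');
        }
    }

    return $fields;
}

// Made-up sample markup standing in for the page returned by the first request.
$sample = '<html><body><form>'
    . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUK&amp;abc=" />'
    . '<input type="hidden" name="__EVENTVALIDATION" value="xyz123" />'
    . '<input type="text" name="Login1$UserName" />'
    . '</form></body></html>';

$hidden = extract_hidden_fields($sample);
```

In the login script, the result could then be merged into the form array, for instance with array_merge($hidden, array('Login1$UserName' => $username, ...)), so that whatever hidden fields the page sends are posted back unchanged.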
Pastor Bones
  • Thanks a lot for the detailed response! I've managed to get your example script to work (after some research into Guzzle and autoloading), but I'll have to do some more studying of cURL to properly figure out how to use it. If I have more trouble I'll be sure to post more in this thread. – Baozi Jan 09 '13 at 08:51

I think cURL will do the trick; the curl_init() handle is self-explanatory enough. I'm still at the start of reading the documentation, but I expect good results. I'm less sure how flexible PHP's data structures are, which will matter a lot when working with cURL. Hopefully it works out down the line.
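
On the question of PHP's structures: associative arrays map fairly directly onto form fields, and http_build_query() turns one into a POST body that can be handed to CURLOPT_POSTFIELDS. Below is a minimal sketch of that idea, assuming the Excel table has first been saved as CSV. The column names, the form field names (ctl00$FirstName and so on), and the helpers read_rows and build_post_body are all hypothetical; the real field names would have to be read from the site's 'Add Patients' form.

```php
<?php

/**
 * Read a CSV export of the spreadsheet and return one associative array
 * per data row, keyed by the header row. Hypothetical helper.
 */
function read_rows(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException('Cannot open ' . $path);
    }

    $rows = [];
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header !== false) {
        while (($line = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($line === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            $rows[] = array_combine($header, $line);
        }
    }
    fclose($handle);

    return $rows;
}

/**
 * Build a URL-encoded POST body from one row.
 * $fieldMap maps spreadsheet column => form field name (assumed names).
 * $hidden holds hidden fields scraped from the page, e.g. __VIEWSTATE.
 */
function build_post_body(array $row, array $fieldMap, array $hidden = []): string
{
    $data = $hidden;
    foreach ($fieldMap as $column => $formField) {
        $data[$formField] = $row[$column] ?? '';
    }
    return http_build_query($data, '', '&');
}

// Demo with a small temporary CSV file instead of the real export.
$path = tempnam(sys_get_temp_dir(), 'patients');
file_put_contents($path, "first_name,last_name\nAda,Lovelace\nAlan,Turing\n");
$rows = read_rows($path);
unlink($path);

// Hypothetical mapping; single quotes keep PHP from interpolating the "$".
$fieldMap = [
    'first_name' => 'ctl00$FirstName',
    'last_name'  => 'ctl00$LastName',
];

$body = build_post_body($rows[0], $fieldMap, ['__VIEWSTATE' => 'abc']);
echo $body, PHP_EOL;
```

Looping over $rows and posting one body per row (re-scraping the hidden fields from each response first) would cover the "thousands of rows" case, at whatever pace the site tolerates.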

RBT