
I am using Parse.com's httpRequest to retrieve the source code of a website.

Code:

Parse.Cloud.define("extract_website_simple", function(request, response) {
    // Fetch the page and return its HTML source to the caller.
    return Parse.Cloud.httpRequest({url: 'http://bet.hkjc.com/marksix/index.aspx?lang=en'}).then(
        function(httpResponse) {
            response.success("code=" + httpResponse.text);
        },
        function(error) {
            response.error("Error: " + error.code + " " + error.message);
        }
    );
});

Question:

The HTML cannot be retrieved. Instead, after about 10 seconds of loading, a ParseException is thrown with the following message:

com.parse.ParseException: i/o failure: java.net.SocketTimeoutException: Read timed out

How can I retrieve the page without hitting the timeout? Is there really no way to increase the timeout length?

Thanks!

pearmak

1 Answer


As Parse support has stressed in many places, including the official Q&A, the timeouts are deliberately low and will not be raised, in order to keep performance good. Quote:

Héctor Ramos: I think that only two operations can run at any time in Cloud Code, so when you send three queries in parallel, the third one won't start until at least one of the first two has finished. Cloud Functions are not the best tool for long-running operations, and so they are limited to 15 seconds to keep Cloud Code performant for everybody. A better solution for long-running operations should be available shortly.

Official documentation says:

Resource Limits -> Timeouts

Cloud functions will be killed after 15 seconds of wall clock time. beforeSave, afterSave, beforeDelete, and afterDelete functions will be killed after 3 seconds of run time. If a Cloud function or a beforeSave/afterSave/beforeDelete/afterDelete function is called from another Cloud Code call, it will be further limited by the time left in the calling function. For example, if a beforeSave function is triggered by a cloud function after it has run for 13 seconds, the beforeSave function will only have 2 seconds to run, rather than the normal 3 seconds.

So even if you pay them thousands of dollars every month, they won't let your function run for more than 10-15 seconds. Parse is not a tool for everything; it is quite specialized. I run into its limitations all the time, such as the lack of support for multipart forms with many attachments.

Parse.Cloud.job

If a request needs more time, up to a maximum of 15 minutes, you can use Background Jobs. They support Promises with .then, which conserves server resources considerably compared with typical anonymous callbacks.

If you are on the free plan, you won't love another limit: "Apps may have one job running concurrently per 20 req/s in their request limit", so you can run only a single Background Job in your app, and if you try to start another one: "Jobs that are initiated after the maximum concurrent limit has been reached will be terminated immediately." To run 4 background jobs at once you would have to pay $700/month at current pricing.
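As a rough illustration of moving the request from the question into a Background Job, here is a minimal sketch. The job name `extract_website_job` and the helper `fetchPage` are hypothetical names chosen for this example, and the `(request, status)` handler signature is assumed from the legacy Parse.com Cloud Code API. The HTTP function is passed in as a parameter so the helper can also be exercised outside Parse.

```javascript
// Hypothetical helper: fetches a URL with the supplied httpRequest function
// (Parse.Cloud.httpRequest inside Cloud Code) and resolves with the body text.
function fetchPage(httpRequest, url) {
  return httpRequest({ url: url }).then(function (httpResponse) {
    return httpResponse.text;
  });
}

// Register the job only where the Parse global exists, i.e. inside Cloud Code.
// Assumption: jobs take (request, status) and finish via status.success/error.
if (typeof Parse !== 'undefined') {
  Parse.Cloud.job('extract_website_job', function (request, status) {
    fetchPage(Parse.Cloud.httpRequest, 'http://bet.hkjc.com/marksix/index.aspx?lang=en')
      .then(function (text) {
        status.success('Fetched ' + text.length + ' characters');
      }, function (error) {
        status.error('Request failed: ' + JSON.stringify(error));
      });
  });
}
```

A job does not return data to the calling client the way a Cloud Function does, so in practice the fetched HTML would probably have to be saved to a Parse object and queried afterwards.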


If you need more time, or have less money, and want to parse tens of pages at once, you can choose a different technology for web scraping. There are many options; my personal favorites are:

Node.js

If you like JavaScript on the server side, you could try Node.js. To start from the basics, you could follow the scotch.io tutorial.
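For a sense of what this looks like, here is a minimal Node.js sketch that fetches a page with a timeout you control yourself, using only the built-in `http` module. The function name `fetchWithTimeout` and the 60-second value in the usage comment are assumptions made for illustration, not part of any tutorial mentioned here.

```javascript
const http = require('http');

// Hypothetical helper: fetch a page over plain HTTP and give up after
// timeoutMs milliseconds of socket inactivity.
function fetchWithTimeout(url, timeoutMs) {
  return new Promise(function (resolve, reject) {
    const req = http.get(url, function (res) {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        resolve({ status: res.statusCode, text: body });
      });
      res.on('error', reject);
    });
    // The timeout callback fires on inactivity; destroying the request
    // with an error makes the 'error' handler below reject the promise.
    req.setTimeout(timeoutMs, function () {
      req.destroy(new Error('Timed out after ' + timeoutMs + ' ms'));
    });
    req.on('error', reject);
  });
}

// Example usage (not executed here):
// fetchWithTimeout('http://bet.hkjc.com/marksix/index.aspx?lang=en', 60000)
//   .then(function (r) { console.log(r.status, r.text.length); })
//   .catch(function (e) { console.error(e.message); });
```

Because the timeout is just a parameter, it can be set well above the 15-second Cloud Function limit discussed above.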

PHP

Another alternative, with thousands of examples, is PHP. You could start with a tutorial-style answer on Stack Overflow itself.

s3m3n