I've been making a member list for a game and pulling data from the hiscores. First I get a list of names, then I insert them into my database, then I pass them to cURL to fetch each player's statistics from the hiscores, and finally I update the results in my database.

The problem is that when I make the cURL requests I manage to update around 30 names in total before my host returns a 503 error (probably due to the max execution time). However, I need to update more than that; I'd say 100 would be the minimum.

I've tried to optimize the code so it runs faster, with some success. It seems around 30 people is the maximum I can update in one run.

Is there something wrong with the code itself that makes it take so long? Below is the cURL part of the code, and it's probably not the prettiest you've seen. I would assume cURL can handle far more data in one go, and I had a similar solution working fine before without the database. Could the reason be HTTPS? Previously it wasn't needed, but now it is.

<?php
$ch = curl_init();
if (isset($_POST['submit'])) { // check if the form was submitted
    $conn = new mysqli($servername, $username, $password, $dbname);
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }

    // get users
    $stmt = $conn->prepare("SELECT m.name, m.id, m.group_id, p.field_1, g.prefix, g.suffix
        FROM members m
        INNER JOIN pfields_content p ON m.id = p.id
        INNER JOIN groups g ON g.g_id = m.group_id
        WHERE m.group_id = 1");
    $stmt->execute();
    $result = $stmt->get_result();

    while ($row = mysqli_fetch_array($result, MYSQLI_ASSOC)) {

        // add new member ID to database
        $conn = new mysqli($servername, $username, $password, $dbname);
        if ($conn->connect_error) {
            die("Connection failed: " . $conn->connect_error);
        }
        $stmt = $conn->prepare("INSERT IGNORE INTO `table` (`member_id`, `name`, `dname`) VALUES ('".$row['member_id']."', '".$row['name']."', '".$row['field_1']."')");
        $stmt->execute();

        // dname
        if ($row['field_1'] != '' || $row['field_1'] != NULL) {

            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
            curl_setopt($ch, CURLOPT_URL, "https://secure.runescape.com/m=hiscore_oldschool/index_lite.ws?player=".$row['field_1']);
            curl_setopt($ch, CURLOPT_HEADER, 0);

            // grab HTML
            $data = curl_exec($ch);
            $array = explode(',', $data);

            // formula
            if (!empty($array[15]) && is_numeric($array[15])) {
                $level = ((round($array[13]/2, 0, PHP_ROUND_HALF_DOWN)+$array[9]+$array[7])/4) + (($array[3]+$array[5])*0.325);
                $level = number_format($level, 2);
                // if valid name, update
                $conn = new mysqli($servername, $username, $password, $dbname);
                if ($conn->connect_error) {
                    die("Connection failed: " . $conn->connect_error);
                }
                $stmt = $conn->prepare("UPDATE table SET
                    member_id = '".$row['id']."',
                    name = '".$row['name']."',
                    cb = '".$level."' WHERE member_id = ".$row['id']."");
                $stmt->execute();
                $conn->close();
            }
        }
    }
}
saVior

3 Answers

OK, I saw a few things worth mentioning:

1) Why can you only do so many? Here's the most probable culprit:

        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($ch, CURLOPT_URL, "https://secure.runescape.com/m=hiscore_oldschool/index_lite.ws?player=".$row['field_1']);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        // grab HTML
        $data = curl_exec($ch);

You're making an external cURL call for each user, which means you are at the mercy of that other site and how long each call takes to resolve. You can add some echoes around the cURL call to see how much time each call is taking. But, sadly, you're probably not going to get much more speed out of your code, as you're dependent on the external service. This could be because of HTTPS, or just their system being overloaded. Like I said above, if you really want to know how long each call is taking, add some echoes around it like:

        echo "About to curl runescape " . date("H:i:s");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($ch, CURLOPT_URL, "https://secure.runescape.com/m=hiscore_oldschool/index_lite.ws?player=".$row['field_1']);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        // grab HTML
        $data = curl_exec($ch);
        echo "Done with call to runescape " . date("H:i:s");

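As a variation on the echoes, the same measurement can be wrapped in a small helper (hypothetical, not from the original code) so any step can be timed the same way:

```php
<?php
// Hypothetical helper: time any step (a curl call, a query, ...) and report the elapsed seconds.
function timed(string $label, callable $step) {
    $t0 = microtime(true);
    $result = $step();
    printf("%s took %.3fs\n", $label, microtime(true) - $t0);
    return $result; // pass the step's result straight through
}

// usage, e.g. around the curl call:
//   $data = timed('runescape call', fn() => curl_exec($ch));
```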
The rest of your code doesn't seem like it would be an issue speed-wise. But:

2) Your connections are sort of messed up. You open a connection and run a query. Then the while loop starts, and you open a second connection and run another query. And then, if the right conditions are met, you open a third connection, do some work, and close it. The first two connections are never closed, and the second one actually gets opened many times over since it's inside your loop. Why not just reuse the original $conn instead of opening a new connection each time?
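The pattern being described (open one connection, prepare once, execute inside the loop) can be sketched like this; an in-memory SQLite database stands in for MySQL purely so the sketch is self-contained, but the shape is the same with mysqli:

```php
<?php
// Sketch: one connection and one prepared statement, reused across the whole loop.
// SQLite in memory is used here only for illustration; swap in mysqli/MySQL as needed.
function insert_members(array $rows): int {
    $db = new PDO('sqlite::memory:');                      // open ONE connection
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT, dname TEXT)');

    // prepare ONCE, outside the loop (INSERT OR IGNORE is SQLite's INSERT IGNORE)
    $stmt = $db->prepare('INSERT OR IGNORE INTO members (member_id, name, dname) VALUES (?, ?, ?)');

    foreach ($rows as $row) {                              // execute many times
        $stmt->execute([$row['id'], $row['name'], $row['field_1']]);
    }
    return (int)$db->query('SELECT COUNT(*) FROM members')->fetchColumn();
}
```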

3) Finally, if you need your PHP file to run for more than 60 seconds, add something like this to the top:

set_time_limit(0);

The above should effectively let the script run as long as you want. That said, a job like this is much better run as a cron job from the CLI than as a long-running script through a browser.
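For example (assuming a typical Linux host with CLI PHP; the script and log paths are hypothetical), a crontab entry running the updater every 15 minutes might look like:

```cron
# CLI PHP is not subject to the web server's request timeout
*/15 * * * * /usr/bin/php /path/to/update_hiscores.php >> /var/log/hiscores.log 2>&1
```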

R. Smith
  • Thanks, this is the kind of feedback I appreciate. I've had the feeling the connections are messed up and there are a lot of unnecessary ones, so that is for sure something I need to clean up. As for cURL, I need to use the echoes to see how long it is taking. A cron job was something I looked into but didn't try going that way yet. – saVior Feb 11 '19 at 20:12
  • Once you create your first connection ($conn), you can reuse the same connection for all of your queries. Each one is an open connection to your database and consumes resources (memory, open files, etc.). – R. Smith Feb 11 '19 at 20:14
  • Also, I updated my answer to give an example of printing out times before and after the curl call... – R. Smith Feb 11 '19 at 20:15
  • Thanks, I'll rework the connections and remove the unnecessary ones to see if that has any effect. I did try the echoing and it took around 40 seconds for 13 calls, so around 3 seconds per call. That sounds normal and acceptable, at least to my ear. – saVior Feb 11 '19 at 20:28
  • 3 seconds per call doesn't sound too outrageous. And that puts 30 calls at about 90 seconds, which is also not too outrageous for a timeout. – R. Smith Feb 11 '19 at 20:31
  • Now I am using only one connection, but yes, it seems to be a timeout I'm facing, and set_time_limit(0); has no effect. I guess I need to try a cron job. – saVior Feb 11 '19 at 20:53
  • If set_time_limit didn't help, your host is probably overriding it. – R. Smith Feb 11 '19 at 21:01
Other people seem to be doing an OK job figuring out why the code is so slow (you're doing a bunch of cURL requests, and each one takes time), and some other problems with the code (your indentation is messed up; I didn't dig much deeper than that, sorry).

How can you fix the performance problem?

The answer here depends a bit on your needs: Do you need to send the processed data back to the original requestor, or just save it to the database?

If you're just saving it to the database:

Perform your DB lookups and everything you need to do besides the cURL requests, then spawn a separate system process that will do all the cURL requests (and save the data to the DB) asynchronously while you send back an "OK, we're working on it" response.

If you need to send this data back to the caller:

Perform all the cURL requests at the same time. I don't actually think this can be done within a single PHP process (but see curl_multi, below). In some other languages it's easy. The most brute-force approach would be to split off an asynchronous system process for each cURL request, and put PHP in a sleep/check loop until it sees that all of the child processes have written their results to the DB.
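A minimal sketch of such a sleep/check loop (the `$done` callback, e.g. a query counting finished child rows, is hypothetical):

```php
<?php
// Sketch of a sleep/check loop: poll a condition until it holds or a timeout expires.
function wait_until(callable $done, float $timeout = 30.0, int $sleep_us = 200000): bool {
    $deadline = microtime(true) + $timeout;
    while (microtime(true) < $deadline) {
        if ($done()) {          // e.g. "have all child processes written their rows?"
            return true;
        }
        usleep($sleep_us);      // sleep between checks instead of busy-waiting
    }
    return false;               // gave up: children did not finish in time
}
```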

You'll encounter plenty of further gotchas as you start working with asynchronous stuff, and it's not at all clear that you're approaching the problem in the best way. That said, if you go down this road, I think the first function you're going to need is exec. For example, this would spawn an independent asynchronous process that will shout into the void forever (don't actually do this):

exec('yes > /dev/null &');

And finally, my own agenda: This is a great opportunity for you to move some of your execution out of PHP! While you could probably pull off everything you need just by using curl_multi, and there are even some options for bypassing cURL and building your own HTTP requests, I suggest using tools better suited to the task at hand.
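For reference, a minimal curl_multi sketch (untested against the live hiscores endpoint; the timeout value is arbitrary):

```php
<?php
// Sketch: issue all requests concurrently with curl_multi instead of one at a time.
// $urls maps a key (e.g. player name) to a URL; returns key => response body.
function fetch_all(array $urls): array {
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);     // arbitrary per-request cap
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // drive all transfers until none are still active
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);                // wait for activity, don't busy-loop
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```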

ShapeOfMatter
I worked through your code and tried to restructure it so that it makes better use of the database connection and the cURL requests. As the target URL for the cURL requests is over HTTPS, I modified the cURL options to include certificate info, plus some other changes which may or may not be required. I have no way to test this code fully, so there may be errors!

  • The initial query does not need to be a prepared statement, as it does not use any user-supplied data, so it is safe as-is.
  • When using prepared statements, create them once only (so not in a loop) and bind the placeholders to variables if the statement was created OK. At that stage the variables do not need to exist yet (when using mysqli at least; it is different in PDO).
  • Only create one database connection. The poor database server was being asked to create new connections in a loop, so it probably suffered as a result.
  • When a statement has been run, it should be disposed of so that a new statement can be created.
  • If you use prepared statements, do not compromise the database by then embedding variables (not user input in this case, I know) in the SQL; use placeholders for parameters!

I hope the following helps though... I was able to do some testing using random names and no database calls: about 6 users in 5 seconds.


<?php

    try{

        $start=time();
        $cacert='c:/wwwroot/cacert.pem'; # <-------edit as appropriate
        $baseurl='https://secure.runescape.com/m=hiscore_oldschool/index_lite.ws';

        if( isset( $_POST['submit'], $servername, $username, $password, $dbname ) ){

            /* should only need the one curl connection */
            $curl=curl_init();
            curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
            curl_setopt( $curl, CURLOPT_BINARYTRANSFER, true );
            curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
            curl_setopt( $curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" );
            curl_setopt( $curl, CURLOPT_HEADER, false );
            curl_setopt( $curl, CURLINFO_HEADER_OUT, false );
            curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, true );
            curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 2 );
            curl_setopt( $curl, CURLOPT_CAINFO, $cacert );
            curl_setopt( $curl, CURLOPT_MAXREDIRS, 10 );
            curl_setopt( $curl, CURLOPT_ENCODING, '' );


            /* only need the one db connection */
            $conn = new mysqli( $servername, $username, $password, $dbname );

            /* initial db query does not need to be a prepared statement as there are no user supplied parameters */
            $sql='select m.`name`, m.`id`, m.`group_id`, p.`field_1`, g.`prefix`, g.`suffix`
                    from members m 
                    inner join pfields_content p on m.`id` = p.`id`
                    inner join groups g on g.`g_id` = m.`group_id`
                    where m.`group_id` = 1';
            $res=$conn->query( $sql );
            if( $res ){

                /* create the prepared statement for inserts ONCE, outside the loop */
                $sql='insert ignore into `table` ( `member_id`, `name`, `dname` ) values ( ?,?,? )';
                $stmt=$conn->prepare( $sql );

                if( $stmt ){

                    /* bind the placeholders to variables - the variables do not need to exist YET in mysqli */
                    $stmt->bind_param('iss', $id, $name, $field_1 );

                    /* placeholder arrays for bits of the recordset */
                    $data=array();
                    $urls=array();

                    /* 
                        collect all the relevant player names into an array
                        and store info for use in INSERT query
                    */
                    while( $rs=$res->fetch_object() ){

                        if( !empty( $rs->field_1 ) ) {
                            $urls[ $rs->field_1 ]=(object)array( 
                                'name'  =>  $rs->name,
                                'id'    =>  $rs->id
                            );
                        }

                        $data[]=(object)array( /* cast to object so the foreach below can use $obj->name etc. */
                            'name'      =>  $rs->name,
                            'id'        =>  $rs->id,    /* original code references `member_id` which does not exist in the recordset */
                            'field_1'   =>  $rs->field_1
                        );
                    }

                    /* now loop through $data to do the inserts */
                    foreach( $data as $obj ){
                        /* create/dimension the variables for the prepared statement parameters */
                        $name=$obj->name;
                        $id=$obj->id;
                        $field_1=$obj->field_1;

                        /* run the insert cmd */
                        $stmt->execute();
                    }

                    /* we should now be finished with the initial prepared statement */
                    $stmt->free_result();
                    $stmt->close();

                    /*
                        now for the curl calls... no idea how many there will be but this should be known
                        by sizeof( $urls )

                        Depending on the number you might opt to perform the curl calls in chunks or use
                        `curl_multi_init` ~ more complicated but perhaps it could help.

                        Also need to define a new sql statement ~ note the original UPDATE set `member_id`
                        to the same value used in its WHERE clause, so there is no need to update `member_id` here!
                    */
                    $sql='update `table` set `name`=?, `cb`=? where `member_id`=?';
                    $stmt=$conn->prepare( $sql );
                    if( $stmt ){
                        $stmt->bind_param( 'ssi', $name, $level, $id );

                        foreach( $urls as $player => $obj ){

                            $url = $baseurl . '?player=' . $player;

                            /* set the url for curl */
                            curl_setopt( $curl, CURLOPT_URL, $url );

                            /* execute the curl request... */
                            $results=curl_exec( $curl );
                            $info=(object)curl_getinfo( $curl );
                            $errors=curl_error( $curl );

                            if( $info->http_code==200 ){
                                /* curl request was successful */
                                $array=explode( ',', $results );
                                if( !empty( $array[15] ) && is_numeric( $array[15] ) ) {

                                    $level = ((round($array[13]/2, 0, PHP_ROUND_HALF_DOWN)+$array[9]+$array[7])/4) + (($array[3]+$array[5])*0.325);
                                    $level = number_format($level, 2);

                                    /* update db ~ use $obj from urls array + level defined above */
                                    $name=$obj->name;
                                    $id=$obj->id;

                                    $stmt->execute();
                                }
                            } else {
                                throw new Exception( sprintf('curl request to %s failed with status %s', $url, $info->http_code ) );
                            }
                        }// end loop

                        $stmt->free_result();
                        $stmt->close();
                        curl_close( $curl );

                        printf( 'Finished...Operation took %ss',( time() - $start ) );

                    }else{
                        throw new Exception( 'Failed to prepare sql statement for UPDATE' );
                    }
                }else{
                    throw new Exception( 'Failed to prepare sql statement for INSERT' );
                }
            }else{
                throw new Exception( 'Initial query failed' );
            }
        }
    }catch( Exception $e ){
        exit( $e->getMessage() );
    }
?>
Professor Abronsius