
I am trying to populate a bunch of records with "random" data, all for the purpose of testing an app's usability with a large amount of data.

So I thought - perfect! I've got my methods for inserting, they do some 'duplicate checking' and all that. All I have to do is loop x times and populate with random values from the db.

So I hit the DB once for each set of data (three things in my example: users, thing1 and thing2), then I run the loop, set the values I want to insert randomly, and call my insert...

It works fine until around the 80th loop, then it starts slowing; right around 180 the CPU fans go full on and CF is spent... I've tried adding a 'sleep' command at the end of each loop, to give 'whatever' a chance to resolve, do its garbage collection or whatever.

Here is an error in the log - "java.lang.OutOfMemoryError: GC overhead limit exceeded"

Not a lot of luck. Can anyone possibly give me some insight into this? I had an identical issue about 18 months ago, doing some card hand analysis. I thought that was simply because the iterations on 'card hands' were too extensive, and I accepted that.

But this is basically 1000 insert statements, with a call to check for existing records before each... that shouldn't be a huge drain, should it? Shouldn't each structure of data be cleared at each loop, and each db connection be released when done?

I suspect the 'answer' is more theoretical and basic than just a 'code' correction. But I'm really curious to know, because it'll only help make any future code more robust.

Thanks

Here's the basic loop structure.

// get master lists of all available data - will use for random selection
users  = createObject( 'component', 'users' ).getFullList();
thing1 = createObject( 'component', 'thing1' ).getFullList();
thing2 = createObject( 'component', 'thing2' ).getFullList();

for ( i = 1; i <= 1000; i++ ) {
    // get random ids from within the recordsets returned
    args = structNew();
    args['app_id'] = 1;
    args['user_id']   = users.id[RandRange( 1, users.recordcount , 'SHA1PRNG' )];
    args['thing1_id'] = thing1.id[RandRange( 1, thing1.recordcount , 'SHA1PRNG' )];
    args['thing2_id'] = thing2.id[RandRange( 1, thing2.recordcount , 'SHA1PRNG' )];

    // check for dups and put the data
    result = putData( argumentCollection = args );

    // tell me count and let me know if it succeeded
    writeoutput( '( #i# = #result# )' );

    // flush it - so I know where we are
    getPageContext().getOut().flush();

    // help garbage collection? something is causing a bog down
    structClear( args );
}
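
For reference, one way to watch the heap while this runs (just a debugging sketch using the JVM's Runtime object, not part of my actual code) would be:

// debugging sketch only: report free vs. total heap every 50 iterations
rt = createObject( 'java', 'java.lang.Runtime' ).getRuntime();

for ( i = 1; i <= 1000; i++ ) {
    // ... same random selection and putData() call as above ...

    if ( i mod 50 eq 0 ) {
        writeoutput( '( #i#: #round( rt.freeMemory() / 1048576 )# MB free of #round( rt.totalMemory() / 1048576 )# MB )' );
        getPageContext().getOut().flush();
    }
}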

And here are the other methods, called on objects instantiated before the loop. There are no 'excessive' components being created in the loop - all that happens once, before the loop.

/*
    methods from other components
*/

putData( [my arguments] ) {
    var isDuplicate = false;

    transaction action = 'begin' {
        // check for existing (don't allow dups)
        isDuplicate = checkForDuplicate( argumentcollection = arguments );

        if ( !isDuplicate ) {
            // deduct from the user's balance
            var deduct = deductFromBalance(
                                        user_id = arguments.user_id
                                        ,units = 1
                                    );

            if ( deduct.success ) {
                try {
                    // put in users acct
                    insertData( argumentcollection = arguments );
                    transaction action = 'commit';
                    return true;
                } catch ( any e ) {
                    transaction action = 'rollback';
                    return false;
                }
            } else {
                transaction action = 'rollback';
                return false;
            }
        } else {
            transaction action = 'rollback';
            return false;
        }
    }
}

checkForDuplicate( [my arguments] ) {
    // check for duplicate
    var r = new Query(
        datasource = "myDSN"
        ,sql = "
            SELECT
                id
            FROM
                eventTable                  
            WHERE
                app_id = ?
            AND
                user_id = ?
            AND
                thing1_id = ?
            AND
                thing2_id = ?
        "
        ,parameters = [
                 { value = val( arguments.app_id ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.user_id ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.thing1_id ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.thing2_id ), cfsqltype = 'CF_SQL_INTEGER' }
        ]
    ).execute().getPrefix();

    if ( r.recordcount ) {
        return true;
    } else {
        return false;
    }
}


deductFromBalance( [myArgs] ) {
    var r = {};
        r['beginningBalance'] = checkBalance( arguments.user_id );
        r['unit'] = arguments.units;
        r['success'] = false;

        if ( r['beginningBalance'] > 0 ) {
            adjustBalance( argumentCollection = arguments );
        }

        r['endingBalance'] = checkBalance( arguments.user_id );

        if ( r['beginningBalance'] - r['endingBalance'] == r['unit'] ) {
            r['success'] = true;
        }

    return r;
}

checkBalance( [myArgs] ) {
    // return the balance for this user
    var r = new Query(
            datasource = 'myDSN'
            ,sql = "
                SELECT
                    balance
                FROM
                    users
                WHERE
                    id = ?
            "
            ,parameters = [
                { value = val( arguments.user_id ), cfsqltype = 'CF_SQL_INTEGER' }
            ]
        ).execute().getResult();

    return val( r.balance );
}

adjustBalance( [myArgs] ) {
    var r = false;
    var bal = checkBalance( arguments.user_id );

    if ( bal >= 0 && arguments.units > 0 ) {
        var q = new Query(
            datasource = 'myDSN'
            ,sql = "
                UPDATE
                    users
                SET
                    balance = ( balance - ? )
                WHERE
                    id = ?
            "
            ,parameters = [
                 { value = val( arguments.units ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.user_id ), cfsqltype = 'CF_SQL_INTEGER' }
            ]
        ).execute();

        r = true;
    }
    return r;
}

insertData( [myArgs] ) {
    // insert data
    var r = new Query(
            datasource = "myDSN"
            ,sql = "
                INSERT INTO
                    eventTable
                    (
                        app_id
                        ,user_id
                        ,thing1_id
                        ,thing2_id
                    )
                VALUES
                    (
                        ?
                        ,?
                        ,?
                        ,?
                    )
            "
            ,parameters = [
                 { value = val( arguments.app_id ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.user_id ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.thing1_id ), cfsqltype = 'CF_SQL_INTEGER' }
                ,{ value = val( arguments.thing2_id ), cfsqltype = 'CF_SQL_INTEGER' }
            ]
        ).execute().getPrefix();

    return r;
}

All these methods work and return what I'm expecting, so if I typo'ed my 'abstract' please allow for that... it's the eating of memory or CPU that I'm searching for input on.

CF 11 update 5 on Windows 7 - Core i7, 4 GB RAM; Maximum Output Buffer Size: 2048; Minimum JVM Heap Size: 512 MB; Maximum JVM Heap Size: 1024 MB
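
Those two heap numbers just end up as the -Xms / -Xmx arguments on the java.args line in ColdFusion's jvm.config - shown here only for reference, the exact flags and path vary by install:

# {cf_root}/cfusion/bin/jvm.config (excerpt; other arguments omitted)
java.args=-server -Xms512m -Xmx1024m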

    One debug technique is to start commenting out individual function calls, rerun, and see if your problem persists. So for example, first comment out the call to putData(). If the problem still persists, comment out the call to insertData. Rinse and repeat with other function calls. – Sam M Aug 18 '15 at 14:33
  • And this probably isn't related to your memory issue, but your deductFromBalance is not protected in a try/catch so any exception in there will mess up your transaction state. – Sam M Aug 18 '15 at 14:42
  • @sam, I've done that... insertData() seems to be the issue. And thx for try/catch suggestion – j-p Aug 18 '15 at 15:12
  • how much ram did you allocate to your CF instance? AFAIK CF does not release ram until the request is fully processed. – Henry Aug 18 '15 at 17:04
  • @jpmyob - Might want to look at [this thread](http://stackoverflow.com/questions/855066/is-there-a-solution-to-this-cfqueryparam-memory-leak). It is a bit dated and slightly different base case (bulk insert - which can be done far more efficiently with other tools), but some of the posts may be relevant. – Leigh Aug 18 '15 at 19:04
  • @leigh, ya, my next step is to write a method that'll handle a bulk insert... I was trying for existing tested code, but a one-off bulk load to populate for testing is a solution. If it were a process that had to run repeatedly, I'd be more inclined to keep hunting... but at this point, a workaround is a solution. – j-p Aug 18 '15 at 23:53
  • @henry - interesting - I'll have to research that. I think you are asking for my JVM heap allocation... which I bumped to 1024 / 2048, and it didn't seem to help – j-p Aug 19 '15 at 00:00
  • @jpmyob - No matter how you look at it, looping is almost always sub-optimal - in any application. Especially when it comes to modifying data. Looping takes time. No way around it. In my experience, bulk inserting is often faster *and* simpler overall. Rather than applying complex logic to each record individually, you can take advantage of set based operations - which is where databases really shine. So tasks like [finding existing/duplicate records](http://stackoverflow.com/questions/10698115/parse-compare-data-using-coldfusion-mysql/10716267#10716267) are often a lot simpler and faster. – Leigh Aug 19 '15 at 00:36
  • *Looping takes time* .. and resources. While I realize the above does not directly address your original question .. looping really is not the way to go for bulk data inserts IMO. CF is just not the right tool for that kind of task. – Leigh Aug 19 '15 at 01:06
  • If you have to use CF for the inserts, don't check for dupes. Create a temporary table and insert every record. That reduces the activity inside the loop. Once the records are in the database, use sql to do the rest. – Dan Bracuk Aug 19 '15 at 13:57

1 Answer


You are doing too much work inside the loop. I would use this approach.

Before you start the loop, create a temporary table in your database. It has to be temporary because you have to acknowledge the possibility of simultaneous activity.

Inside your loop, simply create your values and insert them into your temporary table. While not ideal, inserting 1000 records with ColdFusion is not the end of the world. I've successfully done worse.

Once the loop is finished, use sql to do what you need to do with duplicate records and other data issues. Then write to your real database table from the temporary one.
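
Very roughly, reusing the datasource and column names from the question - the staging table name (eventTable_stage) and the single de-dupe statement are just my assumptions, so adjust the SQL for your engine - it could look like this:

// sketch only: fill a staging table (name assumed) with no per-row duplicate checking
users  = createObject( 'component', 'users' ).getFullList();
thing1 = createObject( 'component', 'thing1' ).getFullList();
thing2 = createObject( 'component', 'thing2' ).getFullList();

for ( i = 1; i <= 1000; i++ ) {
    new Query(
        datasource = 'myDSN'
        ,sql = "
            INSERT INTO eventTable_stage ( app_id, user_id, thing1_id, thing2_id )
            VALUES ( ?, ?, ?, ? )
        "
        ,parameters = [
             { value = 1, cfsqltype = 'CF_SQL_INTEGER' }
            ,{ value = users.id[ randRange( 1, users.recordcount, 'SHA1PRNG' ) ], cfsqltype = 'CF_SQL_INTEGER' }
            ,{ value = thing1.id[ randRange( 1, thing1.recordcount, 'SHA1PRNG' ) ], cfsqltype = 'CF_SQL_INTEGER' }
            ,{ value = thing2.id[ randRange( 1, thing2.recordcount, 'SHA1PRNG' ) ], cfsqltype = 'CF_SQL_INTEGER' }
        ]
    ).execute();
}

// ...then one set-based statement moves only the non-duplicates into the real table
new Query(
    datasource = 'myDSN'
    ,sql = "
        INSERT INTO eventTable ( app_id, user_id, thing1_id, thing2_id )
        SELECT DISTINCT s.app_id, s.user_id, s.thing1_id, s.thing2_id
        FROM   eventTable_stage s
        WHERE  NOT EXISTS (
                   SELECT 1
                   FROM   eventTable e
                   WHERE  e.app_id    = s.app_id
                   AND    e.user_id   = s.user_id
                   AND    e.thing1_id = s.thing1_id
                   AND    e.thing2_id = s.thing2_id
               )
    "
).execute();

The balance deduction can likewise be handled afterwards with a single set-based UPDATE against users instead of one query per row.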

Dan Bracuk