
I'm trying to import JSON data into MySQL. My JSON file is nearly 3.7 MB and has nearly 17k rows (test data; the real data will be nearly 65k rows). With my script it is very slow, taking nearly 8-9 minutes. Is there a faster way to import JSON data into MySQL with a PHP progress interface? I'm also trying to add a progress bar feature, and that part works for now.

    // Decode the JSON dump and insert each auction row, printing a progress bar as we go.
    $veri = json_decode(file_get_contents('auctions.json'));

    $a     = $veri->alliance->auctions;    // array of auction objects
    $sayi  = count($a);                    // total number of rows
    $yuzde = max(1, round($sayi / 100));   // rows per 1% of progress

    echo "<div id='tasiyici'>";

    // id is left empty so MySQL can auto-increment it.
    $sql = $db->prepare("INSERT INTO auctions (id, auc, item, owner, bid, buyout, quantity, timeLeft) VALUES ('',?,?,?,?,?,?,?)");

    for ($i = 0; $i < $sayi; $i++) {
        $sql->execute(array($a[$i]->auc, $a[$i]->item, $a[$i]->owner, $a[$i]->bid, $a[$i]->buyout, $a[$i]->quantity, $a[$i]->timeLeft));

        // Update the progress bar once per percent.
        if ($i % $yuzde === 0) {
            $y     = $i / $yuzde;
            $pos   = 40;   // default background offset, matches the .rakam CSS
            $color = '';
            if (($y * 4 + 4) > 180)  { $pos = 40 - (($y * 4 + 4) - 180); $color = "color:#fff"; }
            if (($y * 4 + 4) >= 220) { $pos = 0; }
            echo "<div class='rakam' style='background-position:-" . $pos . "px 0;" . $color . "'>%" . ($y + 1) . "</div>";
            echo "<div class='yuzde' style='width:" . ($y * 4 + 4) . "px;'></div>";
            ob_flush();
            flush();
        }
    }
    echo "</div>";

    echo "<br> $sayi rows added.";

CSS code:

<style>
        body {
            font-family:Arial;
        }
            #tasiyici {
                width:400px;
                height:17px;
                display: block;
                position: relative;
                margin:50px auto;
                background:#e3e3e3;
                border-radius:5px;
                overflow: hidden;
                border:1px solid #ccc;
            }
            .yuzde {
                height:17px;
                display: block;
                width:1px;
                background:url("progressOverlay.png");
                position: absolute;
                top:0;
                left:0;
                z-index:1;
            }
            .rakam {
                width:40px;
                height:16px;
                display: block;
                line-height:17px;
                position: absolute;
                left:50%;
                top:0;
                margin-left:-20px;
                z-index:9999;
                background:url("progressOverlay.png") -40px 0 #e3e3e3 no-repeat;
                font-size:11px;
            }

        </style>
  • You could try inserting more than one row per execute, but I'm sceptical that this will make a big difference with prepared statements. I would try inserting at least 1% of your data per execute. Reusing an `INSERT INTO` won't make much difference, as your bottleneck is the amount of data. Another way is to convert your JSON data and do a bulk import, as in http://stackoverflow.com/questions/2811096/mysql-bulk-insert-from-csv-data-files – Basti Mar 07 '12 at 12:07
  • So `$sayi` could be 65,000? If that's the case, that's 65,000 queries you're running. Maybe instead try executing the queries in batches of 100 or 1,000. I'm not sure how to do it with prepare/execute, but I would just start off with `insert into table (keys) values (values),(values),(values)...` etc. and keep appending `(values),` until you've got 100 or 1,000 or whatever, and then execute that (a sketch of this is below). It should be a bit faster because there are far fewer round trips to MySQL (depending on your setup); batches of 1,000 will cut it down to 65 queries. – Benno Mar 07 '12 at 12:08
  • Thank you guys, I will try 1,000 rows per query. I will add the results here. –  Mar 07 '12 at 12:15
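
A minimal sketch of that batching idea, assuming a PDO connection in `$db`, the decoded auction objects in `$a` (as in the question's code), and an auto-incrementing `id` column; the chunk size of 1,000 is arbitrary:

    // Batched insert sketch: one multi-row INSERT per chunk instead of one query per record.
    $chunkSize = 1000;
    foreach (array_chunk($a, $chunkSize) as $chunk) {
        // One "(?,?,?,?,?,?,?)" group per row in this chunk.
        $placeholders = implode(',', array_fill(0, count($chunk), '(?,?,?,?,?,?,?)'));
        $stmt = $db->prepare("INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft) VALUES " . $placeholders);

        // Flatten the chunk into one flat parameter list matching the placeholders.
        $params = array();
        foreach ($chunk as $row) {
            $params[] = $row->auc;
            $params[] = $row->item;
            $params[] = $row->owner;
            $params[] = $row->bid;
            $params[] = $row->buyout;
            $params[] = $row->quantity;
            $params[] = $row->timeLeft;
        }
        $stmt->execute($params);
    }

The statement is re-prepared per chunk because the last chunk usually holds fewer rows; with 1,000 rows per statement, the 65k-row import becomes 65 queries instead of 65,000.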

3 Answers


9 minutes * 60 seconds = 540 seconds

17000 / 540 ≈ 31.5 records per second when inserting

Because you did not post your server configuration, load, RAM, etc., we unfortunately cannot pinpoint the cause. Still, roughly 31 inserts per second is not much for a serious server; much more is certainly doable, since your row size doesn't seem big.

What you need to do is some serious measuring. For example, a lack of memory will create issues, and lots of expensive indexes on the table will slow insertion down considerably.

If you do this a lot (for example on every user upload), it might be wise to create a queue for it, since it will take some time anyway. Still, 17k inserts should be doable in seconds, not minutes.

There is a lot of documentation available about optimizing MySQL for inserts; this page gives a general overview of what influences insert speed: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

At first sight the problem doesn't seem to be your script; it is nothing special. I would, however, separate the process into 2 scripts:

  1. Handle the insertion in a process running in the background, such as a cron job or a constantly running script. So upload the data, store it, and let the background process handle the import. Maybe create an importjobs table for it.

Every x records you could do something like this:

UPDATE importjobs SET counter = :amountimported WHERE id=:jobid
  2. A script that reports the status, by reading the status values written by the background script.

    SELECT counter/total AS partdone, counter, total FROM importjobs WHERE id=:jobid

That way you can measure and improve the import completely separately from the user interface: a clean separation of concerns. The import can run at full speed, and the progress indicator is fully decoupled from it.

Depending on the speed you get, you can decide whether you want to update the indicator every 5, 10, 20, or 60 seconds, or whatever. This lookup is quite cheap, so you can run it fairly often, especially since the importjobs table is also very small.
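
A rough sketch of that split, assuming a hypothetical importjobs table with id, counter and total columns, a PDO connection in `$db`, and the decoded records in `$rows`; all of these names are illustrative, not part of the answer:

    // Background worker: runs the import and records progress every $step rows.
    $step     = 500;
    $total    = count($rows);
    $insert   = $db->prepare("INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft) VALUES (?,?,?,?,?,?,?)");
    $progress = $db->prepare("UPDATE importjobs SET counter = ? WHERE id = ?");

    foreach ($rows as $i => $row) {
        $insert->execute(array($row->auc, $row->item, $row->owner, $row->bid, $row->buyout, $row->quantity, $row->timeLeft));
        if (($i + 1) % $step === 0 || $i + 1 === $total) {
            $progress->execute(array($i + 1, $jobId));   // $jobId identifies this import job
        }
    }

    // Status script: called by the progress bar (e.g. via AJAX) to read the counters.
    $status = $db->prepare("SELECT counter, total, counter / total AS partdone FROM importjobs WHERE id = ?");
    $status->execute(array($jobId));
    echo json_encode($status->fetch(PDO::FETCH_ASSOC));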

Luc Franken
  • Sorry about the lack of information. I'm working locally, and 8-9 minutes is not an exact time. For now I am trying to get JSON data from an API (http://eu.battle.net/auction-data/veknilash/auctions.json) and save it as a JSON file, then read it, generate an SQL file, and try to load that into MySQL (not succeeded yet). Is that the right way to do it, or do I need some other approach? If I can get this part working, I will try to do it with progress information. –  Mar 07 '12 at 13:52
  • That's OK, it just shows how you can measure. For example, you can write a simple script that inserts lots of rows to get an indication of speed. Generally I see much lower performance on my development machines than on the optimized servers we use in production. First measure, then code; otherwise you might spend lots of time on the wrong things. – Luc Franken Mar 07 '12 at 15:15
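
A rough sketch of such a baseline measurement, assuming a PDO connection in `$db` and an auto-incrementing `id` column; the row count and dummy values are arbitrary:

    // Insert-speed benchmark: time N dummy inserts to get a rows-per-second baseline.
    $n    = 10000;
    $stmt = $db->prepare("INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft) VALUES (?,?,?,?,?,?,?)");

    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $stmt->execute(array($i, 1, 'tester', 100, 200, 1, 'SHORT'));
    }
    $elapsed = microtime(true) - $start;

    printf("%d rows in %.1f s (%.0f rows/s)\n", $n, $elapsed, $n / $elapsed);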

I had a similar problem to solve, but I did not know the number of records coming from the external API either. I realised there was a risk that my job could fall over mid-process, and I didn't want to have to start again.

It strikes me that your issue is similar in that respect: you are asking your database to try to insert a record even if that record may already exist. That's costly.

What I did was build a control table that stored the key of the data, or some other unique identifier. During processing, the job reads the control table to determine whether the record exists; if it does, it just skips to the next record, otherwise it inserts it.

I also created a record-number limiter that I could use to test the load (say, limiting it to the first 100 records), and with this tool I was able to run the PHP script in 5 open browser tabs to effectively load in parallel.
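
A possible sketch of that idea, assuming a hypothetical control table imported_keys with a unique auc column and a PDO connection in `$db`; with a unique index in place, INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE would give a similar skip-if-exists effect in a single statement:

    // Skip-if-exists sketch: a control table remembers which keys were already imported,
    // so a restarted (or parallel) job does not insert duplicates.
    $exists = $db->prepare("SELECT 1 FROM imported_keys WHERE auc = ?");
    $insert = $db->prepare("INSERT INTO auctions (auc, item, owner, bid, buyout, quantity, timeLeft) VALUES (?,?,?,?,?,?,?)");
    $mark   = $db->prepare("INSERT INTO imported_keys (auc) VALUES (?)");

    foreach ($rows as $row) {
        $exists->execute(array($row->auc));
        if ($exists->fetchColumn()) {
            continue;   // already imported in an earlier run, skip it
        }
        $insert->execute(array($row->auc, $row->item, $row->owner, $row->bid, $row->buyout, $row->quantity, $row->timeLeft));
        $mark->execute(array($row->auc));
    }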

T9b

If the extended (multi-row) inserts are still not fast enough, try using LOAD DATA INFILE. It won't give you progress, but it is the fastest way to load your data that I'm aware of.

Edit: here's a link to the MySQL manual page: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
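
A sketch of that route, assuming the records are already decoded into `$rows`, that the PDO connection was opened with PDO::MYSQL_ATTR_LOCAL_INFILE enabled, and that local_infile is allowed on the server; the CSV path is only an example:

    // Write the decoded records to a temporary CSV, then bulk-load it in one statement.
    $csv = '/tmp/auctions.csv';   // example path
    $fh  = fopen($csv, 'w');
    foreach ($rows as $row) {
        fputcsv($fh, array($row->auc, $row->item, $row->owner, $row->bid, $row->buyout, $row->quantity, $row->timeLeft));
    }
    fclose($fh);

    // LOAD DATA is typically much faster than row-by-row inserts, but gives no per-row progress.
    $db->exec("LOAD DATA LOCAL INFILE " . $db->quote($csv) . "
               INTO TABLE auctions
               FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
               LINES TERMINATED BY '\\n'
               (auc, item, owner, bid, buyout, quantity, timeLeft)");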

RumpRanger
  • Thank you for the answer, I will probably use that for now. But I still want to see progress. –  Mar 07 '12 at 13:19