24

I have a situation where I have to update a web site on a shared hosting provider. The site has a CMS. Uploading the CMS's files is pretty straightforward using FTP.

I also have to import a big (relative to the confines of a PHP script) database file (around 2-3 MB uncompressed). MySQL is closed to access from the outside, so I have to upload the file using FTP and start a PHP script to import it. Sadly, I do not have access to the mysql command line client, so I have to parse the file and run the queries using native PHP. I also can't use LOAD DATA INFILE. I also can't use any kind of interactive front-end like phpMyAdmin; it needs to run in an automated fashion. I also can't use mysqli_multi_query().

Does anybody know of, or have, an already coded, simple solution that reliably splits such a file into single queries (there could be multi-line statements) and runs them? I would like to avoid fiddling with it myself because of the many gotchas I'm likely to come across (how to detect whether a field delimiter is part of the data; how to deal with line breaks in memo fields; and so on). There must be a ready-made solution for this.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
Pekka
  • 442,112
  • 142
  • 972
  • 1,088
  • 2
    Also, can you provide some test data? – Alix Axel Jan 06 '10 at 07:19
  • Cheers folks for all the great input. I will need to find myself the time to go through them and test them before the bounty runs out. :) – Pekka Jan 07 '10 at 21:01
  • Providing test data is difficult right now, but in general, it's all sorts of tables with all sorts of nastiness (lots of line breaks, HTML code, binary data, everything). – Pekka Jan 07 '10 at 21:02
  • And one more thing: I have seen that there are some suggestions for putting the data into a pre-parsed format other than raw MySQL queries. While this probably makes sense, I am very hesitant to go that direction because the output end is already nicely set up using `mysqldump`. The solutions working on splitting a dump file into single queries look the most promising to me right now. – Pekka Jan 07 '10 at 21:03
  • As a side comment, a 2-3MB database file is by no means *big*. Large databases are commonly in the range of GB and even TB. – Corey Ballou Jan 08 '10 at 18:20
  • In this context, I mean "big" within the memory confines of a PHP script - those are usually around 16 to 32 MB. – Pekka Jan 08 '10 at 18:24
  • Pekka: Please see my solution below. There is no need to try to split the file, a good export and simple stream processing takes care of it. – hobodave Jan 09 '10 at 10:41
  • @Pekka: Please see my edited solution and comments below. – hobodave Jan 11 '10 at 01:11
  • What about this: DELIMITER // -- This multi-line statement contains a semicolon -- but not as the statement terminator. CREATE PROCEDURE simpleproc (OUT param1 INT) BEGIN SELECT COUNT(*) INTO param1 FROM foo; END – Angelin Nadar Sep 04 '12 at 12:54

13 Answers

52

Here is a memory-friendly function that should be able to split a big file into individual queries without needing to load the whole file into memory at once:

function SplitSQL($file, $delimiter = ';')
{
    set_time_limit(0);

    if (is_file($file) === true)
    {
        $file = fopen($file, 'r');

        if (is_resource($file) === true)
        {
            $query = array();

            while (feof($file) === false)
            {
                $query[] = fgets($file);

                // Keep buffering lines until one ends with the delimiter,
                // then run the buffered statement as a single query
                if (preg_match('~' . preg_quote($delimiter, '~') . '\s*$~iS', end($query)) === 1)
                {
                    $query = trim(implode('', $query));

                    if (mysql_query($query) === false)
                    {
                        echo '<h3>ERROR: ' . $query . '</h3>' . "\n";
                    }

                    else
                    {
                        echo '<h3>SUCCESS: ' . $query . '</h3>' . "\n";
                    }

                    while (ob_get_level() > 0)
                    {
                        ob_end_flush();
                    }

                    flush();
                }

                // After a statement has run, $query is a string; reset it
                // to an empty array so the next statement can be buffered
                if (is_string($query) === true)
                {
                    $query = array();
                }
            }

            return fclose($file);
        }
    }

    return false;
}

I tested it on a big phpMyAdmin SQL dump and it worked just fine.
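
For reference, a minimal way to call it might look like this (just a sketch, assuming the legacy mysql extension that the function relies on, plus made-up credentials and a made-up dump file name):

mysql_connect('localhost', 'user', 'password');
mysql_select_db('test');

SplitSQL('dump.sql'); // runs each statement and echoes SUCCESS/ERROR as it goes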


Some test data:

CREATE TABLE IF NOT EXISTS "test" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "name" TEXT,
    "description" TEXT
);

BEGIN;
    INSERT INTO "test" ("name", "description")
    VALUES (";;;", "something for you mind; body; soul");
COMMIT;

UPDATE "test"
    SET "name" = "; "
    WHERE "id" = 1;

And the respective output:

SUCCESS: CREATE TABLE IF NOT EXISTS "test" ( "id" INTEGER PRIMARY KEY AUTOINCREMENT, "name" TEXT, "description" TEXT );
SUCCESS: BEGIN;
SUCCESS: INSERT INTO "test" ("name", "description") VALUES (";;;", "something for you mind; body; soul");
SUCCESS: COMMIT;
SUCCESS: UPDATE "test" SET "name" = "; " WHERE "id" = 1;
Alix Axel
  • 151,645
  • 95
  • 393
  • 500
  • No problem Pekka, glad I could help. – Alix Axel Jan 10 '10 at 18:30
  • 3
    May I ask what this line means? `if (preg_match('~' . preg_quote($delimiter, '~') . '\s*$~iS', end($query)) === 1)` – lulalala Oct 07 '11 at 02:35
  • 2
    @lulalala: It means: "*match the (escaped) $delimiter character in a case-insensitive way, followed by any number (0 to ∞) of spaces (or newlines, tabs, etc) right before the end of the line*". – Alix Axel Oct 08 '11 at 23:25
  • @genesis: Right, but if you've a couple of queries on a single line they probably won't (or at least shouldn't) be that huge to make you run out of memory. – Alix Axel Oct 25 '11 at 18:38
  • 1
    @AlixAxel: Do you think you can rewrite it to support one-line queries only? I'm searching for alternative ways. – genesis Oct 25 '11 at 18:44
  • @genesis: How huge are your lines? – Alix Axel Oct 25 '11 at 19:20
  • I'm not importing big files, I'm just creating something like PHPMyAdmin, smaller and faster than PHPMyAdmin, but I have no idea how to split queries universally – genesis Oct 25 '11 at 20:43
  • @genesis: An SQL query ending in a newline preceded by a delimiter is a small compromise to make, which in turn translates to a much more simple solution. Writing something that is smart enough to detect queries with delimiters (or no delimiters at all!) in any position probably requires an SQL lexis and will be way more expensive computationally. The reason why I asked how big are your queries is because you *might* be able to run several queries at once, and thus there is no need to separate them. Check `mysqli_multi_query()` as @Pekka mentioned, perhaps it's helpful to your problem. – Alix Axel Oct 25 '11 at 23:02
  • cool, but what's with the flush and the ob_ stuff? Why are they needed? – Alex Dec 04 '11 at 12:53
  • @Alex: They aren't critical, but allow you to see the import process as it happens. – Alix Axel Dec 04 '11 at 13:14
  • This seems to mess up whitespace inside query constants. – Kendall Hopkins Jan 24 '12 at 00:35
  • @KendallHopkins: Can you provide an example? – Alix Axel Jan 24 '12 at 00:47
  • I changed the `mysql_query` to a print and compared the outputs using `diff`, and found that this script ate quite a bit of the content's whitespace. http://pastebin.com/YQBHC13b – Kendall Hopkins Jan 24 '12 at 00:58
  • @KendallHopkins: Can you also paste the generated output? – Alix Axel Jan 24 '12 at 02:50
  • Thanks, it helped a lot, but I need one change: I don't want any output. I commented out both echo lines, but it still gives a space/line break in the output? – Darshit Gajjar Dec 05 '13 at 07:44
  • 1
    **Important:** this does not respect commented out lines/blocks – Jay Wick Apr 12 '14 at 04:52
  • This doesn't work on routine definitions. Doing a simple explode on ; would have done the same thing. – Dean Or Jul 28 '19 at 00:51
6

A single-page phpMyAdmin alternative: Adminer. It's just one PHP script file. Check: http://www.adminer.org/en/

Zaje
  • 2,281
  • 5
  • 27
  • 39
3

When StackOverflow released their monthly data dump in XML format, I wrote PHP scripts to load it into a MySQL database. I imported about 2.2 gigabytes of XML in a few minutes.

My technique is to prepare() an INSERT statement with parameter placeholders for the column values. Then I use XMLReader to loop over the XML elements and execute() my prepared query, plugging in values for the parameters. I chose XMLReader because it's a streaming XML reader; it reads the XML input incrementally instead of requiring the whole file to be loaded into memory.

You could also read a CSV file one line at a time with fgetcsv().

If you're importing into InnoDB tables, I recommend starting and committing transactions explicitly, to reduce the overhead of autocommit. I commit every 1000 rows, but this is arbitrary.

I'm not going to post the code here (because of StackOverflow's licensing policy), but in pseudocode:

connect to database
open data file
PREPARE parameterized INSERT statement
begin first transaction
loop, reading lines from data file: {
    parse line into individual fields
    EXECUTE prepared query, passing data fields as parameters
    if ++counter % 1000 == 0,
        commit transaction and begin new transaction
}
commit final transaction

Writing this code in PHP is not rocket science, and it runs pretty quickly when one uses prepared statements and explicit transactions. Those features are not available in the outdated mysql PHP extension, but you can use them if you use mysqli or PDO_MySQL.
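
The pseudocode above translates fairly directly into PDO. Below is a rough sketch (not the author's original code), assuming a CSV input file named data.csv and a hypothetical table test with columns name and description:

<?php
// Sketch only: illustrates prepared statements plus explicit transactions
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// PREPARE the parameterized INSERT once, EXECUTE it for every row
$stmt = $pdo->prepare('INSERT INTO test (name, description) VALUES (?, ?)');

$handle = fopen('data.csv', 'r');
$counter = 0;

$pdo->beginTransaction();
while (($fields = fgetcsv($handle)) !== false) {
    $stmt->execute($fields);

    // Commit every 1000 rows to limit transaction overhead (arbitrary number)
    if (++$counter % 1000 == 0) {
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit();
fclose($handle);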

I also added convenient stuff like error checking, progress reporting, and support for default values when the data file doesn't include one of the fields.

I wrote my code in an abstract PHP class that I subclass for each table I need to load. Each subclass declares the columns it wants to load, and maps them to fields in the XML data file by name (or by position if the data file is CSV).

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • It's a nice technique indeed, but this doesn't provide a solution to split individual queries, which IMO is the hardest problem. – Alix Axel Jan 07 '10 at 18:48
  • I don't think it's practical to parse a SQL script, because there are too many edge cases. I recommend preparing the data dump as *data only*, using XML or CSV or some other format that you can parse easily in PHP. – Bill Karwin Jan 07 '10 at 19:52
  • I agree with you Bill, but that doesn't seem to be a solution for Pekka (at least that's what I understand from his question). – Alix Axel Jan 08 '10 at 01:04
  • I got a downvote. If you downvote on StackOverflow, please offer a comment to explain why. – Bill Karwin Jan 09 '10 at 15:49
  • 1
    Thanks Bill. As I amended to the question, the export phase is already pretty much wrapped up using `mysqldump`, so while it is probably generally the better way to use an export format as you describe, my requirement in this question is to import actual SQL queries. – Pekka Jan 10 '10 at 13:09
1

Can't you install phpMyAdmin, gzip the file (which should make it much smaller) and import it using phpMyAdmin?

EDIT: Well, if you can't use phpMyAdmin, you can use the code from phpMyAdmin. I'm not sure about this particular part, but it's generally nicely structured.

Lukáš Lalinský
  • 40,587
  • 6
  • 104
  • 126
  • Nope, I need an automated solution to run every day. – Pekka Dec 10 '09 at 18:59
  • 1
    Here is the code from phpMyAdmin: http://phpmyadmin.svn.sourceforge.net/viewvc/phpmyadmin/trunk/phpMyAdmin/libraries/import/sql.php?revision=13166&view=markup - earlier versions were easier to read, but you can see here how it works. – Frunsi Jan 07 '10 at 15:02
1

Export

The first step is getting the input in a sane format for parsing when you export it. From your question it appears that you have control over the exporting of this data, but not the importing.

~: mysqldump test --opt --skip-extended-insert | grep -v '^--' | grep . > test.sql

This dumps the test database, excluding all comment lines and blank lines, into test.sql. It also disables extended inserts, meaning there is one INSERT statement per line. This will help limit the memory usage during the import, but at the cost of import speed.

Import

The import script is as simple as this:

<?php

$mysqli = new mysqli('localhost', 'hobodave', 'p4ssw3rd', 'test');
$handle = fopen('test.sql', 'rb');
if ($handle) {
    while (!feof($handle)) {
        // This assumes you don't have a row that is > 1MB (1000000)
        // which is unlikely given the size of your DB
        // Note that it has a DIRECT effect on your script's memory
        // usage.
        $buffer = stream_get_line($handle, 1000000, ";\n");
        $mysqli->query($buffer);
    }
}
echo "Peak MB: ",memory_get_peak_usage(true)/1024/1024;

This will utilize an absurdly low amount of memory as shown below:

daves-macbookpro:~ hobodave$ du -hs test.sql 
 15M    test.sql
daves-macbookpro:~ hobodave$ time php import.php 
Peak MB: 1.75
real    2m55.619s
user    0m4.998s
sys 0m4.588s

That means you processed a 15MB mysqldump with a peak RAM usage of 1.75 MB in just under 3 minutes.

Alternate Export

If you have a high enough memory_limit and this is too slow, you can try the following export instead:

~: mysqldump test --opt | grep -v '^--' | grep . > test.sql

This will allow extended inserts, which insert multiple rows in a single query. Here are the statistics for the same database:

daves-macbookpro:~ hobodave$ du -hs test.sql 
 11M    test.sql
daves-macbookpro:~ hobodave$ time php import.php 
Peak MB: 3.75
real    0m23.878s
user    0m0.110s
sys 0m0.101s

Notice that it uses over 2x the RAM at 3.75 MB, but takes about 1/6th as long. I suggest trying both methods and seeing which suits your needs.

Edit:

I was unable to get a newline to appear literally in any mysqldump output using any of CHAR, VARCHAR, BINARY, VARBINARY, and BLOB field types. If you do have BLOB/BINARY fields though then please use the following just in case:

~: mysqldump5 test --hex-blob --opt | grep -v '^--' | grep . > test.sql
hobodave
  • 28,925
  • 4
  • 72
  • 77
  • Cheers Hobodave. I tried your solution first and it basically worked, but it dropped a number of records from a number of tables. On cursory inspection, this was because those records contained actual line breaks. While this is probably easy to fix, the bounty's running out and I feel compelled out of fairness to pick the solution that worked for me out of the box, which in this case was Axel's. Thanks for your time, and if you want to change your answer to take line-breakey content into account, I'll be happy to test run it for you (I can't dump the SQL because it contains confidential info). – Pekka Jan 10 '10 at 13:07
  • @Pekka: what field type had a linebreak in it? I tried using TEXT and VARCHAR columns and my dump looks like: `INSERT INTO newline VALUES (1,'Four score, \nand seven years\nago');` – hobodave Jan 11 '10 at 00:54
  • I can't reproduce it with a BLOB field either. – hobodave Jan 11 '10 at 01:03
  • That's odd. I'll take a look at the imported data, the record number it stops at is always the same. – Pekka Jan 11 '10 at 08:57
0

Splitting a query cannot be reliably done without parsing. Here is valid SQL that would be impossible to split correctly with a regular expression.

SELECT ";"; SELECT ";\"; a;";
SELECT ";
    abc";

I wrote a small SqlFormatter class in PHP that includes a query tokenizer. I added a splitQuery method to it that splits all queries (including the above example) reliably.

https://github.com/jdorn/sql-formatter/blob/master/SqlFormatter.php

You can remove the format and highlight methods if you don't need them.

One downside is that it requires the whole SQL string to be in memory, which could be a problem if you're working with huge SQL files. I'm sure that with a little bit of tinkering you could make the getNextToken method work on a file pointer instead.
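
A minimal usage sketch (assuming SqlFormatter.php has been downloaded next to the script and, as elsewhere in this thread, the legacy mysql extension is used to run the queries):

require 'SqlFormatter.php';

// Note: this loads the whole dump into memory, as mentioned above
$sql = file_get_contents('dump.sql');

foreach (SqlFormatter::splitQuery($sql) as $query) {
    if (mysql_query($query) === false) {
        echo 'ERROR: ' . mysql_error() . "\n";
    }
}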

0

First of all, thanks for this topic. It saved a lot of time for me :) And let me make a little fix to your code. Sometimes, if TRIGGERS or PROCEDURES are in the dump file, it is not enough to examine the ; delimiter. In that case the SQL code may contain DELIMITER [something], to say that the statement will not end with ; but with [something]. For example, a section in xxx.sql:

    DELIMITER //
    CREATE TRIGGER `mytrigger` BEFORE INSERT ON `mytable`
    FOR EACH ROW BEGIN
         SET NEW.`create_time` = NOW();
    END
    //
    DELIMITER ;

So first we need a flag to detect that the query does not end with ;, and we need to delete the unwanted query chunks, because mysql_query does not need a delimiter (the end of the string is the delimiter), so mysql_query needs something like this:

    CREATE TRIGGER `mytrigger` BEFORE INSERT ON `mytable`
    FOR EACH ROW BEGIN
         SET NEW.`create_time` = NOW();
    END;

So, after a little work, here is the fixed code:

    function SplitSQL($file, $delimiter = ';')
    {
        set_time_limit(0);            
        $matches = array();
        $otherDelimiter = false;
        if (is_file($file) === true) {
            $file = fopen($file, 'r');
            if (is_resource($file) === true) {
                $query = array();
                while (feof($file) === false) {
                    $query[] = fgets($file);
                    if (preg_match('~' . preg_quote('delimiter', '~') . '\s*([^\s]+)$~iS', end($query), $matches) === 1){     
                        //DELIMITER DIRECTIVE DETECTED
                        array_pop($query); //WE DON'T NEED THIS LINE IN SQL QUERY
                        if( $otherDelimiter = ( $matches[1] != $delimiter )){
                        }else{
                            //THIS IS THE DEFAULT DELIMITER, DELETE THE LINE BEFORE THE LAST (THAT SHOULD BE THE NOT DEFAULT DELIMITER) AND WE SHOULD CLOSE THE STATEMENT                                
                            array_pop($query);
                            $query[]=$delimiter;
                        }                                                                                    
                    }                        
                    if ( !$otherDelimiter && preg_match('~' . preg_quote($delimiter, '~') . '\s*$~iS', end($query)) === 1) {                            
                        $query = trim(implode('', $query));
                        if (mysql_query($query) === false){
                            echo '<h3>ERROR: ' . $query . '</h3>' . "\n";
                        }else{
                            echo '<h3>SUCCESS: ' . $query . '</h3>' . "\n";
                        }
                        while (ob_get_level() > 0){
                            ob_end_flush();
                        }
                        flush();                        
                    }
                    if (is_string($query) === true) {
                        $query = array();
                    }
                }                    
                return fclose($file);
            }
        }
        return false;
}

I hope I could help somebody too. Have a nice day!

Kovge
  • 2,019
  • 1
  • 14
  • 13
0

http://www.ozerov.de/bigdump/ was very useful for me in importing a 200+ MB SQL file.

Note: the SQL file should already be present on the server so that the process can complete without any issues.

Frederic Anand
  • 371
  • 4
  • 7
0

Can you use LOAD DATA INFILE?

If you format your db dump file using SELECT INTO OUTFILE, this should be exactly what you need. No reason to have PHP parse anything.
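
If the privilege were available, the round trip would look roughly like this (a sketch only; table name and file path are made up):

<?php
$mysqli = new mysqli('localhost', 'user', 'password', 'test');

// The dump would be produced on the source server with something like:
//   SELECT * INTO OUTFILE '/tmp/test.txt' FROM test;
// and loaded back here in a single server-side pass, with no PHP parsing:
$mysqli->query("LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test");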

mluebke
  • 8,588
  • 7
  • 35
  • 31
  • I *think* LOAD DATA INFILE is closed for my mySQL user in this case but will check. – Pekka Dec 10 '09 at 18:59
  • I'm upvoting this to even it out. The answer is technically correct, even if it didn't help me, and I don't see why it should be downvoted. – Pekka Jan 10 '10 at 13:12
0

Already answered: Loading .sql files from within PHP. Also:

Community
  • 1
  • 1
BYK
  • 1,359
  • 3
  • 15
  • 37
  • Thanks for pointing out the duplicate, but I can't see a solution there that fits my needs. – Pekka Jan 04 '10 at 12:46
  • It suggests having a look at phpMyAdmin's code which makes perfect sense. – BYK Jan 04 '10 at 12:48
  • I have tried that once, but didn't get very far, as the code is quite complex. If it's the only way, I will work my way through it, but there must be some sort of stand alone script for this somewhere. – Pekka Jan 04 '10 at 12:49
  • I have added several links to the answer. I suggest reading them. – BYK Jan 04 '10 at 12:53
0

I ran into the same problem. I solved it using a regular expression:

function splitQueryText($query) {
    // the regex needs a trailing semicolon
    $query = trim($query);

    if (substr($query, -1) != ";")
        $query .= ";";

    // I spent 3 days figuring out this line
    preg_match_all("/(?>[^;']|(''|(?>'([^']|\\')*[^\\\]')))+;/ixU", $query, $matches, PREG_SET_ORDER);

    $querySplit = array();

    foreach ($matches as $match) {
        // get rid of the trailing semicolon
        $querySplit[] = substr($match[0], 0, -1);
    }

    return $querySplit;
}

$queryList = splitQueryText($inputText);

foreach ($queryList as $query) {
    $result = mysql_query($query);
}
calvinlough
  • 362
  • 3
  • 13
-1

You can use phpMyAdmin to import the file. Even if it is huge, just use the UploadDir configuration directive, upload the file there via FTP, and choose it from phpMyAdmin's import page. Once file processing gets close to the PHP limits, phpMyAdmin interrupts the import and shows the import page again with predefined values indicating where to continue.
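
For reference, the relevant setting in phpMyAdmin's config.inc.php is the UploadDir directive; a sketch (the directory name is just an example):

$cfg['UploadDir'] = './upload';  // FTP the .sql file into this directory,
                                 // then pick it on phpMyAdmin's Import page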

Michal Čihař
  • 9,799
  • 6
  • 49
  • 87
-3

What do you think about:

system("cat xxx.sql | mysql -u username database");
opHASnoNAME
  • 20,224
  • 26
  • 98
  • 143
  • Can't do that - as I write in the question, I have no access to the command line. (The downvote is not mine, though). – Pekka Jan 04 '10 at 13:00
  • I forgot to post my comment: this is a shared host; you cannot use the "system" function and a lot of other "somewhat dangerous" functions. – Arno Jan 04 '10 at 13:23