1

I want to split mysql queries from phpmyadmin export format. Each query will be separated by ";" but can't use explode because ";" could be there anywhere in field value too. Below is one example.

insert into wp_options (option_id, blog_id, option_name, option_value, autoload) values (201, 0, "tadv_btns2", "a:21:{i:0;s:14:\"fontsizeselect\";i:1;s:12:\"formatselect\";i:2;s:9:\"pastetext\";i:3;s:9:\"pasteword\";i:4;s:12:\"removeformat\";i:5;s:9:\"separator\";i:6;s:7:\"charmap\";i:7;s:5:\"print\";i:8;s:9:\"separator\";i:9;s:9:\"forecolor\";i:10;s:9:\"backcolor\";i:11;s:8:\"emotions\";i:12;s:9:\"separator\";i:13;s:3:\"sup\";i:14;s:3:\"sub\";i:15;s:5:\"media\";i:16;s:9:\"separator\";i:17;s:4:\"undo\";i:18;s:4:\"redo\";i:19;s:7:\"attribs\";i:20;s:7:\"wp_help\";}", "no");

insert into wp_options (option_id, blog_id, option_name, option_value, autoload) values (202, 0, "tadv_btns3", "a:0:{}", "no");

insert into wp_options (option_id, blog_id, option_name, option_value, autoload) values (203, 0, "tadv_btns4", "a:0:{}", "no");

insert into wp_options (option_id, blog_id, option_name, option_value, autoload) values (204, 0, "tadv_allbtns", "a:64:{i:0;s:6:\"wp_adv\";i:1;s:4:\"bold\";i:2;s:6:\"italic\";i:3;s:13:\"strikethrough\";i:4;s:9:\"underline\";i:5;s:7:\"bullist\";i:6;s:7:\"numlist\";i:7;s:7:\"outdent\";i:8;s:6:\"indent\";i:9;s:11:\"justifyleft\";i:10;s:13:\"justifycenter\";i:11;s:12:\"justifyright\";i:12;s:11:\"justifyfull\";i:13;s:3:\"cut\";i:14;s:4:\"copy\";i:15;s:5:\"paste\";i:16;s:4:\"link\";i:17;s:6:\"unlink\";i:18;s:5:\"image\";i:19;s:7:\"wp_more\";i:20;s:7:\"wp_page\";i:21;s:6:\"search\";i:22;s:7:\"replace\";i:23;s:10:\"fontselect\";i:24;s:14:\"fontsizeselect\";i:25;s:7:\"wp_help\";i:26;s:10:\"fullscreen\";i:27;s:11:\"styleselect\";i:28;s:12:\"formatselect\";i:29;s:9:\"forecolor\";i:30;s:9:\"backcolor\";i:31;s:9:\"pastetext\";i:32;s:9:\"pasteword\";i:33;s:12:\"removeformat\";i:34;s:7:\"cleanup\";i:35;s:12:\"spellchecker\";i:36;s:7:\"charmap\";i:37;s:5:\"print\";i:38;s:4:\"undo\";i:39;s:4:\"redo\";i:40;s:13:\"tablecontrols\";i:41;s:4:\"cite\";i:42;s:3:\"ins\";i:43;s:3:\"del\";i:44;s:4:\"abbr\";i:45;s:7:\"acronym\";i:46;s:7:\"attribs\";i:47;s:5:\"layer\";i:48;s:5:\"advhr\";i:49;s:4:\"code\";i:50;s:11:\"visualchars\";i:51;s:11:\"nonbreaking\";i:52;s:3:\"sub\";i:53;s:3:\"sup\";i:54;s:9:\"visualaid\";i:55;s:10:\"insertdate\";i:56;s:10:\"inserttime\";i:57;s:6:\"anchor\";i:58;s:10:\"styleprops\";i:59;s:8:\"emotions\";i:60;s:5:\"media\";i:61;s:10:\"blockquote\";i:62;s:9:\"separator\";i:63;s:1:\"|\";}", "no");
BenMorel
  • 34,448
  • 50
  • 182
  • 322
Mihir
  • 1,156
  • 3
  • 13
  • 22
  • Note for newer PHP versions: mysqli supports multiple statemens in one string with the function mysqli_multi_query() -> http://php.net/manual/de/mysqli.quickstart.multiple-statement.php – BurninLeo Jan 21 '17 at 19:50

6 Answers6

8

Given a string containing multiple, semi-colon separated SQL statements, the following function parses out each individual statement and returns them all in an array. It uses one (non-trivial) regex, and one call to preg_match_all(), and correctly handles single and multi-line comments, and single and double-quoted strings, (each of which may contain non-terminating semi-colons to be ignored):

function split_sql($sql_text) {
    // Return array of ; terminated SQL statements in $sql_text.
    $re_split_sql = '%(?#!php/x re_split_sql Rev:20170816_0600)
        # Match an SQL record ending with ";"
        \s*                                     # Discard leading whitespace.
        (                                       # $1: Trimmed non-empty SQL record.
          (?:                                   # Group for content alternatives.
            \'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'  # Either a single quoted string,
          | "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"      # or a double quoted string,
          | /\*[^*]*\*+(?:[^*/][^*]*\*+)*/      # or a multi-line comment,
          | \#.*                                # or a # single line comment,
          | --.*                                # or a -- single line comment,
          | [^"\';#]                            # or one non-["\';#-]
          )+                                    # One or more content alternatives
          (?:;|$)                               # Record end is a ; or string end.
        )                                       # End $1: Trimmed SQL record.
        %x';  // End $re_split_sql.
    if (preg_match_all($re_split_sql, $sql_text, $matches)) {
        return $matches[1];
    }
    return array();
}

Edit 2017-08-15: Fixed error in multiline comment portion of regex pointed out by @jxmallett. Edit 2017-08-16: Tidied regex (added regex shebang and removed unnecessary group $2).

ridgerunner
  • 33,777
  • 5
  • 57
  • 69
  • This is great, only wish it could separate comments into their own index instead of grouping them with the following sql statement. Not sure how to do that with multi-line comments and without something hacky like checking for keywords that are only valid at the start of a sql statement... :( – Marcus Pope Apr 03 '14 at 16:25
  • For now I'm just stripping out all comments with this answer: http://stackoverflow.com/questions/9690448/regular-expression-to-remove-comments-from-sql-statement and then using split_sql() to break out the result. – Marcus Pope Apr 03 '14 at 16:35
  • 1
    does not support PROCEDURES. – pronebird Jun 03 '15 at 13:05
  • and the same regexp for JS: `regex = '\s*((?:\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'|"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\#.*|\\/\\*[\\w\\W]*?(?=\\*\\/)\\*\\/|--.*|[^"\';#])+(?:;|$))';` – Ondřej Hlaváček Jul 09 '15 at 18:55
  • It matches this as one query when it's actually 2: `/* comment */ SELECT * FROM foo; /* comment 2 */ SELECT * FROM bar;` It seems to be a problem with multi-line comments. I used `'@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms'` to strip all the comments before running `split_sql`. – jxmallett Aug 16 '17 at 02:21
  • @jxmallett - Thanks for pointing out the erroneous behavior. Forgot to escape the first literal asterisk in the multiline comment regex fragment. All better now. *Thanks again!* – ridgerunner Aug 16 '17 at 03:33
  • I figured it would be something simple but my regex foo isn't quite as good as yours ;-) Thanks for the quick fix, @ridgerunner! – jxmallett Aug 16 '17 at 03:56
  • Found another bug in the regex. `SELECT '\n';` fails. – Ondřej Hlaváček Sep 04 '17 at 15:57
  • @Ondřej Hlaváček - My testing shows that `SELECT '\n'` is parsed out correctly. (I can't get it to fail.) – ridgerunner Sep 04 '17 at 23:24
  • @ridgerunner it may be an mistake on my end, I pasted the regex into https://regex101.com/ and I had to swap all `\\\\` occurences in the regex to `\\` to make it work – Ondřej Hlaváček Sep 07 '17 at 09:26
1

My suggestion would be to replace your delimiter with something that would be unique. For example, a semicolon won't be unique (as you described) so you can't split on this. Newlines (as was stated before) may be a good and simple option.

pseudoramble
  • 2,541
  • 22
  • 28
1

one way to do is :

explode by ");"

it will return array of statements

then concat ")" at the end of each statement in array

0

Are those lines separated by a "newline" character? If yes just explode them:

$my_array = explode("\n",$content);

Francesco Laurita
  • 23,434
  • 8
  • 55
  • 63
  • actually some queries are divided in more than one line so explode("\n",$content) won't work – Mihir Jan 20 '11 at 17:48
0

how about something like this

$fh = fopen('/path/to/your/file.sql', 'r') or die($php_errormsg);
while (!feof($fh)) {
    $line = fgets($fh, 4096);
    if (preg_match($pattern, $line)) { $sql_queries[ ] = $line; }
}
fclose($fh);

Where $pattern can be initialized to an sql insert query format. Using newline wont be a good option as large sql inserts stretch for more than one line

ayush
  • 14,350
  • 11
  • 53
  • 100
0

I modified "ridgerunner" function "split_sql" for some problems: no back quoted strings, was skipping queries with syntax errors, not all multi-line comments matches or last empty sql. It always return full array for debug each query or reassembling for profiling in MySQL;

function split_sql($sql, $rtrim = TRUE)
{
    $re = '%
        \s*                                                 # Discard leading whitespace.
            ((?:                                            # Group for content alternatives.
              [^;"\'#`\/-]+                                 # None of these [...]                  
            | \'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'?           # or a single quoted string,
            | "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"?               # or a double quoted string,
            | `[^`]*`?                                      # single back quoted string
            | \/\*(?:.*?\*\/|.*)                            # or a multi-line comment,
            | (?-s)\#.*                                     # or a # single line comment (off s-modifier from here),
            | --.*                                          # or a -- single line comment,
            | -                                             # or -
            )*)                                             # One or more content alternatives
            ;                                               # Record end is a ; or string end.
        %xs';
    if ($rtrim)
        $sql = rtrim($sql, "; \t\n\r\0\x0B");
    if (preg_match_all($re, $sql.';', $matches))
        return $matches[1];        
    return [];
}