
I need to import a large SQL table from an Oracle database into a MySQL database. I "dumped" the Oracle table using RazorSQL, so I now have a large (2.5 GB) .sql file containing many INSERT statements.

This file is too big and I want to split it so that I import it in chunks. What would be a clever way to split such a file?

  • using the split command would work, except that some string values contain newline characters, so there is a high risk of split cutting mid-statement.

  • a Perl quickie using a regexp to split on ";\n" (end of statement) would work, but there is still the remote possibility that this character sequence occurs inside the data.

Of course I can always write a parser that understands string quoting (see the sketch below). I am wondering, though, whether there is anything smarter?
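For the record, here is roughly what I mean by such a parser: a minimal, untested Perl sketch that tracks single-quote state line by line and starts a new chunk file only at statement boundaries. It assumes Oracle-style quoting (a doubled `''` as the escape, no backslash escapes), and the 100 MB target size and `chunk_%04d.sql` naming are arbitrary choices of mine:

```perl
#!/usr/bin/env perl
# split_sql.pl -- quote-aware splitter for a big .sql dump (sketch).
# Assumes statements end with ";" at end of line, and that string
# literals use single quotes with '' (doubled quote) as the escape,
# as in a typical Oracle dump. Backslash escapes would break this.
use strict;
use warnings;

my $max_bytes = 100 * 1024 * 1024;   # target chunk size: ~100 MB
my $in        = shift or die "usage: $0 dump.sql\n";
open my $fh, '<', $in or die "cannot open $in: $!";

my $chunk_no = 0;
my $out;
my $written  = 0;
my $in_quote = 0;    # are we currently inside a '...' string literal?
my $stmt     = '';   # the statement being accumulated

sub open_chunk {
    $chunk_no++;
    open $out, '>', sprintf("chunk_%04d.sql", $chunk_no)
        or die "cannot create chunk file: $!";
    $written = 0;
}
open_chunk();

while (defined(my $line = <$fh>)) {
    # Count single quotes on this line; an odd count flips the
    # in-string state. A doubled quote ('') counts as two, so it
    # leaves the state unchanged, which is what we want.
    my $quotes = ($line =~ tr/'//);
    $in_quote = !$in_quote if $quotes % 2;

    $stmt .= $line;

    # End of statement: line ends with ";" and we are outside quotes.
    if (!$in_quote && $line =~ /;\s*$/) {
        print {$out} $stmt;
        $written += length $stmt;
        $stmt = '';
        open_chunk() if $written >= $max_bytes;
    }
}
print {$out} $stmt if length $stmt;   # trailing partial statement, if any
close $out;
```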

Similar questions have been asked, e.g.:

How do I split the output from mysqldump into smaller files?

But the only splitting solution offered there was a piece of Windows software. Windows is not an option for me; I run Unix, namely Mac OS X.

To give some background: the reason I want to do this is that I get an error when importing the file into MySQL. Localizing the error is painful, if only because it is time-consuming, and because opening such a large text file for editing is non-trivial at best with most text editors. Some of my SQL tools don't even report the location of the error in the file, which strongly suggests a dichotomy approach: zeroing in on the error by splitting the file a number of times. Automating this would be useful (see the second sketch below).
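To illustrate the kind of automation I mean: once the dump is split into chunks, a small driver can import them one by one and stop at the first failure, confining the error to one small file. This is an untested sketch; `mydb` is a placeholder database name, and it assumes the `mysql` command-line client is on the PATH and that the chunks use the naming from the splitter above:

```perl
#!/usr/bin/env perl
# import_chunks.pl -- feed each chunk to the mysql client in order
# and stop at the first failure (sketch). "mydb" is a placeholder;
# the chunk_*.sql naming matches the splitter sketch above.
use strict;
use warnings;

for my $chunk (sort glob "chunk_*.sql") {
    print "importing $chunk ...\n";
    # system() returns 0 when the command exits successfully.
    if (system("mysql mydb < $chunk") != 0) {
        die "import failed in $chunk -- bisect or inspect that file\n";
    }
}
print "all chunks imported cleanly\n";
```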

Thanks for any ideas.

Jean-Denis Muys
  • If you can use mysqlimport and avoid INSERT statements, things will go much quicker; the parser overhead is not insignificant. Also make sure you have indexes disabled on your tables until the data is loaded; the difference can be up to a couple of orders of magnitude in import time. Use **split**: it can produce the files you are interested in, and if the problem you're concerned about happens, deal with it then. If you get terrible error output from the import process, you can do a binary search into the troublesome file to locate the error. – kevpie Dec 14 '10 at 11:03
  • As I wrote, `split` is not OK: it will split the file on line boundaries, not on statement boundaries. I *do* have statements that run over several lines. In order to binary-search, I need a reliable splitting solution. Simple text-based splits are not OK. Maybe a regexp could be OK, but the regexp would have to be single-quote string aware. – Jean-Denis Muys Dec 14 '10 at 15:03
  • How many files of what size do you want to create? – kevpie Dec 15 '10 at 09:22
  • It depends ;-). If I'm finding an issue by binary search, two files of roughly half the size each. Or I might prefer to go for a maximum size and a variable number of files (e.g. as many files as necessary, but none larger than such-and-such a size). Why would it matter? – Jean-Denis Muys Dec 16 '10 at 01:40

1 Answer


Not exactly an answer to your question, but maybe a solution to your problem:

Install Navicat (a MySQL GUI). It has an Oracle => MySQL migration tool that might help you, because the errors you get most likely come from syntax differences. Converting one SQL dialect to another by hand is a hard task.

Silver Light
  • Not really. The only syntax difference I found was the Oracle `to_date` function. I solved that one by defining my own MySQL `to_date` function. I would in most cases expect any syntax issue to show up on all the lines, not on one isolated line in the middle of that big file. – Jean-Denis Muys Dec 14 '10 at 14:58