2

I have a MySQL dump file over 1 terabyte big. I need to extract the CREATE TABLE statements from it so I can provide the table definitions.

I purchased Hex Editor Neo but I'm kind of disappointed I did. I created a regex CREATE\s+TABLE(.|\s)*?(?=ENGINE=InnoDB) to extract the CREATE TABLE clause, and that seems to be working well testing in NotePad++.

However, the ETA of extracting all instances is over 3 hours, and I cannot even be sure that it is doing it correctly. I don't even know if those lines can be exported when done.

Is there a quick way I can do this on my Ubuntu box using grep or something?

UPDATE

Ran this overnight and output file came blank. I created a smaller subset of data and the procedure is still not working. It works in regex testers however, but grep is not liking it and yielding an empty output. Here is the command I'm running. I'd provide the sample but I don't want to breach confidentiality for my client. It's just a standard MySQL dump.

grep -oP "CREATE\s+TABLE(.|\s)+?(?=ENGINE=InnoDB)" test.txt > plates_schema.txt

UPDATE It seems to not match on new lines right after the CREATE\s+TABLE part.

tmn
  • 11,121
  • 15
  • 56
  • 112

4 Answers4

2

You can use the following:

grep -ioP "^CREATE\s+TABLE[\s\S]*?(?=ENGINE=InnoDB)" file.txt > output.txt
karthik manchala
  • 13,492
  • 1
  • 31
  • 55
  • This looks too easy... I'll get off my Windows machine and head to the Linux box to test this. – tmn Jun 03 '15 at 17:34
  • I'll check on this in two hours. I will let you know if this cranks it out successfully. – tmn Jun 03 '15 at 17:47
2

You can use Perl for this task... this should be really fast.

Perl's .. (range) operator is stateful - it remembers state between evaluations. What it means is: if your definition of table starts with CREATE TABLE and ends with something like ENGINE=InnoDB DEFAULT CHARSET=utf8; then below will do what you want.

perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' INPUT_FILE.sql > OUTPUT_FILE.sql

EDIT:

Since you are working with a really large file and would probably like to know the progress, pv can give you this also:

pv INPUT_FILE.sql | perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' > OUTPUT_FILE.sql

This will show you progress bar, speed and ETA.

Michal Gasek
  • 6,173
  • 1
  • 18
  • 20
  • I'm all for fast. Having issues with pcregrep on the large file, saying the --buffer-size is too little. So i will give this a try next. – tmn Jun 05 '15 at 00:28
  • I ran the test on the small sample. Worked perfectly. Running it on the 1 TB data dump file now. If this avoids the performance issues of regular expressions, you will be my hero. – tmn Jun 05 '15 at 00:30
  • 1
    Well, whatever tool you use here, disk IO is really what takes lots of time, you need to read 1TB of data. hope you're on SSD :) – Michal Gasek Jun 05 '15 at 00:37
  • Something definitely went wrong here... After two hours the output file was 137 GB. I killed it and printed to the console, it was capturing the INSERTS as well – tmn Jun 05 '15 at 03:06
  • HA! Totally found the issue... not every ENGINE is InnoDb. That asumption on my part REALLY messed things up. – tmn Jun 05 '15 at 03:32
  • 1
    Just adjust the closing regex and you're good to go with this solution, if you have InnoDB & MyISAM tables then: `perl -ne 'print if /CREATE TABLE/../ENGINE=(InnoDB|MyISAM)/' INPUT_FILE.sql > OUTPUT_FILE.sql` – Michal Gasek Jun 05 '15 at 08:31
  • 1
    Yeah I ended up using `perl -ne 'print if /CREATE TABLE/../ENGINE=` and omitted the ENGINE argument altogether. It worked perfectly after that. – tmn Jun 05 '15 at 14:53
  • You can also use the `--no-data` flag in mysqldump command to easily achieve this: https://stackoverflow.com/a/6175506/1928149 – Prasannjeet Singh Oct 04 '22 at 13:34
1

If you can run mysqldump again, simply add --no-data.

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

Got it! grep does not support matching across multiple lines. I found this question helpul and I ended up using pcregrep instead.

pcregrep -M "CREATE\s+TABLE(.|\n|\s)+?(?=ENGINE=InnoDB)" test.txt > plates.schema.txt
Community
  • 1
  • 1
tmn
  • 11,121
  • 15
  • 56
  • 112
  • Unless there are comments with ';', go all the way to the next semicolon; there could be other important info, such as `DEFAULT CHARACTER SET`. – Rick James Jun 08 '15 at 20:04