72

I'm using mysqldump in a cron job to back up a database with over 2 million rows.

It creates a text file which can be used to restore the database from the command line.

I thought it would be useful to edit the dump before a restore as a quick way of changing values and table or column names - at least until I learn more and become confident about doing it with ALTER and UPDATE.

Editing large text files does not bother me, but I was surprised to find that in a 250-megabyte dump of my database, there were only about 300 lines. Each line was something like 800k characters long.

Is there another way of generating dumps with more control over line length?

Or should I post-process the dump with tools like sed or Perl?

pavium
  • 14,808
  • 4
  • 33
  • 50
  • Consider the ; between queries as your line ending. – Havenard Aug 18 '09 at 12:40
  • Actually, I used Perl and said $line =~ s{\),\(}{),\n(}g; so I got LOTS of extra lines. – pavium Aug 18 '09 at 13:29
  • That is to say, the ; was already my line ending, at the end of each INSERT INTO table_name VALUES (..),(..),(..) .. (..); there were 282 lines in the entire file, 252 of them ending with ;, and I chose to insert newlines after the commas. – pavium Aug 18 '09 at 13:53
  • 2
    You're sure your actual data had no parenthesis-comma-parenthesis sequences in it? – Ken Williams Dec 22 '12 at 00:22
  • 4
    This is a known shortcoming in `mysqldump`. It was [first reported in 2004](https://bugs.mysql.com/bug.php?id=4328). In 2011, both [Tim Riker](https://bugs.mysql.com/bug.php?id=4328) & [Lon Binder](https://stackoverflow.com/a/20046484) suggested a 1-line patch to fix it. Mindbogglingly, this *still* has not been implemented by the `mysqldump` developers/maintainers. Because the original bug report was closed (wrongly and unhelpfully), the issue is now being tracked [here](https://bugs.mysql.com/bug.php?id=65465). –  Jun 03 '18 at 15:59
  • Also see: https://stackoverflow.com/questions/15750535 –  Jun 03 '18 at 15:59

8 Answers

82

By default, mysqldump generates only one INSERT command per table, resulting in one (very long) line of inserted data for each table that got dumped. This is essentially because the "batch" inserts are much faster than if it generated a separate INSERT query for every record in every table.

So it's not that mysqldump has created arbitrarily long lines that you can simply cut at some other length. The lines are long for a reason.
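
To illustrate, the default extended-insert format packs every row of a table into one statement on one line, along these lines (hypothetical data):

INSERT INTO `users` VALUES (1,'alice'),(2,'bob'),(3,'carol');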

If it's really important to get the INSERTs broken down onto multiple lines, you can indicate that with:

mysqldump --extended-insert=FALSE --complete-insert=TRUE ...

Note, however, that restoring tables will take longer in this format.
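
With those two flags, each row becomes its own self-describing statement on its own line, roughly like this (same hypothetical table):

INSERT INTO `users` (`id`, `name`) VALUES (1,'alice');
INSERT INTO `users` (`id`, `name`) VALUES (2,'bob');
INSERT INTO `users` (`id`, `name`) VALUES (3,'carol');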

VoteyDisciple
  • 37,319
  • 5
  • 97
  • 97
  • I had looked in the mysqldump man page, and tried --extended-insert=FALSE but --complete-insert=TRUE is new to me. I'll try it, thanks. – pavium Aug 18 '09 at 13:11
  • This really helped me, thank you. The extra long lines produced by MySQLDump were actually breaking and were not working right during the restore... switching extended inserts off fixed the issue. And since I zip the resulting archive, the backup file isn't that big. – Alex Weinstein Jan 21 '11 at 05:11
  • We do a lot of legacy Content Management System migration and the --extended-insert and --complete-insert recommendations are quite valuable. Thank you. – Dave Aiello Aug 29 '12 at 23:35
  • 2
    I'm wondering if anyone can confirm that the net_buffer_length option will also work well. It seems like a better solution to me. – Buttle Butkus Dec 01 '12 at 00:22
  • 36
    @VoteyDisciple - that explains why it's one big `INSERT`, but not why it's one big line. There could easily be a newline after each insertion row, right? – Ken Williams Dec 22 '12 at 00:26
  • That's true: mysqldump could hypothetically add an option to write a single `INSERT` statement but with each of the `VALUES` lists separated by newlines. That wouldn't be a *bad* option, but neither would it probably be in high demand. When your goal is just to produce a backup, extra newlines are just unnecessary bytes. And when your goal is to manually manipulate the file, fully explicit statements (while verbose) are nice. I can't speak to why mysqldump's authors did not include a particular option, but personally I don't find it lacking. – VoteyDisciple Dec 24 '12 at 17:09
  • 9
    Oh, there sure IS a demand for that! We use mysqldump to store DB for each "tag". We make some smaller updates between tags and with one big line you can't see which content has changed and you need to scroll sideways to it. – Tomáš Fejfar Mar 22 '13 at 13:08
  • Using the non-extended INSERT format (--extended-insert=FALSE) will resolve that. You'll get separate `INSERT` statements on each line, so the only sideways scrolling you'd do is within a particular statement. – VoteyDisciple Mar 28 '13 at 18:43
  • 18
    In my eyes it is just a "formatting issue". Why not generate ONE INSERT statement, but put the value block for each row on a separate line? This would make the generated file human-readable and would not affect performance at all. – hfrmobile Jul 08 '13 at 08:04
  • 1
    http://stackoverflow.com/questions/15750535/using-mysqldump-to-format-one-insert-per-line – hfrmobile Jul 08 '13 at 08:11
  • 2
    PHPMyAdmin generates its dump files with a new line between each insertion. I guess there was demand for that. – Tamlyn Mar 27 '14 at 10:37
  • I'm sure in your text editor you could use a simple (regex) search and replace to format it how you wish. – LeonardChallis Aug 13 '14 at 12:34
35

I was browsing the MySQL source code looking for a solution to this problem today. The maximum line length is enforced by the variable opt_net_buffer_length which is supposed to match the MySQL server's buffer size. It is comically large.

But anyway, it's an option, so just do this:

mysqldump --net_buffer_length=5000 ...

The minimum value is 4096.
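
For example, to keep each generated INSERT statement under roughly 16 KB (the database name here is hypothetical):

mysqldump --net_buffer_length=16384 mydb > mydb.sql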

superjer
  • 704
  • 5
  • 5
  • 8
    Will this cause problems with individual rows containing more than 4K data? – Vladimir Panteleev Oct 08 '12 at 19:50
  • @CyberShadow good question. I bet it would but I have no idea. Next time I do this, I'll try matching up the mysqldump net_buffer_length to the max_allowed_packet and see if that works. I just did an import where 16M max_allowed_packet was exceeded by query size. Even 64M was too small - went to 256M and got it through. So I would probably try a net_buffer_length of around 10M next time, myself, and see what that does. – Buttle Butkus Dec 01 '12 at 00:20
  • by far the best answer, won't be too slow and won't break other DBMSs like SQLite, which has a default limit of 500 rows per INSERT statement. thank you! – colin lamarre Jan 03 '15 at 04:19
  • 1
    @CyberShadow no it doesn't cause problems with bigger rows - they take as much space as they need. Tested with MariaDB 10.0.21. – Michael Platings Nov 27 '15 at 15:55
  • 1
    Performance wise, it took to import my 20MB / 335k row database dump: [ with a single-transaction dump: 22s ] // [ with a 4096 buffer length dump: 1m 44s ] // [ with extended-insert=false: ???? I got bored waiting and CTRL-Ced it after 33 minutes ]. So this answer is without a doubt the best compromise between speed and diffability. Caveat emptor: My subsequent dump was identical but I didn't test with row data longer than 4096 chars. – Idris Jul 16 '17 at 14:08
  • 1
    *Actually, thinking about it, if a row early on in a table is modified or deleted, then all subsequent lines would have their content offset across the block boundaries, making subsequent dumps differ on every line after that point. So maybe this doesn't have great diffability after all... – Idris Jul 16 '17 at 14:28
22

I came across an answer on the MySQL forums which conclusively shows that adding '\n' after each INSERT group is not possible using mysqldump alone, without modifying the source:

The extended format cannot be 100% properly parsed based on the comma or parenthesis alone; you would have to count fields. The best solution: fix mysqldump to line-break on output.

Very minor change: On line 3506, you can see where the row-ending comma is output:
fputc(',',md_result_file); /* Always row break */

Simply insert this line immediately after line 3506:
fputc('\n',md_result_file); /* Lon Binder says wrap that line! */

Re-compile, and done.

@see http://forums.mysql.com/read.php?28,420002,426110#msg-426110

Thanks Lon B!

(I've included the content from the MySQL forum just in case the forum disappears.)

StampyCode
  • 7,218
  • 3
  • 28
  • 44
  • 17
    Ugh, why is this not an option on mysqldump? This is the best of both worlds. – SineSwiper Nov 22 '13 at 15:23
  • Please give link to source of mysqldump. – Maxim Kholyavkin Aug 27 '14 at 06:42
  • @Speakus - MySQL is owned by Oracle and offers no direct link to download or view individual source files - you can download the source bundle from here: http://dev.mysql.com/downloads/utilities/ - just select 'Source' as the platform. – StampyCode Aug 27 '14 at 07:40
  • thank you. A minor improvement to your comment: the source should be taken not from your link but from here: http://dev.mysql.com/downloads/mysql/ + open file mysql-5.6.20/client/mysqldump.c + search for the required line via the comment (I found line 3791) + add the new line as described. Another way to get the source code: https://mariadb.com/kb/en/mariadb/documentation/getting-started/compiling-mariadb-from-source/source-getting-the-mariadb-source-code/ + change maria/client/mysqldump.c (line 3928). Unfortunately, both tools (Oracle MySQL and MariaDB) still don't have an option to solve this out of the box. – Maxim Kholyavkin Aug 27 '14 at 12:46
  • 2022 now: Has such a "best of both worlds" option been added to the `mysqldump` tool meanwhile? – porg Dec 20 '22 at 00:16
6

This flag also works:

mysqldump --skip-extended-insert 

It's just the same as --extended-insert=FALSE.

Nick Tsai
  • 3,799
  • 33
  • 36
  • 1
    Yeah I like that one better for some reason :) Just needed this for a diff, where the normal long lines make diffs really painful. – Daniel Beardsmore Jul 13 '17 at 12:38
  • Every record starts with `INSERT INTO table_name VALUES (...);`, which takes up more space, but can lend some helpful context if it's not obvious from the data which table it's going into. – mwfearnley Apr 29 '19 at 08:46
5

Using a regex to split lines is not enough; you need a parser that properly understands quotes and escape characters.

I just wrote a parser since I couldn’t find one: http://blog.lavoie.sl/2014/06/split-mysqldump-extended-inserts.html
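
For example, a naive s/),(/),\n(/g would split this (hypothetical) line inside the quoted string and corrupt the data:

INSERT INTO `t` VALUES (1,'a ),( b'),(2,'plain');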

sebastien
  • 418
  • 4
  • 7
1

If you already have an SQL dump with very long lines and want to read it easily, you can use:

cat dump.sql  | sed 's$VALUES ($VALUES\n($g' | sed 's$),($),\n($g' | less

cat dump.sql  | sed 's$VALUES ($VALUES\n($g' | sed 's$),($),\n($g' | grep something
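
Note that \n in the replacement is a GNU sed extension; BSD/macOS sed needs a literal newline there instead. The same pipeline can also be written as a single sed invocation:

sed -e 's$VALUES ($VALUES\n($g' -e 's$),($),\n($g' dump.sql | less
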
Alex
  • 32,506
  • 16
  • 106
  • 171
1

This is how I solved this problem:

  1. I ask mysqldump with --skip-extended-insert to produce one INSERT INTO statement per line:
INSERT INTO `action_events` VALUES (1,'8cecd83d-76b7-44d3-ad6a-5ad3ff6410dd');
INSERT INTO `action_events` VALUES (2,'8cecd8ba-465c-4d2e-b9f0-caf504817e3a');
INSERT INTO `action_events` VALUES (3,'8cecd961-03d1-461f-8425-067c1d01d238');
INSERT INTO `action_events` VALUES (4,'8cf08433-f262-443c-90d1-629dcd11c5aa');
INSERT INTO `xx` VALUES (5,'8cf08433-f262-443c-90d1-629dcd11c5aa');
INSERT INTO `yy` VALUES (6,'8cf08433-f262-443c-90d1-629dcd11c5aa');
  2. Using awk, I join consecutive INSERT INTO lines for the same table with commas:
INSERT INTO `action_events` VALUES
(1,'8cecd83d-76b7-44d3-ad6a-5ad3ff6410dd'),
(2,'8cecd8ba-465c-4d2e-b9f0-caf504817e3a'),
(3,'8cecd961-03d1-461f-8425-067c1d01d238'),
(4,'8cf08433-f262-443c-90d1-629dcd11c5aa');
INSERT INTO `xx` VALUES
(5,'8cf08433-f262-443c-90d1-629dcd11c5aa');
INSERT INTO `yy` VALUES
(6,'8cf08433-f262-443c-90d1-629dcd11c5aa');

Here is the script join-insert-into:

#!/bin/bash

# Merge consecutive "INSERT INTO <table> VALUES (...);" lines for the same
# table into a single multi-row INSERT, flushing the statement early when it
# would exceed max_allowed_packet; non-INSERT lines are passed through as-is.
awk '
    BEGIN { FS = "\n"; OFS = ""; max_allowed_packet = 25*1024*1024; }
    END { if (prev_values) print prev_values ";" }
    {
        if (match($0, /^INSERT INTO [^ ]+ VALUES /)) {
            table = substr($0, RSTART, RLENGTH - 1)
            values = substr($0, RLENGTH + 1, length($0) - RLENGTH - 1)
            if (prev_table != table) {
                if (prev_values) {
                    print prev_values ";"
                }
                print table
                packet = length(table) + 1
            }
            else {
                packet += length(prev_values) + 2
                if (packet >= max_allowed_packet) {
                    print prev_values ";"
                    print table
                    packet = length(table) + 1
                }
                else {
                    print prev_values ","
                }
            }
            prev_table = table
            prev_values = values
        }
        else {
            if (prev_values) {
                print prev_values ";"
                prev_table = ""   # reset state between INSERT runs
                prev_values = ""
            }
            print $0
        }
    }
'

It can be used like this:

# https://stackoverflow.com/a/34670902/1478566
MYSQL_PWD="$DB_PASSWORD" mysqldump \
    --no-tablespaces --quick --skip-extended-insert \
    -h$DB_HOST --port=${DB_PORT:-3306} \
    -u$DB_USERNAME $DB_DATABASE "$@" | join-insert-into
vbarbarosh
  • 3,502
  • 4
  • 33
  • 43
-4

Post-process the dump file with Python. You may be happier with it than with Perl or sed.

If you're running on Linux, you already have it installed. If you're running on Windows, the installer is painless.

Before that, however, learn to use SQL UPDATE and SQL ALTER. You'll be happiest doing things the right way.
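
For instance, the kinds of edits described in the question map onto statements like these (table and column names are hypothetical):

UPDATE customers SET city = 'Sydney' WHERE city = 'Sidney';
RENAME TABLE customers TO clients;
ALTER TABLE clients CHANGE city town VARCHAR(64);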

S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • I prefer Perl or sed, simply because that's what I'm used to. And although there's a certain satisfaction in editing a big file, I will learn the SQL commands, I promise. – pavium Aug 18 '09 at 13:29
  • 9
    Post-processing is a bad idea unless you're really certain you understand exactly how the data is escaped, and exactly how to write code to deal with it. Otherwise you're going to corrupt your data. – Ken Williams Dec 22 '12 at 00:20
  • 3
    If you're sure you won't corrupt the data, here's an example command to split the inserts `mysqldump -udbuser dbname | sed 's/),(/),\n(/g' > dump.sql` – Tamlyn Mar 27 '14 at 10:32
  • 2
    @Tamlyn But what if I have `),(` in a VARCHAR field? Maybe a comma delimited list of emoticons, or something. I guess, yeah... if you know your data well enough that there's no possibility that you have anything like that... – James S May 30 '14 at 20:38
  • It might be better to post a Python solution or a link to one, rather than just the Python homepage. Someone who hasn't heard of Python before will have a long journey ahead before they can write a suitable post-processing script. – mwfearnley May 03 '19 at 09:34
  • Someone who hasn't heard of Python shouldn't be copy-pasting from the Internet without the skills to review the code! :-P – user2066657 Feb 14 '22 at 07:50