7

I have 2 large SQL files of around 8GB each. But, in the latest backup I find that one file has 300MB data missing.

I just want to compare which data is missing, so that I can check that was it just temporary data OR important data that has vanished.

On comparing both files via diff on Ubuntu 14.04 I always get memory allocation error. I have also tried other allowing more than memory solutions and all that, but still no help.

I want to gather all data which exists in sql1 but missing in sql2 to a new file sql3.

Please help!

EDIT: I moved from Simple MySQl-Server to Percona XtraDB Cluster recently, A lot of tables were converted from MyISAM to INNODB in the process. So, can that be a reason for 300MB decreases in mysqldump SQL files? I seriously doubt this because a SQL will be an SQL, but does INNODB SQL code is decreased in any case? An expert advice on this will help.

Rehmat
  • 2,121
  • 2
  • 24
  • 28
  • is it from one table or multiple tables – sumit Mar 07 '17 at 04:51
  • @sumit Its ALL databases. But, I think it doesn't matter, I just want to see which lines are missing from second sql file but exist in first sql file. – Rehmat Mar 07 '17 at 04:52
  • Red-Gate SQL Compare is a great tool that will do this for you. http://www.red-gate.com/products/mysql/mysql-compare/ – Ave Mar 07 '17 at 04:56

2 Answers2

2

SQL dumps comparison is quite hard to do when dealing with large amounts of data. I would try the following:

  1. Import each SQL file data into its own database
  2. Use one of the methods indicated here to compare database content (I assume the schema is the same). E.g. Toad for MySql

This way of comparison should be faster, as data manipulation is much faster when stored into database and also has the advantage the missing data can easily used. E.g.

SELECT *
FROM db1.sometable
WHERE NOT EXISTS (SELECT 1 
                  FROM db2.sometable 
                  WHERE db1.sometable.pkcol = db2.sometable.pk2)

will return the exact missing information into a convenient way.

Community
  • 1
  • 1
Alexei - check Codidact
  • 22,016
  • 16
  • 145
  • 164
  • Your solution suggests diff which I have tried already and I am getting memory allocation errors because the files are very large and I do not have enough memory. I have 153 databases, I cannot compare 1 by 1. – Rehmat Mar 07 '17 at 04:55
  • @rsharpy - no, skip the first answer. Try [this](http://stackoverflow.com/a/225790/2780791) or other from below. As mentined, the first should be to get data from each SQL in its own database and perform the comparison at database level. Dump data is harder to manipulate. – Alexei - check Codidact Mar 07 '17 at 04:56
  • @Alexei Thanks friend. If I am going to first import both SQL files into servers and then compare each database with other version, it will take me a lot of time. But, if I just get to know a quick line per line difference report from the 2 sql files, I will quickly understand what is missed. – Rehmat Mar 07 '17 at 05:00
  • Toad for MySql works on Windows and I have Mac and Linux only. – Rehmat Mar 07 '17 at 05:11
  • @rsharpy - you may take a look upon [mysqldbcompare]( https://dev.mysql.com/doc/mysql-utilities/1.5/en/mysqldbcompare.html). – Alexei - check Codidact Mar 07 '17 at 05:19
0

If you export the dump you can use tools like Beyond Compare, Semantic Merge, Winmerge, Code Compare or other diff tools.

Not that some tools (i.e. Beyond Compare) have 4096 characters limit for a row, which becomes a problem in the comparison (I got mad). It's possible to change that in Tools->FileFormat->[choose your format, maybe it is EverythingElse]->Conversion->64000 characters Per Line (this is the maximum).

Also you can try changing the fileformat to SQL(might not help much though; and it will slow your comparison).

enter image description here

Revious
  • 7,816
  • 31
  • 98
  • 147