We have a ~200 GB .sql file and we are grepping it for some tables, which takes around an hour and a half. Is there any method to reduce that time, or a more efficient way to filter for certain tables? Any help will be appreciated.
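For reference, the kind of command being run is roughly like the sketch below (assuming a mysqldump-style dump; `my_table` is a hypothetical table name). Forcing the C locale and a fixed-string match is a common way to speed up a single grep pass:

```sh
# LC_ALL=C disables locale-aware (e.g. UTF-8) matching, which often
# makes grep noticeably faster; -F treats the pattern as a fixed
# string rather than a regular expression.
LC_ALL=C grep -F 'INSERT INTO `my_table`' 200-gb-table.sql
```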
Some random suggestions: a) Buy faster disks; b) If the file doesn't change often, gzip it; c) Also, if it doesn't change very often, split it into four parts, then gzip the parts; d) Buy more memory; e) Try [The Silver Searcher](https://github.com/ggreer/the_silver_searcher). (b) and (c) trade off CPU for IO and help you fit more of the file in OS caches. – Sinan Ünür Jun 28 '17 at 15:04
Possible duplicate of [Fastest possible grep](https://stackoverflow.com/questions/9066609/fastest-possible-grep) – Benjamin W. Jun 28 '17 at 15:13
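A sketch of suggestion (c) from the first comment, assuming GNU coreutils `split` and a hypothetical `<pattern>` placeholder; the one-time split-and-compress cost buys faster reads on every later search:

```sh
# One-time preparation: split into four parts without breaking lines
# (l/4 = four chunks along line boundaries), then compress each part.
split -n l/4 200-gb-table.sql part_
gzip part_*

# Each search decompresses on the fly; zgrep wraps gunzip + grep.
for f in part_*.gz; do zgrep '<pattern>' "$f"; done
```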
1 Answer
The GNU parallel program can split its input across multiple child processes, each of which runs grep over its own part of the input. By using multiple processes (presumably you have enough CPU cores to apply to this work), it can finish faster by running in parallel.
```sh
cat 200-gb-table.sql | parallel --pipe grep '<pattern>'
```
But if you need to know the context of where the pattern occurs (e.g., the line number within the input), this might not be what you need.
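A follow-up sketch: when the input is a regular (seekable) file rather than a stream, GNU parallel's `--pipepart` mode reads the file directly and skips the single-process `cat |` bottleneck (same hypothetical `<pattern>` placeholder):

```sh
# -a names the input file; --pipepart seeks to block boundaries instead
# of copying everything through one pipe, so splitting is nearly free.
# --block 100M is the chunk size handed to each grep; tune for your disk.
parallel --pipepart -a 200-gb-table.sql --block 100M grep '<pattern>'
```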

– Bill Karwin