PHP preg_match_all 100 MB file

Question

I have read that "preg_match_all" is not made for parsing large files, but I need to do that. I have increased:

pcre.backtrack_limit=1000000000
pcre.recursion_limit=1000000000

My PHP memory_limit is set to 5000M, and the script still ends without any error or exception within 0.2 seconds.

Is the only solution to split the 100 MB file into 100 small 1 MB files?

Thanks for help

What's your code? It should work fine on large files, but it'll be a huge memory hog. A hugeeee one. — Nathanael, Jul 03 '12 at 17:16
https://gist.github.com/3041137 It's ugly, I know... FOR, FOR, FOR ... — Marek Javůrek, Jul 03 '12 at 17:18
If it's failing really quickly, you probably just have a `parse error`. Turn on PHP error reporting. Also, when sharing your code, you should post it in the question above, and format it nicely. — Nathanael, Jul 03 '12 at 17:21
Note that you can't just increase `pcre.recursion_limit` - you also need to increase the stack size of the running executable (i.e. `php.exe` or `httpd.exe` on Win32 machines). See: my related answer to: [RegExp in preg_match function returning browser error](http://stackoverflow.com/a/7627962/433790) which explains why really bad things can happen with PHP/PCRE and "large" target strings, (and how you can avoid them). — ridgerunner, Jul 03 '12 at 18:40

score 4 · Answer 1 · answered Jul 03 '12 at 17:48

Consider using command line tools which are much better suited to deal with large amounts of data.

grep, sed, awk, or some combination thereof.

Andy Jones

Ωmega · Accepted Answer · 2012-07-03T17:51:29.453

3

Based on your code, I suggest doing it this way:

1. Set variable $data to an empty string.
2. Set variable $work to an empty string; read a block of data and append it to $data.
3. Use the regex `#^(.*?)(<tr>\n(?!.*<tr>\n).*)$#` to split $data into $work (everything before the last `<tr>\n`) and $data (the last `<tr>\n` and everything after it).
4. Find all matches in $work.
5. Go back to point #2 while more data is available.
6. Find all matches in $data.
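
The steps above can be sketched in PHP roughly as follows. This is a minimal sketch, not the answerer's actual code: the function names `matchAllInChunks` and `collectMatches`, the 1 MiB default block size, and the example pattern in the usage note are all assumptions. It also swaps the split regex for `strrpos()`, which finds the last `<tr>\n` without running PCRE over the whole buffer (the regex as printed would additionally need the `s` modifier for `.` to span newlines).

```php
<?php
// Hypothetical helper: runs preg_match_all and appends the matches to $results.
// Throws instead of failing silently when PCRE hits one of its limits.
function collectMatches(string $pattern, string $subject, array &$results): void
{
    if (preg_match_all($pattern, $subject, $m, PREG_SET_ORDER) === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    foreach ($m as $match) {
        $results[] = $match;
    }
}

// Hypothetical function: reads $handle block by block and matches $pattern
// only against complete records, assuming each record starts with $delimiter.
function matchAllInChunks(
    $handle,
    string $pattern,
    string $delimiter = "<tr>\n",
    int $blockSize = 1048576
): array {
    $results = [];
    $data = '';                                  // point 1: carry-over buffer

    while (!feof($handle)) {
        $block = fread($handle, $blockSize);     // point 2: read a block...
        if ($block === false || $block === '') {
            break;
        }
        $data .= $block;                         // ...and append it to $data

        // point 3: split at the last delimiter (strrpos stands in for the regex)
        $pos = strrpos($data, $delimiter);
        if ($pos === false) {
            continue;                            // no record boundary yet
        }
        $work = substr($data, 0, $pos);          // complete records
        $data = substr($data, $pos);             // possibly incomplete tail

        collectMatches($pattern, $work, $results);   // point 4
    }                                            // point 5: loop while data remains

    collectMatches($pattern, $data, $results);   // point 6: match the remainder
    return $results;
}
```

A call might look like `matchAllInChunks(fopen('big.html', 'rb'), '#<td>(.*?)</td>#')`, where the file name and pattern are placeholders. Because only the text after the last `<tr>\n` is carried over, the block size affects memory use but not the result, as long as no single match spans a record boundary.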

edited Jul 03 '12 at 17:51

answered Jul 03 '12 at 17:45

Ωmega

2. How big a block of data? What if I read "datada"... The second "da" will not be processed. – Marek Javůrek Jul 03 '12 at 17:52
@MarekJavůrek - you can do this with **any block size** and it will work. Regarding your example - it will be split into 2 parts, and the second part will be processed together with the next block of data (point #2 says **append**, which means adding to the end of the existing string: `$data = $data . $new` or `$data .= $new`) - just code it and test it. – Ωmega Jul 03 '12 at 17:58
