
I have a file of 10 million words, one word per line. I'm trying to open that file, read every line, put it in an array and count the number of occurrences of each word.

wartek
mei_atnz
sommerray
swaggyfeed
yo_bada
ronnieradke
… and so on (10M+ lines)

I can open the file, read its size, even parse it line by line and echo each line to the browser (it takes very long, of course), but when I try to perform any other operation, the script just refuses to execute. No error, no warning, no die(…), nothing.

Accessing the file always works; it's the operations on its content that fail. I tried this and it worked…

while(!feof($pointer)) {
    $row = fgets($pointer);
    print_r($row);
}

… but this didn't:

while(!feof($pointer)) {
    $row = fgets($pointer);
    array_push($dest, $row);
}

I also tried SplFileObject and file($source, FILE_IGNORE_NEW_LINES), with the same result every time (it fails with the big file and works with a small one).

Guessing that the issue was not the size (150 KB) but the length (10M+ lines), I chunked the file to reduce it to ~20k without any improvement, then reduced it again to ~8k lines, and it worked.

I also removed the time limit with set_time_limit(0); and removed (almost) any memory limit, both in php.ini and in my script with ini_set('memory_limit', '8192M');. Regarding errors, I set error_reporting(E_ALL); at the top of my script.
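The error message quoted later in this thread reports a limit of 2097152 bytes rather than 8192M, so it may be worth checking which values are actually in effect at runtime. The following is a minimal sketch, not a definitive diagnostic; `effectiveLimits()` is a hypothetical helper name, and it only uses the built-in `ini_get()`, `ini_set()`, `php_ini_loaded_file()` and `error_reporting()` functions.

```php
<?php
// Hypothetical helper: reports the settings PHP is actually running with,
// which may differ from what the edited php.ini file contains.
function effectiveLimits(): array
{
    return [
        'ini_file'           => php_ini_loaded_file(), // false if no php.ini was loaded
        'memory_limit'       => ini_get('memory_limit'),
        'max_execution_time' => ini_get('max_execution_time'),
        'display_errors'     => ini_get('display_errors'),
        'error_reporting'    => error_reporting(),     // current level as an integer
    ];
}

// ini_set() returns the previous value on success and false on failure,
// so the return value shows whether the change was accepted.
$previous = ini_set('memory_limit', '8192M');
if ($previous === false) {
    echo "memory_limit could not be changed at runtime\n";
}

print_r(effectiveLimits());
```

If `ini_file` points to a different php.ini than the one that was edited, or `memory_limit` still shows a small value, the limit is being set somewhere else.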

So the questions are:

  • is there a maximum number of lines that can be read by PHP built-in functions?
  • why can I echo or print_r but not perform any other operation?
Pierre Le Bot
  • Your script is terminated by the interpreter when it tries to allocate more memory than it's allowed to allocate. An error message is generated but probably your PHP interpreter and/or the script itself turned the [`error_reporting`](http://php.net/manual/en/function.error-reporting.php) off. That's why you don't get any error message. It is also possible that the error reporting is not turned off but it is configured to not display the errors, only save them in `php_errors.log`. – axiac Aug 23 '17 at 13:48
  • There is no limit of lines or bytes that can be read. The only limit is the configured value for [`memory_limit`](http://php.net/manual/en/ini.core.php#ini.sect.resource-limits) and the computer memory available to the process. Please notice that the data structures internally used by PHP to manage the variables it uses also occupy memory and are counted against `memory_limit`. – axiac Aug 23 '17 at 13:50
  • The error reporting was set to E_ALL (forgot to mention it, I edited my post) and in **php.ini**, the param `display_errors` is set to `on` – Pierre Le Bot Aug 23 '17 at 13:55
  • @axiac, what exactly do you mean by `the data structures internally used by PHP to manage the variables it uses also occupy memory and are counted against memory_limit` ? – Pierre Le Bot Aug 23 '17 at 14:00
  • PHP is an interpreted language. A simple string variable like `'abc'` uses 3 bytes for the payload (the data) and a lot of memory that stores the meta-data (variable name, variable type etc). Arrays use even more "hidden" memory to store the association between the key and the value, the links between the elements (needed to iterate over the values) and so on. – axiac Aug 23 '17 at 14:15
  • So, basically, what I ask is too greedy and I will need a computer with more than 8GB of RAM? – Pierre Le Bot Aug 23 '17 at 14:18
  • *"... more than 8GB of RAM"*... or a different way to process the data. I would use a table in a MySQL database for this processing. The words file can be easily imported into the table, the queries to count the words, sort them etc. are easy to write (and less error-prone than manual processing) and the memory is not an issue any more. It requires a MySQL server running, of course, and some disk space but this is a smaller annoyance I think. If MySQL is not available to you, [SQLite](http://php.net/manual/en/ref.pdo-sqlite.php) can do the job as well (and it doesn't need a server). – axiac Aug 23 '17 at 14:29
  • Finally found a way to read errors returned by my script: `Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 790528 bytes)`… I guess I'll go with MySQL indeed! Thanks. – Pierre Le Bot Aug 23 '17 at 14:33

1 Answer


I think you might be running into a long execution time:

How to increase the execution timeout in php?

Different operations take different amounts of time. Printing might be a lot cheaper than pushing 10M new items into an array one by one. It's strange that you don't get any error message; you should get an execution-time-exceeded error somewhere.
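
One measurable difference between the two loops is that printing discards each row, while pushing keeps every row in memory. The sketch below estimates how much memory each stored element costs; `bytesPerElement()` is a hypothetical helper name, the generated strings are stand-ins for real words, and the result depends on the PHP version and platform.

```php
<?php
// Hypothetical helper: approximate memory consumed per element when short
// strings are appended to an array one by one.
function bytesPerElement(int $n): float
{
    $before = memory_get_usage();
    $dest = [];
    for ($i = 0; $i < $n; $i++) {
        $dest[] = "word_$i\n"; // a distinct short string per row
    }
    $after = memory_get_usage();

    return ($after - $before) / $n;
}

printf("~%.1f bytes per element\n", bytesPerElement(100000));
```

Multiplying the reported figure by the number of lines gives a rough idea of whether the whole file can fit under the configured `memory_limit`.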

kry
  • I already increased the max_execution_time at the top of my script (and it's also set to `0` in my **php.ini**). And the process just doesn't start when I do anything other than displaying lines – Pierre Le Bot Aug 23 '17 at 13:58