I have a file that I'm reading with PHP. I want to find lines that start with some whitespace followed by certain keywords (for example, "project_name:") and then change other parts of those lines.

Currently, the way I handle this is to read the entire file into a string variable, manipulate that string and then write the whole thing back to the file, fully replacing the entire file (via fopen( filepath, "wb" ) and fwrite()), but this feels inefficient. Is there a better way?

Don Rhummy
  • 1
    "Best" is subjective. From the close reasons "_We expect answers to be supported by facts, references, or specific expertise, but this question will likely solicit debate, arguments, polling, or extended discussion._" Would you consider improving your question by selecting a particular method and explaining how it does not meet your needs? This will allow us to provide specific, rather than subjective, answers. – George Cummins Jun 02 '13 at 20:27
  • 4
    @GeorgeCummins Your comment doesn't apply here. This is a typical programming question – hek2mgl Jun 02 '13 at 20:28
  • 1
    @Baba Are you sure your attempts are faster than the one [I've proposed](http://stackoverflow.com/a/16887051/171318)? Note that a simple `rename()` is very fast. Will prepare some benchmarks :) Also note that the offset where the string should be replaced is not known in most application scenarios – hek2mgl Jun 02 '13 at 20:58
  • @hek2mgl I think you should do your test ... So many things are wrong with your function that make it 10x slower and too expensive – Baba Jun 02 '13 at 21:04
  • @Baba No need to benchmark. They are just not comparable. You know the offset at which to replace. OP does not. That's why this is **not** a duplicate of the question you quoted. Btw, I would like to know what is soooo wrong with my example. Note that I have not optimized it, as it is just an example that shows what the text says – hek2mgl Jun 02 '13 at 21:05
  • @hek2mgl: Whether it's "a typical programming question" or not is utterly irrelevant. It's not on topic here. – Lightness Races in Orbit Jun 03 '13 at 21:58
  • @LightnessRacesinOrbit Maybe. I'm tired of this discussion. Delete it if you want – hek2mgl Jun 03 '13 at 22:05
  • 1
    @Baba I still don't get how this is a duplicate and how your answer fits here. You have the position where text should be replaced as a param to your function. Note that the position is not known here. It is search and replace, not inject. Can you tell me what 10 things are wrong with my answer? I would like to benchmark (and maybe improve) it, but I just cannot compare both solutions as they are not the same. I know you are a clever guy, maybe I'm missing something here – hek2mgl Jun 03 '13 at 22:06
  • @Baba I agree with hek2mgl. The other question is not exactly the same. I'm looking to read the data in the file and replace only based on those contents. The other question is simply injection regardless of content. – Don Rhummy Jun 03 '13 at 22:43

1 Answer

Update: After finishing my function I had time to benchmark it. I used a 1 GB file for testing, but the results were unsatisfying :|

Yes, the memory peak allocation is significantly smaller:

  • standard solution: 1.86 GB
  • custom solution: 653 KB (buffer size of 4096 bytes)

But compared to the following solution there is just a slight performance boost:

ini_set('memory_limit', -1);

file_put_contents(
    'test.txt',
    str_replace('the', 'teh', file_get_contents('test.txt'))
);

The script above took ~16 seconds; the custom solution took ~13 seconds.
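The post does not say how these figures were collected. As a rough sketch only, timings and peak memory of this kind could be gathered with a small helper like the one below. The name `measure()` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// runs $fn once and reports wall-clock time and the peak memory of the
// whole PHP process so far.
function measure(callable $fn): array {
    $start = microtime(true);
    $fn();
    return [
        'seconds'    => microtime(true) - $start,
        // true = memory actually allocated from the system
        'peak_bytes' => memory_get_peak_usage(true),
    ];
}
```

Because the peak is tracked per process, each candidate should be measured in a separate script run; otherwise the larger peak of an earlier candidate hides the smaller one.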

Summary: The custom solution is slightly faster on large files and consumes much less memory (!!!).

Also, if you want to run this in a web server environment, the custom solution is better, as many concurrent scripts using the standard solution would likely consume all available memory on the system.


Original Answer:

The only thing that comes to mind is to read the file in chunks that fit the file system's block size and write the content, or the modified content, to a temporary file. After processing finishes, use rename() to overwrite the original file.

This would reduce the memory peak and should be significantly faster if the file is really large.

Note: On a Linux system you can get the file system block size using:

sudo dumpe2fs /dev/yourdev | grep 'Block size'

I got 4096
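A similar figure can also be read from within PHP, without `sudo`. The sketch below uses the `blksize` field returned by `stat()`. This is the I/O block size reported by the operating system, which is assumed here to be a reasonable buffer size but is not guaranteed to equal the `dumpe2fs` value. The helper name `preferred_blocksize()` is hypothetical.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// returns the I/O block size the OS reports for $filename, or
// $fallback if the value is unavailable.
function preferred_blocksize(string $filename, int $fallback = 4096): int {
    $stat = @stat($filename);
    // 'blksize' is -1 on systems that do not support it (e.g. Windows)
    if ($stat === false || $stat['blksize'] <= 0) {
        return $fallback;
    }
    return $stat['blksize'];
}
```

The result could then be passed as the buffer size of a chunked read loop.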

Here comes the function:

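function freplace($search, $replace, $filename, $buffersize = 4096) {

    // an empty search string would never advance the scan below
    if($search === '') {
        die('search string must not be empty');
    }

    $fd1 = fopen($filename, 'r');
    if(!is_resource($fd1)) {
        die('error opening file');
    }

    // the tempfile can be anywhere, as long as it is on the same
    // partition as the original (otherwise rename() has to copy the data)
    $tmpfile = tempnam(dirname($filename), 'freplace');
    $fd2 = fopen($tmpfile, 'w');
    if(!is_resource($fd2)) {
        die('error opening temporary file');
    }

    // a match can straddle the border between two buffers, so we carry
    // at most len(search) - 1 unprocessed chars over to the next loop
    $len = strlen($search);
    $tmp = '';
    while(!feof($fd1)) {
        $chunk = fread($fd1, $buffersize);
        if($chunk === false) {
            break;
        }
        // prepend the rest from last one
        $buffer = $tmp . $chunk;
        // replace every complete match, scanning left to right
        $pos = 0;
        while(($hit = strpos($buffer, $search, $pos)) !== false) {
            fwrite($fd2, substr($buffer, $pos, $hit - $pos) . $replace);
            $pos = $hit + $len;
        }
        // hold back at most len(search) - 1 chars after the last match
        $keep = min($len - 1, strlen($buffer) - $pos);
        $tmp = $keep > 0 ? substr($buffer, -$keep) : '';
        // write the remaining unmatched part (minus rest)
        fwrite($fd2, substr($buffer, $pos, strlen($buffer) - $pos - $keep));
    }

    // '0' is a valid rest, so compare against the empty string
    if($tmp !== '') {
        fwrite($fd2, $tmp);
    }

    fclose($fd1);
    fclose($fd2);
    rename($tmpfile, $filename);
}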

Call it like this:

freplace('foo', 'bar', 'test.txt');
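The question asks about lines that start with whitespace followed by a key such as "project_name:". For that case a line-oriented variant may be simpler, because a single-line regex can then be applied safely. The following is only a sketch under that assumption: the helper name `freplace_lines()` and the pattern in the usage note are hypothetical, and memory use is bounded by the longest line rather than by a fixed buffer size.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// rewrites $filename line by line, applying a single-line regex to each
// line, and returns the number of lines that were changed.
function freplace_lines(string $pattern, string $replacement, string $filename): int {
    $in = fopen($filename, 'r');
    if ($in === false) {
        throw new RuntimeException("cannot open $filename");
    }
    // temp file in the same directory, so rename() does not copy data
    $tmpfile = tempnam(dirname($filename), 'frl');
    $out = fopen($tmpfile, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("cannot open $tmpfile");
    }
    $changed = 0;
    while (($line = fgets($in)) !== false) {
        $new = preg_replace($pattern, $replacement, $line, -1, $count);
        if ($new === null) {
            // regex error: keep the line unchanged
            $new = $line;
            $count = 0;
        }
        if ($count > 0) {
            $changed++;
        }
        fwrite($out, $new);
    }
    fclose($in);
    fclose($out);
    rename($tmpfile, $filename);
    return $changed;
}
```

A call such as `freplace_lines('/^([ \t]*project_name:[ \t]*)[^\r\n]*/', '${1}new_name', 'test.txt');` would keep the indentation, the key and the line ending, and replace only the value.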
hek2mgl
  • 2
    What happens if the search string straddles read buffers? – Jon Jun 02 '13 at 21:04
  • Thanks for comment. I have updated the post. Yes, to get this bullet proof it needs more attention. – hek2mgl Jun 02 '13 at 21:10
  • @hek2mgl is there a benefit to doing it this way instead of opening with "x+" and then combining the steps? – Don Rhummy Jun 03 '13 at 15:10
  • Yes. The benefit is the reduced memory peak. You will need just ~$buffersize of memory instead of as much as the file's size. However, I'll update the post when I have time. Maybe in the evening. Will prepare a version that handles Jon's comment. Almost ready :) – hek2mgl Jun 03 '13 at 15:18
  • @Jon Have updated the post to handle your comment. Thanks for it. Of course this will not work with regexes – hek2mgl Jun 03 '13 at 21:40
  • Working with multiline regexes will be really tricky – hek2mgl Jun 03 '13 at 22:47
  • @hek2mgl Good point. Even though they have almost the same speed, there's limited RAM and if every session tried to open a 1GB file, it would run out of RAM, so the custom solution makes more sense. – Don Rhummy Jun 04 '13 at 00:56
  • @DonRhummy Yep! Also, I had a bug in the prior version: the temporary file should not be located in /tmp. It has to be on the same partition as the original, otherwise the final `rename()` call will take 7 seconds. Now the custom solution is faster and takes less memory. Note: if they delete the question here (because off-topic LOL :D), I have created a little [blog article](http://www.metashock.de/2013/06/whats-the-best-most-efficient-way-to-search-for-content-in-a-file-and-change-it-with-php/). We can continue the discussion there (if necessary). Also, I will update it if I can improve it – hek2mgl Jun 04 '13 at 07:23
  • @hek2mgl Thanks. If you can help rustle up two more "reopen" votes, it will be safer. The `rename` issue just shows how tiny things can make a huge difference in undocumented ways. – Don Rhummy Jun 04 '13 at 17:23
  • I already spent one :) The rename issue is a basic computer thing. If you have two hard disks, the data has to be transferred between those disks (or partitions). If the files are on one partition, it requires just a change to the directory entry (setting a new name) while leaving the data unchanged, which is very fast and an atomic operation. – hek2mgl Jun 04 '13 at 17:26