2

I have a .txt file with millions of lines of text.

The code below deletes specific lines (.com domains) from a .txt file. But large files can not do :(

<?php
$fname = "test.txt";
$lines = file($fname); // reads the whole file into memory
$out = "";             // initialise before appending
foreach ($lines as $line) if (!strstr($line, ".com")) $out .= $line;
$f = fopen($fname, "w");
fwrite($f, $out);
fclose($f);
?>

I want to remove certain lines and put them in another file.

For example, given a list of site domain names, cut the .com domains and paste them into another file.

seyedrezabazyar
  • `But large files can not do :(` What does this mean? What's the problem with large files? Why doesn't it work? – clinomaniac Apr 18 '18 at 21:21
  • Please note that PHP uses the Unix-style file stream modes (r, r+, w, ...). You can append something to a file or rewrite it, but you cannot insert or delete anything in place – Lithilion Apr 19 '18 at 03:31
  • @clinomaniac When the file contains millions of lines, the program will not run. But it runs for small files – seyedrezabazyar Apr 20 '18 at 15:02

3 Answers

2

Here's an approach using http://php.net/manual/en/class.splfileobject.php and working with a temporary file.

$fileName = 'whatever.txt';
$linesToDelete = array( 3, 5 );

// Working File
$file = new SplFileObject( $fileName, 'a+' );
$file->flock( LOCK_EX );
// Temp File
$temp = new SplTempFileObject( 0 );
$temp->flock( LOCK_EX );
// Write the temp file without the lines
foreach( $file as $key => $line )
{
  if( in_array( $key + 1, $linesToDelete ) === false )
  {
    $temp->fwrite( $line );
  }
}
// Write Back to the main file
$file->ftruncate(0);
foreach( $temp as $line )
{
  $file->fwrite( $line );
}
$file->flock( LOCK_UN );
$temp->flock( LOCK_UN );

This may be slow, although a 40 MB file with 140,000 lines takes 2.3 seconds on my Windows XAMPP setup. It could be sped up by writing to a temp file and then moving that file over the original, but I didn't want to step on file permissions in your environment.


Edit: Solution using Rename/Move instead of second write

$fileName = __DIR__ . DIRECTORY_SEPARATOR . 'whatever.txt';
$linesToDelete = array( 3, 5 );

// Working File
$file = new SplFileObject( $fileName, 'a+' );
$file->flock( LOCK_EX );
// Temp File
$tempFileName = tempnam( sys_get_temp_dir(), rand() );
$temp = new SplFileObject( $tempFileName,'w+');
$temp->flock( LOCK_EX );
// Write the temp file without the lines
foreach( $file as $key => $line )
{
  if( in_array( $key + 1, $linesToDelete ) === false )
  {
    $temp->fwrite( $line );
  }
}
// File Rename
$file->flock( LOCK_UN );
$temp->flock( LOCK_UN );
unset( $file, $temp ); // Destroy the SPL objects, releasing any remaining locks
unlink( $fileName );
rename( $tempFileName, $fileName );
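
Both snippets above delete by line number, while the question filters by content. Here is a minimal sketch of the same temp-file-and-rename idea adapted to a substring match. The function name removeMatchingLines and the choice to create the temp file next to the original (so rename() stays on one filesystem) are assumptions for illustration, not part of the original answer, and locking is left out for brevity:

```php
<?php
// Hypothetical helper: stream $fileName line by line, keep only the lines
// that do NOT contain $needle, then move the filtered copy into place.
// Returns the number of lines removed.
function removeMatchingLines(string $fileName, string $needle): int
{
    $in = fopen($fileName, 'r');
    // Assumption: the directory holding $fileName is writable.
    $tempName = tempnam(dirname($fileName), 'flt');
    $out = fopen($tempName, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException("Cannot open $fileName or its temp copy");
    }

    $removed = 0;
    // fgets() returns one line per call, so memory use stays flat.
    while (($line = fgets($in)) !== false) {
        if (strpos($line, $needle) !== false) {
            $removed++;          // matching line: skip it
        } else {
            fwrite($out, $line); // non-matching line: keep it
        }
    }
    fclose($in);
    fclose($out);

    if (!rename($tempName, $fileName)) {
        throw new RuntimeException("Cannot replace $fileName");
    }
    return $removed;
}
```

It could be called as removeMatchingLines('whatever.txt', '.com'). Note that the replacement file takes on the permissions of the temp file, so a chmod() afterwards may be needed in some environments.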
Scuzzy
  • Most suitable solution for very large files. It took a few seconds to edit my 450+ MB file :) Thanks @scuzzy – Fahad Ali Jul 08 '20 at 07:39
  • @FahadAli good to hear it, was the first or second option best? It's been a while since I wrote this answer, so I forget :D – Scuzzy Jul 09 '20 at 12:16
  • :D Of course you would have forgotten, as it's a 2-year-old answer LOL! Well, I used the second option and it worked for me, because my requirement was also to keep the same filename after removing the specific lines from my file. Cheers! – Fahad Ali Jul 14 '20 at 08:35
0

It could be that the file is too large to fit in memory. When you call file('test.txt'), it reads the entire file into an array. Instead, you can try using generators, which yield one line at a time.

GeneratorsExample.php

<?php
class GeneratorsExample {
    // Yield one line at a time instead of loading the whole file
    function file_lines($filename) {
        $file = fopen($filename, 'r');
        while (($line = fgets($file)) !== false) {
            yield $line;
        }
        fclose($file);
    }

    // Append every line that does not contain ".com" to $destFile
    function copyFile($srcFile, $destFile) {
        $f = fopen($destFile, "a"); // open once, not once per line
        foreach ($this->file_lines($srcFile) as $line) {
            if (!strstr($line, ".com")) {
                fwrite($f, $line);
            }
        }
        fclose($f);
    }
}

callingFile.php

<?php
    include('GeneratorsExample.php');
    $ob = new GeneratorsExample();
    $ob->copyFile('file1.txt', 'file2.txt');
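
The question also asks for the removed lines to end up in another file. A minimal sketch of that with the same generator idea could look like the following. The function names readLines and splitLines and the file names in the usage note are hypothetical, and error handling is kept to the bare minimum:

```php
<?php
// Yield lines one at a time so that memory use does not grow with file size.
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}

// Hypothetical helper: lines containing $needle go to $matchFile,
// all other lines go to $restFile. Returns how many lines went where.
function splitLines(string $src, string $matchFile, string $restFile, string $needle): array
{
    $match = fopen($matchFile, 'w');
    $rest = fopen($restFile, 'w');
    if ($match === false || $rest === false) {
        throw new RuntimeException('Cannot open output files');
    }

    $counts = ['matched' => 0, 'rest' => 0];
    foreach (readLines($src) as $line) {
        if (strpos($line, $needle) !== false) {
            fwrite($match, $line);
            $counts['matched']++;
        } else {
            fwrite($rest, $line);
            $counts['rest']++;
        }
    }
    fclose($match);
    fclose($rest);
    return $counts;
}
```

For example, splitLines('file1.txt', 'com.txt', 'other.txt', '.com') would write the .com lines to com.txt and everything else to other.txt. Since this is a plain substring test, a line such as example.community also counts as a match; a stricter check (for instance a regular expression anchored at the end of the line) may be needed for real domain lists.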
Ashish Ranjan
-3

While you could use tens of lines of PHP code, one line of shell code will do.

$ grep Bar.com stuff.txt > stuff2.txt

or as PHP

system ("grep Bar.com stuff.txt > stuff2.txt");
Joshua
jtarleton