
I am working with large text files in PHP (1 GB+). I am using

file_get_contents("file.txt", false, null, 100000000, 100);

to get data from the middle of the file, but if I wanted to change the data in the file to something of a different length than the original data, I would have to rewrite the entire file.

How can I change data within the file (variable length) without overwriting the data that follows when the new data is larger than the original? I keep an index of the different data blocks within the file and their byte locations. It seems that the only alternative is to dedicate x bytes to each piece of data and then rewrite that block if I want to change it. The problem with this is that it would take up a lot more space than needed, filled with null bytes, and it would take longer to write. It still would not solve how to "remove" data either, as the file could never shrink in size. I really need some help here.

If I used fixed-size blocks for each piece of data in the file, say 1 MB, and then wanted to enter data that was only 100 KB, that entry would take 10x the space actually needed, and the entry could never be changed to something larger than 1 MB, as it would overwrite more than its own dedicated block. Removing it would not be possible. I hope this makes sense. I am not looking for alternatives; I am looking to write and change data in the middle of files.

UPDATE: Yes, I would like to replace the old data, but if the new data is longer than the old data, I want the rest of the data to be pushed further into the file.

Consider this: 0000000HELLODATA00000000. The zeros represent empty space, nothing. Now I would like to replace HELLO with SOMETHING. Since SOMETHING is longer than HELLO, simply writing at the starting point of HELLO would extend beyond HELLO and start overwriting DATA. Therefore I would like DATA to be pushed further into the file, to make room for SOMETHING without overwriting DATA.

Brian Tompsett - 汤莱恩
Daniel
  • fopen(), fseek() and fwrite() comes to mind ... if you know where in the file you are reading the data you can use it as well instead of loading the complete file into memory – Gerald Schneider May 29 '13 at 12:09
  • You can use database in place of text file. Other way is to use mixture of database and text file. Divide large file into parts and get parts information from database. Exactly implements depend on requirements. – web2students.com May 29 '13 at 12:10
  • Use ftruncate() to resize the file and then move the data to make space http://stackoverflow.com/a/10467726/778719 – Danack May 29 '13 at 12:10
  • You can read content of a file to a string and then use strlen() to find length of the string. Now you can easily find the middle of string form a new string by adding the text after the middle. – cartina May 29 '13 at 12:11
  • I am already using fseek and fwrite to replace the data; the only problems are when I am trying to replace the data with data of a different size than what was already there... reading the whole file into memory each time is not an option... – Daniel May 29 '13 at 12:17
  • I don't fancy manipulating 1GB strings in PHP, don't load files that size into memory – Mark Baker May 29 '13 at 12:17
  • The short answer: use a database. It was designed for field-based-storage. Which is exactly what it sounds like you're trying to re-invent. – ircmaxell May 29 '13 at 12:53

3 Answers


To overwrite data:

$fp = fopen("file.txt", "r+b"); // read/write, binary-safe, does not truncate
fseek($fp, 100000000); // move to the position
fwrite($fp, $string, 100); // overwrite at most 100 bytes at this position
fclose($fp);

To inject data:

This is trickier, because everything after the insertion point has to be rewritten. It can be optimized by rewriting only from the point of injection onward rather than the whole file.

$string = "###INJECT THIS DATA ##### \n";
injectData("file.txt", $string, 100000000);

Function used:

function injectData($file, $data, $position) {
    $fpFile = fopen($file, "r+b");        // read/write without truncating
    $fpTemp = fopen('php://temp', "w+b"); // temporary buffer

    $len = stream_copy_to_stream($fpFile, $fpTemp); // copy the whole file into the buffer

    fseek($fpFile, $position); // move to the position
    fseek($fpTemp, $position); // move to the same position in the copy

    fwrite($fpFile, $data); // add the data

    stream_copy_to_stream($fpTemp, $fpFile); // re-append the original tail after the data (per @Jack)

    fclose($fpFile); // close file
    fclose($fpTemp); // close tmp
}
Baba
  • This is workable but wouldn't it overwrite characters from the point specified instead of insert? I think the second paragraph of the question refers to writing new data without losing older values. Not sure though... – itsols May 29 '13 at 12:17
  • Updated answer to reflect data injection – Baba May 29 '13 at 12:49
  • why we are using two text files here? – shashi verma Jul 13 '19 at 08:57
  • @shashiverma where did you see text files ? – Baba Jul 14 '19 at 11:06
  • Sorry, I was talking about "php://temp". In place of "php://temp" I linked a text file (abc.txt), and whenever I call the function it copies all the data from $file to abc.txt and $file gets updated. If I remove the operations for the second file ("php://temp" or "abc.txt"), the $string does not get inserted into the file; it overwrites the contents. – shashi verma Jul 15 '19 at 04:11

A variant on Baba's answer, not sure if it would be more efficient when working with larger files:

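function injectData($file, $data, $position) {
    $fpFile = fopen($file, "r+b");        // read/write without truncating
    $fpTemp = fopen('php://temp', "w+b"); // temporary buffer
    stream_copy_to_stream($fpFile, $fpTemp, $position);     // head: bytes before $position
    fwrite($fpTemp, $data);                                 // the new data
    stream_copy_to_stream($fpFile, $fpTemp, -1, $position); // tail: bytes from $position on

    // Write the assembled content back over the original file.
    rewind($fpFile);
    rewind($fpTemp);
    stream_copy_to_stream($fpTemp, $fpFile);

    fclose($fpFile);
    fclose($fpTemp);
}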

injectData('testFile.txt', 'JKL', 3);

A variant of my earlier method that eliminates one of the stream_copy_to_stream() calls, so it should be a shade faster:

function injectData3($file, $data, $position) {
    $fpFile = fopen($file, "r+b");        // read/write without truncating
    $fpTemp = fopen('php://temp', "w+b"); // temporary buffer
    stream_copy_to_stream($fpFile, $fpTemp, -1, $position); // save the tail only
    fseek($fpFile, $position);
    fwrite($fpFile, $data);                                 // write the new data in place
    rewind($fpTemp);
    stream_copy_to_stream($fpTemp, $fpFile);                // re-append the tail

    fclose($fpFile);
    fclose($fpTemp);
}
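
Since the comments below wonder how well this handles big files, here is a minimal sketch of one alternative that does not buffer the tail at all: it shifts the tail toward the end of the file in fixed-size chunks, starting with the last chunk, and then writes the new data into the gap. The function name injectDataChunked and the $chunkSize parameter are hypothetical, not part of the answer above, and it has not been benchmarked against the php://temp versions.

```php
<?php
/**
 * Insert $data at byte offset $position by shifting the tail in place.
 * Hypothetical helper: the name and the 1 MiB default chunk size are assumptions.
 */
function injectDataChunked(string $file, string $data, int $position, int $chunkSize = 1048576): void
{
    $shift = strlen($data);
    if ($shift === 0) {
        return; // nothing to insert
    }

    $fp = fopen($file, 'r+b');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $file");
    }

    $size = fstat($fp)['size'];
    if ($position < 0 || $position > $size) {
        fclose($fp);
        throw new OutOfRangeException("Position $position is outside the file");
    }

    // Move the tail toward the end of the file, last chunk first, so that
    // no byte is overwritten before it has been copied.
    $end = $size;
    while ($end > $position) {
        $start = max($position, $end - $chunkSize);
        $chunk = stream_get_contents($fp, $end - $start, $start); // read [$start, $end)
        fseek($fp, $start + $shift);
        fwrite($fp, $chunk);
        $end = $start;
    }

    // The gap at $position is now free for the new data.
    fseek($fp, $position);
    fwrite($fp, $data);
    fclose($fp);
}
```

It is called the same way as the functions above, for example injectDataChunked('testFile.txt', 'JKL', 3). Memory use is bounded by $chunkSize, but the work is still proportional to the number of bytes after $position, so inserting near the start of a 1 GB file still rewrites almost the whole file.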
Mark Baker
  • streamcopy #1 Copy source file up to position; streamcopy #2 Copy source file from position; streamcopy #3 Copy temp back to source... avoids using filesystem for smaller files – Mark Baker May 29 '13 at 14:23
  • Still don't know how well it'll handle big files.... might actually run some tests tonight to see if it's practical for "real-time" work building ISAM files – Mark Baker May 29 '13 at 14:29
  • Running test at the moment .. would update you with my investigation too ... But here is what i have on small files so far http://codepad.viper-7.com/0zaujx – Baba May 29 '13 at 14:40
  • Nice Updated .. Significant Improvement after `1,000,000` iterations http://codepad.viper-7.com/Yl1q9y – Baba May 29 '13 at 15:23
  • :) I was hoping it would be a good improvement.... the things one does while ones dev server is otherwise occupied with large directory diffs hogging the system – Mark Baker May 29 '13 at 15:28
  • This is a nice experiment and very educating .. got to know `php://temp` is faster then `tmpfile` today .... well done – Baba May 29 '13 at 15:29
  • I wish I could give you more upvotes... this is probably the most performant solution :) – chris97ong Sep 09 '16 at 08:23
  • @chris97ong - glad you found this answer.... I was looking for it after your question yesterday, but wasn't able to locate it – Mark Baker Sep 09 '16 at 08:29
  • Brilliant answer! I just wanted to make a very modest increment to injectData3. If one expects to work from the end to the beginning (i.e., from the -50th position), add this line after the $fpTemp one: `if ($position < 0) { fseek($fpFile,$position, SEEK_END); $position = ftell($fpFile); fseek($fpFile, 0); }` – flen May 05 '17 at 06:51
  • why we are using two text files here? – shashi verma Jul 13 '19 at 08:58

Another variant of the injectData() function:

function injectData($file, $data, $position) 
{
    $temp = fopen('php://temp', 'w+b'); // temporary buffer
    $fd = fopen($file, 'r+b');          // read/write without truncating

    fseek($fd, $position);
    stream_copy_to_stream($fd, $temp); // copy end

    fseek($fd, $position); // seek back
    fwrite($fd, $data); // write data

    rewind($temp);
    stream_copy_to_stream($temp, $fd); // stitch end on again

    fclose($temp);
    fclose($fd);
}

It copies the end of the file (from $position onwards) into a temporary stream, seeks back to write the data, and then stitches the saved end back on after it.
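
The same idea can be extended to the case in the question, where an existing block is replaced by data of a different length. The following is only a sketch under assumptions: replaceData and its $oldLength parameter are hypothetical names, not part of any answer here. It saves the bytes that follow the old block, writes the new block, re-appends the saved bytes, and calls ftruncate() so the file shrinks when the new block is shorter.

```php
<?php
/**
 * Replace $oldLength bytes at $position with $data, growing or shrinking the file.
 * Hypothetical helper (name and parameters are assumptions).
 */
function replaceData(string $file, int $position, int $oldLength, string $data): void
{
    $fd = fopen($file, 'r+b');
    if ($fd === false) {
        throw new RuntimeException("Cannot open $file");
    }
    $temp = fopen('php://temp', 'w+b');

    // Save everything that follows the block being replaced.
    fseek($fd, $position + $oldLength);
    $tailLength = stream_copy_to_stream($fd, $temp);

    // Write the new block, then put the saved tail directly after it.
    fseek($fd, $position);
    fwrite($fd, $data);
    rewind($temp);
    stream_copy_to_stream($temp, $fd);

    // If the new block is shorter, cut off the stale bytes left at the end.
    ftruncate($fd, $position + strlen($data) + $tailLength);

    fclose($temp);
    fclose($fd);
}

// The example from the question: HELLO becomes SOMETHING and DATA moves along.
$original = '0000000HELLODATA00000000';
$file = tempnam(sys_get_temp_dir(), 'rep');
file_put_contents($file, $original);
replaceData($file, strpos($original, 'HELLO'), strlen('HELLO'), 'SOMETHING');
echo file_get_contents($file), "\n";
```

Passing an empty string as $data removes the block entirely, which covers the "remove" case from the question. Any index of byte offsets kept alongside the file would need every offset after $position adjusted by strlen($data) - $oldLength.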

Ja͢ck