
I am using the following code to convert a binary file into an array.

$handle = fopen($file, "rb"); // "b" keeps the read binary-safe on Windows
$contents = fread($handle, filesize($file));
fclose($handle);

$array = unpack("s*", $contents);

I want to be able to read it in chunks and send multiple separate requests to process it in parallel. For example, I want to grab the first 16000 bytes, then the next 16000, and so on, so that I end up with multiple sets of data to process in parallel:

$content1 = first 16000 bytes
$content2 = bytes from 16000 to 32000
$content3 = bytes from 32000 to 48000

I think this is pretty simple; I am just not sure how it can be done.

Proper
  • I'm not sure if you're asking about multi-threading (something that's hard to do in PHP and doesn't really look useful for such a simple task on a single file) or you just didn't realise what `filesize($file)` implies. – Álvaro González Aug 05 '18 at 07:33

3 Answers


A simple way would be to use substr() to split out chunks until it runs out of something to process...

$start = 0;
$size = 16000;
$contents = file_get_contents($file);
// Compare against '' explicitly: a chunk that happens to contain just "0" is
// falsy in PHP, so a bare while ($chunk = ...) could stop early on binary data.
// (Past the end of the string, substr() returns false in PHP 7 and '' in PHP 8.)
while (($chunk = substr($contents, $start, $size)) !== false && $chunk !== '') {
    // Process
    echo ">" . $chunk . "<" . PHP_EOL;

    $start += $size;
}

Another way is to split the string into an array of chunks with str_split()

$contents = file_get_contents($file);
$chunks = str_split($contents, 16000);

file_get_contents() does the open/read/close in one go, and str_split() then splits the result into an array of chunks of the size you want (16000 in this case).

Not sure how much performance gain you will get by this, but that is something you will have to test for yourself.

(Also check the notes on the manual page in case you are using multi-byte encoded files).
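One caveat worth spelling out: the question unpacks with "s*", which consumes 2 bytes per value, so any chunk size must be a multiple of 2 or a value will be split across a chunk boundary. A quick round-trip sketch with made-up sample data (the values here are purely illustrative):

```php
<?php
// Hypothetical sample data: pack 24000 signed 16-bit ints into a binary string.
$values = range(-12000, 11999);
$binary = pack('s*', ...$values);          // 48000 bytes

// 16000 is a multiple of 2, so no value straddles a chunk boundary.
$chunks = str_split($binary, 16000);       // 3 chunks of 16000 bytes

$decoded = [];
foreach ($chunks as $chunk) {
    // unpack() returns a 1-based array; re-index and append in order.
    $decoded = array_merge($decoded, array_values(unpack('s*', $chunk)));
}
// $decoded now matches $values element for element.
```

Each chunk decodes independently, which is what makes processing them in separate requests possible in the first place.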

Nigel Ren
  • Your second solution worked for me. The performance gain comes from the fact that I can convert from binary to data in parallel using curl. I was able to get 10ms processing time on a file that took over 100 in a single thread – Proper Aug 05 '18 at 14:44
  • Good improvement, I've resorted to C++ for some parallel tasks but it can happen that the overheads of splitting data/processing/combine results sometimes make the improvement very small or even slower! – Nigel Ren Aug 05 '18 at 14:47
  • Please note that `file_get_contents()` is a quick helper function that loads the entire file into memory. It's the tool of choice to handle small files without hassle but you normally avoid it for large files or performance-sensitive tasks. – Álvaro González Aug 05 '18 at 17:34

You could use multi-threading in PHP; see http://php.net/manual/en/intro.pthreads.php

and

Does PHP have threading?

Afshin
  • It is possible to run parallel processes with PHP; you would need to manage all the threads yourself or use something like pthreads. In most cases I use curl's ability to send parallel requests to do this type of work – Proper Aug 05 '18 at 14:15
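As the comment above notes, plain worker processes can stand in for threads without any extension. A minimal sketch using proc_open() (PHP 7.4+ for the array command form) that fans the chunks out to concurrent PHP child processes — the worker command here is just a stand-in that reports the chunk length; a real worker would do the unpack/processing:

```php
<?php
// Hypothetical demo data; in the question this would come from the file.
$contents = random_bytes(48000);
$chunks = str_split($contents, 16000);

$spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
$procs = [];
$pipesById = [];

// Start one worker per chunk; all of them run concurrently.
foreach ($chunks as $i => $chunk) {
    $cmd = [PHP_BINARY, '-r', 'echo strlen(stream_get_contents(STDIN));'];
    $procs[$i] = proc_open($cmd, $spec, $pipes);
    fwrite($pipes[0], $chunk);   // hand the chunk to the worker on stdin
    fclose($pipes[0]);
    $pipesById[$i] = $pipes;
}

// Collect each worker's output in order.
$results = [];
foreach ($procs as $i => $proc) {
    $results[$i] = stream_get_contents($pipesById[$i][1]);
    fclose($pipesById[$i][1]);
    fclose($pipesById[$i][2]);
    proc_close($proc);
}
// $results is ['16000', '16000', '16000']
```

Each 16000-byte chunk fits comfortably inside a typical OS pipe buffer, so the fwrite() calls do not block while the workers start up.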

Given that the OP has accepted Nigel's answer, the question was actually how to read arbitrary chunks from a file. That can be done with a slight variation of the original code. Instead of reading the complete file contents:

fread($handle, filesize($file));
               ^^^^^^^^^^^^^^^

… you pass your chunk size as second argument:

$contents = fread($handle, 16000);

Prior to that, you move to the desired location:

// E.g. Read 4th chunk:
fseek($handle, 3 * 16000);

Putting it all together:

$handle = fopen($file, "r");
fseek($handle, 3 * 16000);
$contents = fread($handle, 16000);

Add some error checking and you're done. These are really old functions very close to the C implementation so they should be pretty fast and require very little memory.
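Generalising the hard-coded numbers above into a loop over the whole file, with the error checking mentioned (the file path and demo data here are made up for the example):

```php
<?php
// Create a hypothetical 40000-byte binary file to read back in chunks.
$file = tempnam(sys_get_temp_dir(), 'chunks');
file_put_contents($file, random_bytes(40000));

$size = 16000;
$handle = fopen($file, 'rb');        // 'b': no newline translation on Windows
if ($handle === false) {
    die("Cannot open $file");
}

$chunks = [];
for ($i = 0; ; $i++) {
    fseek($handle, $i * $size);      // jump to the start of the i-th chunk
    $chunk = fread($handle, $size);
    if ($chunk === false || $chunk === '') {
        break;                       // read error, or '' at end of file
    }
    $chunks[] = $chunk;
}
fclose($handle);
// 40000 bytes give chunks of 16000, 16000 and 8000 bytes.
```

Note the last chunk is simply shorter; fread() returns whatever remains, and the next iteration's empty read ends the loop.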

Álvaro González
  • I think in this case you would be reading the first 16000, then the first 32000, and so on. What I needed was reading the first 16000, then the next 16000, and so on – Proper Aug 06 '18 at 16:30
  • @SergeySlyusar Sorry, I'm afraid I can't understand your concern. Did I make a typo? Does `fseek()` have some issue I'm not aware of? This is just a usage example that reads the 4th chunk for illustration purposes. To generalise it, you need to replace the hard-coded numbers with the appropriate variables. – Álvaro González Aug 07 '18 at 06:22