
I'm using Backblaze B2 to store files and am using their documentation code to upload via their API. However, their code uses fread to read the file, which is causing issues for files larger than 100MB as it tries to load the entire file into memory. Is there a better way to do this that doesn't try to load the entire file into RAM?

$file_name = "file.txt";
$my_file = "<path-to-file>" . $file_name;
$handle = fopen($my_file, 'r');
$read_file = fread($handle,filesize($my_file));

$upload_url = ""; // Provided by b2_get_upload_url
$upload_auth_token = ""; // Provided by b2_get_upload_url
$bucket_id = "";  // The ID of the bucket
$content_type = "text/plain";
$sha1_of_file_data = sha1_file($my_file);

$session = curl_init($upload_url);

// Add read file as post field
curl_setopt($session, CURLOPT_POSTFIELDS, $read_file); 

// Add headers
$headers = array();
$headers[] = "Authorization: " . $upload_auth_token;
$headers[] = "X-Bz-File-Name: " . $file_name;
$headers[] = "Content-Type: " . $content_type;
$headers[] = "X-Bz-Content-Sha1: " . $sha1_of_file_data;
curl_setopt($session, CURLOPT_HTTPHEADER, $headers); 

curl_setopt($session, CURLOPT_POST, true); // HTTP POST
curl_setopt($session, CURLOPT_RETURNTRANSFER, true);  // Receive server response
$server_output = curl_exec($session); // Let's do this!
curl_close ($session); // Clean up
echo ($server_output); // Tell me about the rabbits, George!

I have tried using:

curl_setopt($session, CURLOPT_POSTFIELDS, array('file' => '@'.realpath('file.txt')));

However I get an error response: Error reading uploaded data: SocketTimeoutException(Read timed out)

Edit: Streaming the filename within the cURL call also doesn't seem to work.

Rohan

1 Answer


The issue you are having is related to this.

fread($handle,filesize($my_file));

With the filesize in there you might as well just do file_get_contents. It's much better memory-wise to read one line at a time with fgets:

$handle = fopen($my_file, 'r');

while (!feof($handle)) {
    $line = fgets($handle); // only one line is held in memory at a time
}

fclose($handle);

This way you only read one line into memory at a time, but if you need the full file contents you will still hit a bottleneck.
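For a binary file like the ones in question, the same idea works with fread and fixed-size chunks instead of lines. A small sketch (the 8192-byte chunk size is just an illustrative choice, not anything B2 requires):

$handle = fopen($my_file, 'rb');

while (!feof($handle)) {
    $chunk = fread($handle, 8192); // at most 8 KB in memory per iteration
    // ... process $chunk here (write it out, feed it to a hash context, etc.)
}

fclose($handle);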

The only real way is to stream the upload.

I did a quick search and it seems the default for cURL is to stream the file if you give it the filename:

 $post_data['file'] = 'myfile.csv';

 curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);

You can see the previous answer for more details:

Is it possible to use cURL to stream upload a file using POST?

So as long as you can get past the sha1_file, it looks like you can just stream the file, which should avoid the memory issues. There may be issues with the time limit, though. Also, I can't really think of a way around getting the hash if that fails.

Just FYI, personally I never tried this; typically I just use SFTP for large file transfers. So I don't know if it has to be specifically $post_data['file'], I just copied that from the other answer.
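If the plain $post_data['file'] form doesn't behave, another way to stream (equally untested against B2 here, just standard cURL options) is to hand cURL the open file handle via CURLOPT_INFILE and force the method back to POST, so cURL reads the body from disk in chunks rather than from a string in memory. The variables are the same ones as in the question's code:

$handle = fopen($my_file, 'rb');           // binary-safe read handle
$file_size = filesize($my_file);           // cURL needs the exact body length
$sha1_of_file_data = sha1_file($my_file);  // sha1_file() hashes incrementally, so it is not the memory problem

$session = curl_init($upload_url);

// Stream the request body from the file handle instead of a string in RAM
curl_setopt($session, CURLOPT_UPLOAD, true);           // read the body from CURLOPT_INFILE
curl_setopt($session, CURLOPT_INFILE, $handle);
curl_setopt($session, CURLOPT_INFILESIZE, $file_size); // lets cURL send Content-Length
curl_setopt($session, CURLOPT_CUSTOMREQUEST, 'POST');  // CURLOPT_UPLOAD defaults to PUT; B2 expects POST

$headers = array();
$headers[] = "Authorization: " . $upload_auth_token;
$headers[] = "X-Bz-File-Name: " . $file_name;
$headers[] = "Content-Type: " . $content_type;
$headers[] = "X-Bz-Content-Sha1: " . $sha1_of_file_data;
curl_setopt($session, CURLOPT_HTTPHEADER, $headers);

curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec($session);
curl_close($session);
fclose($handle);
echo $server_output;

With this, only cURL's internal read buffer is in memory at any one time, not the whole file.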

Good luck...

UPDATE

Seeing as streaming seems to have failed (see the comments), here are a few more thoughts.

You may want to test the streaming to make sure it works. I don't know what all that would involve, maybe stream a file to your own server? Also, I am not sure why it wouldn't work "as advertised", and you may have tested it already. But it never hurts to test something; never assume something works until you know for sure. It's very easy to try something new as a solution, only to miss a setting or put a path in wrong and then fall back to thinking it's all based on the original issue.

I've spent a lot of time tearing things apart only to realize I had a spelling error. I'm pretty adept at programming these days, so I typically overthink the errors too. My point is, be sure it's not a simple mistake before moving on.

Assuming everything is set up right, I would try file_get_contents. I don't know if it will be any better, but it's more meant for opening whole files. It also seems more readable in the code, because then it's clear that the whole file is needed. It just seems more semantically correct, if nothing else.
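For reference, against the question's code that swap is just one line (note it still pulls the whole file into a string, so the memory ceiling stays the same):

// replaces the fopen()/fread() pair from the question
$read_file = file_get_contents($my_file);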

You can also increase the RAM PHP has access to by using:

ini_set('memory_limit', '512M');

You can even go higher than that, depending on your server. The highest I went before was 3G, but the server I use has 54GB of RAM and that was a one-time thing (we migrated 130 million rows from MySQL to MongoDB; the InnoDB index was eating up 30+GB). Typically I run with 512M and have some scripts that routinely need 1G. But I wouldn't just up the memory willy-nilly. That is usually a last resort for me after optimizing and testing. We do a lot of heavy processing, which is why we have such a big server; we also have 2 slave servers (among other things) that run with 16GB each.

As far as what size to put, typically I increment it by 128M till it works, then add an extra 128M just to be sure, but you might want to go in smaller steps. Typically people always use multiples of 8, but I don't know if that makes much difference these days.

Again, Good Luck.

ArtisticPhoenix
  • The hash doesn't seem to be an issue from what I can tell, however B2 doesn't accept the file when I try streaming using your last method. :( – Rohan Apr 05 '18 at 10:37
  • The only other thing you could try is `file_get_contents`, but you probably won't have a choice but to increase the RAM PHP has – ArtisticPhoenix Apr 05 '18 at 16:28
  • Just an update to this: file_get_contents does work, however it also runs into memory issues (the files are 1-5GB in size). – Rohan Apr 05 '18 at 23:53