
I want to understand why some of the invocations work and some do not. I couldn't find any official documentation about a time delay or quota restriction from Google that would explain this.

For example, I created a DataFrame and uploaded it with the following code:

import pandas as pd
from google.cloud import storage

# Write the DataFrame to a temporary CSV file
empty_df = pd.DataFrame(val)
empty_df.to_csv('/tmp/{}.csv'.format(SAMPLE))

# Upload the CSV file to the bucket under FOLDER1/
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob('FOLDER1/{}.csv'.format(SAMPLE))
blob.upload_from_filename('/tmp/{}.csv'.format(SAMPLE))

The SAMPLE variable changes on every iteration of the loop. I ran the upload in a for loop, and the Cloud Function is also triggered multiple times (anywhere from 1 to 50 invocations, sometimes more). Up to this point everything looks fine, but after the function completes I cannot see some of the CSV files in the FOLDER1 folder. I also have the same problem with the copy_blob function shown below.
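To make the structure concrete, here is a minimal sketch of the loop; values_by_sample, bucket_name, and the way SAMPLE is derived are simplified placeholders rather than my exact code:

import pandas as pd
from google.cloud import storage

def upload_samples(values_by_sample, bucket_name):
    """Write one CSV per sample to /tmp and upload it to FOLDER1/ in the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    for sample, val in values_by_sample.items():
        local_path = '/tmp/{}.csv'.format(sample)
        pd.DataFrame(val).to_csv(local_path)
        blob = bucket.blob('FOLDER1/{}.csv'.format(sample))
        blob.upload_from_filename(local_path)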

For example, I want to move the CSV files created by the code above from FOLDER1 to FOLDER2 with a new name. Some of the CSV files do not appear in FOLDER2, and the logs show 404 "file not found" errors, yet when I check the bucket manually, the files are there.

from google.cloud import storage


def copy_blob(
    bucket_name, blob_name, destination_bucket_name, destination_blob_name,
    status_path, delete_blob=False
):
    """Copies a blob from one bucket to another with a new name."""

    storage_client = storage.Client()
    source_bucket = storage_client.bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)

    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )

    # Delete the source blob so the copy behaves like a move
    if delete_blob:
        source_blob.delete()

    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
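
A typical call looks roughly like this (the bucket and object names here are only placeholders, not my real values):

copy_blob(
    bucket_name='source-bucket',
    blob_name='FOLDER1/sample_001.csv',
    destination_bucket_name='source-bucket',
    destination_blob_name='FOLDER2/sample_001_new.csv',
    status_path=None,
    delete_blob=True,  # delete the source blob so the copy acts as a move
)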

I used that code to move the files. Does anyone have an idea?

  • As you can see in the relevant section of the [documentation](https://cloud.google.com/storage/quotas#objects), there is no limit to writes across multiple objects. The 404 error messages may have to do with the `blob.upload_from_filename('/tmp/{}.csv'.format(SAMPLE))` method not correctly uploading the files to Cloud Storage. If you introduce a very short time delay (about 0.01s) between uploading each CSV file, do you get the same issue? – Daniel Ocando Feb 03 '20 at 16:35
  • Also how is the `SAMPLE` variable created? Remember that you [need to avoid using sequential filenames](https://cloud.google.com/storage/docs/best-practices#naming) if you are uploading many files in parallel. – Daniel Ocando Feb 03 '20 at 16:38
  • In my opinion, the delay is not relevant, because every instance of the Google Cloud Function runs in isolation from the others. – awfullyCold Feb 04 '20 at 10:52
  • That's correct. But I was not sure if you used a for loop to call Cloud Functions individually or to call that specific function inside the Cloud Function (I suggested the delay if the latter was the case). Please test by not using sequential filenames. [Here](https://stackoverflow.com/questions/10501247/best-way-to-generate-random-file-names-in-python) is an example of how to achieve this with Python posted within the community; a short sketch is also included after these comments. – Daniel Ocando Feb 04 '20 at 10:59
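
A minimal sketch of the non-sequential naming suggested in the comments above, assuming a uuid-based token in the object name is acceptable (random_object_name is a hypothetical helper, not part of the original code):

import uuid

def random_object_name(sample):
    """Prepend a random token so object names are not sequential."""
    return 'FOLDER1/{}_{}.csv'.format(uuid.uuid4().hex[:8], sample)

# Example output: 'FOLDER1/3f9c1a2b_sample1.csv'
print(random_object_name('sample1'))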

0 Answers