
The following code downloads data from DoubleClick Search and uploads it to Google Cloud Storage (running on Google App Engine). The file is not uploaded completely: every file ends up as only 32 MB, while the actual file size is 4 GB. Is there any way to upload larger files to Google Cloud Storage from App Engine?

Upload to Google Cloud Storage:

import cloudstorage as _gcs

def _writeFilesinGCS(filename, data):
  ### Initialize the Google Cloud Storage object and write the payload
  print "In _writeFilesinGCS function"
  tmp_filenames_to_clean_up = []
  # Retry transient GCS errors with a gentle backoff.
  write_retry_params = _gcs.RetryParams(backoff_factor=1.1)
  gcs_file = _gcs.open(filename, 'w', content_type='text/plain',
                       retry_params=write_retry_params)
  # Write the whole payload in one call, then close to finalize the object.
  gcs_file.write(data)
  gcs_file.close()
  tmp_filenames_to_clean_up.append(filename)
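
Note that `gcs_file.write(data)` needs the entire payload in memory as one string. The same file handle accepts repeated writes before it is closed, so if the data can be obtained in pieces it can be streamed into a single object instead. A minimal sketch, assuming `chunks` is any iterable of byte strings (the helper name `_writeChunksInGCS` is hypothetical):

import cloudstorage as _gcs

def _writeChunksInGCS(filename, chunks):
  # Hypothetical variant of _writeFilesinGCS: stream an iterable of byte
  # strings into one GCS object instead of a single large in-memory blob.
  write_retry_params = _gcs.RetryParams(backoff_factor=1.1)
  gcs_file = _gcs.open(filename, 'w', content_type='text/plain',
                       retry_params=write_retry_params)
  for chunk in chunks:
    gcs_file.write(chunk)
  gcs_file.close()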

Download the file:

import dsbqfuns

def download_files(service, report_run_id, report_fragment, loaddate, file_name, cfg):
  """Download one report fragment and write it to Google Cloud Storage.

  Args:
    service: An authorized DoubleClick Search service.
    report_run_id: The ID DS has assigned to a report.
    report_fragment: The 0-based index of the file fragment from the files array.
    loaddate: Load date used in the output file name.
    file_name: Base name for the output file.
    cfg: Configuration object holding the GCS bucket name.
  """
  bucket_name = cfg._gcsbucket
  bucket = '/' + bucket_name
  filename = bucket + '/' + file_name + "_MMA_" + report_fragment + "_" + loaddate + ".csv"
  print "Enter into download_files", report_run_id
  request = service.reports().getFile(reportId=report_run_id, reportFragment=report_fragment)
  _writeFilesinGCS(filename, request.execute())
  dsbqfuns._dsbqinsert(report_run_id, cfg, file_name, 1)
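
The 32 MB cut-off matches App Engine's URL Fetch response limit that comes up in the comments below. One possible workaround, if the DoubleClick Search `reports().getFile()` request supports chunked media download, is to pull the fragment in pieces smaller than that limit with `MediaIoBaseDownload` from google-api-python-client and stream each piece straight into the GCS file, so the full 2-4 GB never has to sit in memory. A rough, untested sketch (the helper name `download_fragment_in_chunks` and the 16 MB chunk size are my own choices):

import cloudstorage as _gcs
from googleapiclient.http import MediaIoBaseDownload

def download_fragment_in_chunks(service, report_run_id, report_fragment, filename):
  # Sketch: fetch a report fragment in ~16 MB pieces (below the 32 MB URL
  # Fetch response cap) and stream each piece into a single GCS object.
  request = service.reports().getFile(reportId=report_run_id,
                                      reportFragment=report_fragment)
  gcs_file = _gcs.open(filename, 'w', content_type='text/plain',
                       retry_params=_gcs.RetryParams(backoff_factor=1.1))
  downloader = MediaIoBaseDownload(gcs_file, request, chunksize=16 * 1024 * 1024)
  done = False
  while not done:
    status, done = downloader.next_chunk()
    print "Downloaded %d%%" % int(status.progress() * 100)
  gcs_file.close()

Each `next_chunk()` call is its own HTTP request, so every response stays under the 32 MB cap, and the cloudstorage file object uploads the data incrementally as it is written.
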
  • What does bigger files mean? How big is the file fragment represented by report_fragment? (side note: your GCS write needs to take that into account as well; currently fragments of the same file overwrite each other) – Dan Cornilescu Jul 18 '17 at 16:46
  • DoubleClick Search splits the file into multiple files, and for each file it gives a number called report_fragment. Each fragment (file size) is between 2 GB and 4 GB. – user374374 Jul 18 '17 at 17:02
  • Can you determine if the entire fragment (i.e. the result of `request.execute()`) makes it to your app? Asking as normally responses to outbound requests based on URL Fetch are limited to 32M, see [Quotas and limits](https://cloud.google.com/appengine/docs/standard/python/outbound-requests#quotas_and_limits) – Dan Cornilescu Jul 18 '17 at 17:15
  • How do I handle this? Most of my requests download from Facebook, DoubleClick Search, DoubleClick Campaign Manager, AdWords, etc. Most of them are HTTP requests and the file size is definitely greater than 32 MB. I need to run my entire script in App Engine. Is there any way I can download the entire data and upload it into Google Cloud Storage? – user374374 Jul 18 '17 at 17:37
  • I don't think so - downloading the entire fragment data would then require 2-4GB of memory, which GAE instances don't have. You need to be able to fragment it in smaller pieces. – Dan Cornilescu Jul 18 '17 at 17:53
  • Related: https://stackoverflow.com/questions/45061496/automatically-retrieving-large-files-via-public-http-into-google-cloud-storage – Dan Cornilescu Jul 18 '17 at 17:55
  • Dan, I checked on splitting the data. The smallest I can split it into is 400 MB for DoubleClick Search. As I understand from the docs and Stack Overflow, we can't download this data from App Engine; it needs to run from either Compute Engine or somewhere else. Am I right? – user374374 Jul 18 '17 at 18:24
  • Dan, does the flexible environment also have the same kind of limit? – user374374 Jul 25 '17 at 15:57
  • Apparently not: `The response size is unlimited` from https://cloud.google.com/appengine/docs/flexible/python/how-requests-are-handled#response_limits. I'm not that familiar with it, though, as I'm not using the flex env. – Dan Cornilescu Jul 25 '17 at 16:32

0 Answers