
Let me preface this with the fact that I am fairly new to Docker, Jenkins, GCP/Cloud Storage and Python.

Basically, I would like to write a Python app that runs locally in a Docker container (alpine3.7 image) and reads, in chunks and line by line, a very large text file that is dropped into a GCP Cloud Storage bucket. Each line should just be printed to the console for now.

I learn best by looking at working code, and I am spinning my wheels trying to put all the pieces together with these technologies, which are new to me.

I already have the key file for that cloud storage bucket on my local machine.

I am also aware of several related posts; I just need some help putting all these pieces together into a working app.

I understand that I need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file in the container. However, I don't know how to do that in a way that works well for multiple developers and multiple environments (Local, Dev, Stage and Prod).

CodeDreamer68
  • Instead of asking "show me how to do this", learn each part that you don't know. Start with some examples and implement them. There are numerous examples on Google's site, the Internet, Stack Overflow, etc. Then study what they do. Stack Overflow will help you solve coding problems. When you have code with a problem, then we can help you. Remember, the best place to start is to read all of the documentation and not stop when one item confuses you. – John Hanley Dec 12 '19 at 21:42
  • Step one is probably to write the program, completely ignoring Docker and Jenkins. If you can make it work in a virtual environment then you can package it in Docker; there's nothing particularly special about the Docker environment here. – David Maze Dec 13 '19 at 00:26
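
Following that advice, a minimal sketch of a local sanity check before any Docker or Jenkins work, assuming the key DevOps provided is saved at ~/key.json and the script is named app.py (hypothetical names):

    # create an isolated environment and install the GCS client library
    python -m venv venv
    source venv/bin/activate
    pip install google-cloud-storage

    # point the client library at the key file, then run the app
    export GOOGLE_APPLICATION_CREDENTIALS=$HOME/key.json
    python app.py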

1 Answer


This is just a simple quickstart (I am sure it can be done better) for reading a file from a Google Cloud Storage bucket with a Python app, packaged as a Docker container deployed to Google Cloud Run (a line-by-line variant for very large files follows the steps below):

You can find more information in Google's Cloud Run quickstart documentation.

  1. Create a directory with the following files:

    a. app.py

    import os
    from flask import Flask
    from google.cloud import storage

    app = Flask(__name__)

    @app.route('/')
    def hello_world():
        # The client picks up credentials from GOOGLE_APPLICATION_CREDENTIALS,
        # or from the attached service account when running on Cloud Run.
        storage_client = storage.Client()
        file_data = 'file_data'            # name of the object in the bucket
        bucket_name = 'bucket'             # name of the bucket
        temp_file_name = 'temp_file_name'  # local path the object is downloaded to

        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.get_blob(file_data)
        blob.download_to_filename(temp_file_name)

        # Read the local copy back and strip the newlines.
        with open(temp_file_name, "r") as myfile:
            temp_str = myfile.read().replace('\n', '')

        return temp_str

    if __name__ == "__main__":
        app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
    

    b. Dockerfile

    # Use an official Python runtime as a parent image
    FROM python:3.7-slim

    # Set the working directory to /app
    WORKDIR /app

    # Copy the current directory contents into the container at /app
    COPY . /app

    # Install the packages pinned in requirements.txt
    # (google-cloud-storage is already listed there)
    RUN pip install --trusted-host pypi.python.org -r requirements.txt

    # The app listens on the port given in $PORT, 8080 by default
    EXPOSE 8080

    # Run app.py when the container launches
    CMD ["python", "app.py"]
    

    c. requirements.txt

    Flask==1.1.1
    gunicorn==19.9.0
    google-cloud-storage==1.19.1
    
  2. Create a service account to access the storage from Cloud Run:

    gcloud iam service-accounts create cloudrun --description 'cloudrun'
    
  3. Set the permission of the service account:

    gcloud projects add-iam-policy-binding project --member serviceAccount:cloudrun@project.iam.gserviceaccount.com --role roles/storage.admin
    
  4. Build the container image:

    gcloud builds submit --tag gcr.io/project/hello
    
  5. Deploy the application to Cloud Run:

    gcloud run deploy --image gcr.io/project/hello --platform managed --service-account cloudrun@project.iam.gserviceaccount.com
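
The quickstart above pulls the whole object into memory in one call, while the question asks to stream a very large file line by line. A minimal sketch of that variant, assuming the object still fits on the container's local disk (bucket and object names are placeholders):

    import sys
    from google.cloud import storage

    def print_lines(bucket_name, blob_name, temp_path='/tmp/data.txt'):
        # Download the object once to local disk, then iterate the file
        # object, which yields one line at a time instead of loading the
        # whole file into memory.
        client = storage.Client()
        blob = client.get_bucket(bucket_name).get_blob(blob_name)
        blob.download_to_filename(temp_path)
        with open(temp_path, 'r') as f:
            for line in f:
                print(line.rstrip('\n'))

    if __name__ == '__main__':
        print_lines(sys.argv[1], sys.argv[2])

A Python file object is a lazy iterator, so the loop never holds more than one line in memory; the chunked-transfer sample linked in the comments below shows how to transfer in chunks as well.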
    

EDIT :

One way to develop locally (an alternative that keeps the key out of the image follows this list):

  1. Your DevOps team will create the service account key, key.json:

    gcloud iam service-accounts keys create ~/key.json --iam-account cloudrun@project.iam.gserviceaccount.com
    
  2. Store the key.json file in the same working directory

  3. The Dockerfile command `COPY . /app` will copy the file into the Docker container

  4. Change the client creation in app.py to:

     storage_client = storage.Client.from_service_account_json('key.json')
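
An alternative that keeps key.json out of the image entirely (so it can never leak via the registry or the repository) is to mount the key into the container at run time and point GOOGLE_APPLICATION_CREDENTIALS at it. A minimal sketch, assuming the key was saved to ~/key.json and the image is tagged gcs-reader (both names are hypothetical):

    docker build -t gcs-reader .
    docker run --rm -p 8080:8080 \
        -v $HOME/key.json:/tmp/key.json:ro \
        -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/key.json \
        gcs-reader

With this approach the plain storage.Client() call from step 1a picks up the credentials automatically, and each developer or environment (Local, Dev, Stage, Prod) can point the variable at its own key with no code or image changes.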
    
marian.vladoi
  • Thank you very much Marian. Unfortunately, I don't have access to create service accounts, build or deploy apps to GCP, our DevOps team does that for us. What I need to do is develop and test the code locally first, then ask them to deploy the package for me. How can I run this container locally and provide it the environment variable GOOGLE_APPLICATION_CREDENTIALS (containing the key file DevOps provided me) that will allow my remote machine to access cloud storage? I also know that DevOps has set up Vault, so I'm also trying to learn how that works. – CodeDreamer68 Dec 14 '19 at 16:49
  • ENV GOOGLE_APPLICATION_CREDENTIALS /path/to/your/credentials/file/in/container – Piyush Singh Dec 14 '19 at 18:53
  • Thanks very much, I think I'm unblocked now. I am still not sure of what is best practice though. What I ended up doing is moving the key file into a folder of the project named .key. Then I added that folder to the .gitignore and .dockerignore files so the key isn't copied or stored anywhere (for security). Then I updated the Dockerfile with these lines: COPY .key/key.json /root/key.json ENV GOOGLE_APPLICATION_CREDENTIALS /root/key.json – CodeDreamer68 Dec 14 '19 at 20:35
  • You can check here how to read big files in chunks in Python: [link](https://github.com/GoogleCloudPlatform/storage-file-transfer-json-python/blob/master/chunked_transfer.py) – marian.vladoi Dec 14 '19 at 20:38
  • May I ask a question about the parameter "temp_file_name"? Is it a location on the local PC or in Cloud Storage? After deploying the application with the gcloud command, is the application able to read the files from "temp_file_name"? – Mohammad Oct 15 '20 at 01:36