I feel kind of stupid right now. I have been reading a lot of documentation and numerous Stack Overflow questions, but I can't get it right.

I have a file on Google Cloud Storage. It is in a bucket 'test_bucket'. Inside this bucket there is a folder, 'temp_files_folder', which contains two files: a .txt file named 'test.txt' and a .csv file named 'test.csv'. There are two files only because I tried both formats; the result is the same either way.

The content of the files is:

hej
san

and I am hoping to read it into Python the same way I would on a local machine with

textfile = open("/file_path/test.txt", 'r')
times = textfile.read().splitlines()
textfile.close()
print(times)

which gives

['hej', 'san']

I have tried using

from google.cloud import storage

client = storage.Client()

bucket = client.get_bucket('test_bucket')

blob = bucket.get_blob('temp_files_folder/test.txt')

print(blob.download_as_string)

but it gives the output

<bound method Blob.download_as_string of <Blob: test_bucket, temp_files_folder/test.txt>>

How can I get the actual string(s) in the file?

Sander van den Oord
digestivee

5 Answers

download_as_string is a method, so you need to call it:

print(blob.download_as_string())

More likely, you want to assign it to a variable so that you download it once and can then print it and do whatever else you want with it:

downloaded_blob = blob.download_as_string()
print(downloaded_blob)
do_something_else(downloaded_blob)
Daniel Roseman
  • I'm getting `DistributionNotFound: The 'google-cloud-storage' distribution was not found and is required by the application` error – Jay Patel Feb 12 '19 at 07:25
  • how do i parse the tsv in gcp and turn it into json? (none of this is helping me). – joe hoeller Sep 11 '19 at 20:38
  • As of today `download_as_string` is deprecated in favor of `download_as_text`: https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob.download_as_text – vpipkt Jun 29 '21 at 19:09
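To end up with exactly the list the question's local code produces (['hej', 'san']), the downloaded bytes still have to be decoded and split into lines. A minimal sketch, assuming the file is UTF-8 encoded; the helper name lines_from_blob_bytes is hypothetical, and a bytes literal stands in for the real download so the example runs without a bucket:

```python
from typing import List


def lines_from_blob_bytes(data: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode downloaded blob bytes and split them into lines,
    mirroring open(...).read().splitlines() on a local file."""
    return data.decode(encoding).splitlines()


# With a real blob this would be:
#   times = lines_from_blob_bytes(blob.download_as_string())
# Here a bytes literal stands in for the downloaded content.
times = lines_from_blob_bytes(b"hej\nsan\n")
print(times)  # ['hej', 'san']
```

splitlines() also strips Windows-style \r\n endings, so the result does not depend on how the file was created.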

The method 'download_as_string()' returns the content as bytes, so it has to be decoded before you can treat it as text.

Below is an example that processes a .csv file.

import csv
from io import StringIO

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket(YOUR_BUCKET_NAME)  # placeholder: your bucket name

blob = bucket.blob(YOUR_FILE_NAME)  # placeholder: path of the .csv object
content = blob.download_as_string()  # raw bytes
content = content.decode('utf-8')  # transform bytes to str here

csv_file = StringIO(content)  # wrap the str in a file-like object

names = csv.reader(csv_file)  # then use the csv library to read the content
for name in names:
    print(f"First Name: {name[0]}")
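The decode-and-wrap steps can be pulled into a small helper that is easy to test without a bucket. A sketch, assuming UTF-8 content; the name rows_from_csv_bytes is hypothetical, and passing delimiter="\t" is one way to handle tab-separated files with the same code:

```python
import csv
from io import StringIO
from typing import List


def rows_from_csv_bytes(data: bytes, delimiter: str = ",", encoding: str = "utf-8") -> List[List[str]]:
    """Decode downloaded bytes and parse them into rows of fields."""
    text = data.decode(encoding)
    # csv.reader needs an iterable of lines; StringIO keeps quoted fields intact
    return list(csv.reader(StringIO(text), delimiter=delimiter))


# With a real blob: rows = rows_from_csv_bytes(blob.download_as_string())
rows = rows_from_csv_bytes(b"hej,1\nsan,2\n")
for row in rows:
    print(f"First Name: {row[0]}")
```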

According to the documentation (https://googleapis.dev/python/storage/latest/blobs.html), as of the time of writing (2021/08), the download_as_string method is a deprecated alias for the download_as_bytes method, which, as the name suggests, returns a bytes object.

You can instead use the download_as_text method to return a str object.

For instance, to download the file MYFILE from bucket MYBUCKET and decode it as a UTF-8 string:

from google.cloud.storage import Client
client = Client()
bucket = client.get_bucket(MYBUCKET)
blob = bucket.get_blob(MYFILE)
downloaded_file = blob.download_as_text(encoding="utf-8")

You can also use this to read other file formats. For JSON, replace the last line with:

import json
downloaded_json_file = json.loads(blob.download_as_text(encoding="utf-8"))

For YAML files (this requires the PyYAML package), replace the last line with:

import yaml
downloaded_yaml_file = yaml.safe_load(blob.download_as_text(encoding="utf-8"))
dheinz

DON'T USE: blob.download_as_string()

USE: blob.download_as_text()


blob.download_as_text() does indeed return a string.

blob.download_as_string() is deprecated and returns a bytes object instead of a string object.

Sander van den Oord
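If the same code has to run against an older google-cloud-storage release that predates download_as_text, one option is to prefer download_as_text when it exists and otherwise decode the bytes yourself. A sketch; read_blob_text and the two stand-in blob classes are hypothetical and exist only so the example runs without a bucket:

```python
class NewStyleBlob:
    """Hypothetical stand-in for a blob from a library version with download_as_text."""

    def download_as_text(self, encoding="utf-8"):
        return "hej\nsan\n"


class OldStyleBlob:
    """Hypothetical stand-in for a blob that only has download_as_string (returns bytes)."""

    def download_as_string(self):
        return b"hej\nsan\n"


def read_blob_text(blob, encoding: str = "utf-8") -> str:
    """Return the blob's content as str, preferring download_as_text when available."""
    download_as_text = getattr(blob, "download_as_text", None)
    if download_as_text is not None:
        return download_as_text(encoding=encoding)
    # Fallback: the bytes-returning method, decoded explicitly
    return blob.download_as_string().decode(encoding)


print(read_blob_text(NewStyleBlob()).splitlines())  # ['hej', 'san']
print(read_blob_text(OldStyleBlob()).splitlines())  # ['hej', 'san']
```

Checking for the method rather than comparing version numbers keeps the helper independent of which release is installed.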

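This works when reading a plain-text file. (A .docx file is a binary ZIP archive, so decoding it as UTF-8 will generally fail rather than give you its text.)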

from google.cloud import storage

# create storage client from a service-account key file
storage_client = storage.Client.from_service_account_json('**PATH OF JSON FILE**')
bucket = storage_client.get_bucket('**BUCKET NAME**')
# get a reference to the file (blob) in the bucket
blob = bucket.blob('**SPECIFYING THE FILENAME**')
downloaded_blob = blob.download_as_string()  # returns bytes
downloaded_blob = downloaded_blob.decode("utf-8")  # bytes -> str
print(downloaded_blob)
Harshal SG
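For an actual .docx object, the downloaded bytes are a ZIP archive whose main body text lives in word/document.xml, with paragraphs in w:p elements and their text in w:t runs. A rough standard-library sketch, assuming a standard .docx layout; docx_paragraphs_from_bytes is a hypothetical helper, and it ignores headers, footers, footnotes and formatting (a dedicated library such as python-docx is the more complete option):

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace used by the w: elements inside document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs_from_bytes(data: bytes) -> List[str]:
    """Return the plain text of each paragraph in a .docx file's main body."""
    # A .docx file is a ZIP archive; the body text lives in word/document.xml
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text may be split across several <w:t> runs
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


# With a real blob: paragraphs = docx_paragraphs_from_bytes(blob.download_as_bytes())
```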