I have a zipfile on my Google Drive. That zipfile contains an XML file, which I want to parse, extract a specific piece of information from, and save that information on my local computer (or wherever).

My goal is to use Python and the Google Drive API (with the help of PyDrive) to achieve this. The workflow could be as follows:

  1. Connect to my Google Drive via Google Drive API (PyDrive)
  2. Get my zipfile id
  3. Load my zipfile to memory
  4. Unzip, obtain the XML file
  5. Parse the XML, extract the desired information
  6. Save it as a csv on my local computer
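For reference, steps 3 to 6 can be sketched with the standard library alone once the zip content is available as bytes. This is only a minimal sketch: the helper names `zip_xml_to_rows` and `write_csv`, the member name `data.xml` and the `name` tag are made up for illustration, and the zip is built in memory as a stand-in for the content that would come from Drive.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile


def zip_xml_to_rows(zip_bytes, xml_name, tag):
    """Return one single-column row per <tag> element found in xml_name."""
    # Steps 3 and 4: wrap the raw bytes in a file-like object and open it as a zip.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(xml_name) as xml_file:
            # Step 5: parse the XML directly from the archive member.
            root = ET.parse(xml_file).getroot()
    return [[element.text] for element in root.iter(tag)]


def write_csv(rows, handle):
    # Step 6: write the rows to any text file-like object.
    csv.writer(handle).writerows(rows)


# Hypothetical stand-in for the zip downloaded from Drive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.xml", "<items><name>a</name><name>b</name></items>")

rows = zip_xml_to_rows(buffer.getvalue(), "data.xml", "name")

# An in-memory text buffer is used here; for a real file, pass
# open("output.csv", "w", newline="") instead.
csv_text = io.StringIO()
write_csv(rows, csv_text)
print(rows)
```

Nothing in this sketch touches the disk, so the only open piece is how to get the zip bytes out of PyDrive in the first place.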

Right now, I am able to do steps 1, 2, 4, 5 and 6. But I don't know how to load the zipfile into memory without writing it to my local HDD first.

The following PyDrive code obtains the zipfile and places it on my local HDD, which is not exactly what I want.

toUnzip = drive.CreateFile({'id':'MY_FILE_ID'})
toUnzip.GetContentFile('zipstuff.zip')

I guess one solution could be as follows:

I could read the zipfile as a string with some encoding:

toUnzip = drive.CreateFile({'id':'MY_FILE_ID'})
zipAsString = toUnzip.GetContentString(encoding='??')

and then I could somehow (no idea how, perhaps StringIO could be useful) read this string with Python's zipfile library. Is this solution even possible? Is there a better way?

mLC

2 Answers

You could try StringIO. StringIO objects emulate files but reside in memory.

Here is the code from a related SO post:

# Python 2 code; on Python 3 use io.BytesIO instead of StringIO.
# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'

import zipfile
from StringIO import StringIO

zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()

# output: "hey, foo"

or using a URL:

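from urllib2 import urlopen  # Python 2; urllib.request.urlopen on Python 3
from zipfile import ZipFile
url = urlopen("http://www.test.com/file.zip")
myzipfile = ZipFile(StringIO(url.read()))  # not named "zipfile", which would shadow the module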
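
The snippets above are written for Python 2. On Python 3, zip data is bytes, so `io.BytesIO` takes the place of `StringIO`. A rough equivalent of the first example might look like this; `get_zip_data()` is a hypothetical stand-in that builds the archive in memory, not a real library call.

```python
import io
import zipfile


def get_zip_data():
    # Hypothetical stand-in: returns the raw bytes of a zip archive
    # containing 'foo.txt', reading 'hey, foo'.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("foo.txt", "hey, foo")
    return buffer.getvalue()


# BytesIO gives zipfile a seekable file-like object without touching the disk.
zipdata = io.BytesIO(get_zip_data())
with zipfile.ZipFile(zipdata) as myzipfile:
    with myzipfile.open("foo.txt") as foofile:
        content = foofile.read()  # bytes, not str

print(content.decode("utf-8"))
```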

Hope this helps.

Mr.Rebot
  • Thanks a lot for the answer and also for the inspiration. I eventually solved it with BytesIO and a special encoding. – mLC Mar 23 '17 at 10:25

Eventually, I solved it using BytesIO and the cp862 encoding:

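import zipfile
from io import BytesIO
# toUnzip is the PyDrive file object from drive.CreateFile({'id':'MY_FILE_ID'})
toUnzipStringContent = toUnzip.GetContentString(encoding='cp862')
toUnzipBytesContent = BytesIO(toUnzipStringContent.encode('cp862'))
readZipfile = zipfile.ZipFile(toUnzipBytesContent, "r")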
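
As far as I can tell, the round trip works because cp862 is a single-byte code page in which Python's codec maps each of the 256 byte values to a distinct character, so decoding and re-encoding gives back the original bytes. The sketch below checks that on an in-memory zip without involving PyDrive at all; the archive content is made up, and it assumes that `GetContentString` simply decodes the downloaded bytes with the given encoding.

```python
import io
import zipfile

# Made-up archive standing in for the file on Drive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.xml", "<root>" + "x" * 1000 + "</root>")
raw = buffer.getvalue()

# Decode to str and encode back, mimicking the GetContentString round trip.
as_text = raw.decode("cp862")
restored = as_text.encode("cp862")
round_trip_ok = restored == raw

# Every possible byte value survives the same round trip.
all_bytes = bytes(range(256))
all_bytes_ok = all_bytes.decode("cp862").encode("cp862") == all_bytes

# The restored bytes are still a valid zip archive.
with zipfile.ZipFile(io.BytesIO(restored)) as archive:
    names = archive.namelist()

print(round_trip_ok, all_bytes_ok, names)
```

Other single-byte encodings that cover all 256 values, such as latin-1, should behave the same way, whereas a multi-byte encoding like utf-8 would generally fail to decode arbitrary zip bytes.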
mLC