0

I have a requests.Response object returned from an API call. Its content type is application/octet-stream, and the body is a zip archive (application/zip) with a CSV file inside it. I am trying to read the CSV file into pandas without writing to disk, if possible. Also, if I want to write the zip file to a path as a zip file, how can I do that?

resp = requests.get(url, headers=headers)
resp.raise_for_status()
csv_obj = zlib.decompress(resp.content, wbits=zlib.MAX_WBITS|32)
print(type(csv_obj))
export_file = pd.read_csv(csv_obj)
export_file.to_csv('./Test_export.csv')
martineau
Jihjohn

2 Answers

1

Updated version

# step 1: it turns out pandas can read zipped csv files even from urls!
some_dataframe = pandas.read_csv(url)

If pandas can't figure out the compression by itself, there are some parameters you can set explicitly, such as `compression` and `header`.

# step 1 (alternative): state the compression explicitly when pandas cannot infer it
some_dataframe = pandas.read_csv(zip_filename, compression='zip', header=0) # etc..
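The explicit `compression='zip'` parameter also matters when the zip arrives as bytes in memory rather than as a file name or URL, because a buffer has no extension for pandas to infer from. A minimal sketch, assuming the archive holds exactly one file; the archive built here (`export.csv` and its contents) is made-up sample data standing in for `resp.content` from the question:

```python
import io
import zipfile

import pandas as pd

# Build a small zip archive in memory to stand in for the downloaded bytes
# (in the question this would be resp.content). The member name and rows
# are hypothetical sample data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("export.csv", "id,name\n1,alpha\n2,beta\n")
zip_bytes = buffer.getvalue()

# A buffer has no file name, so the compression is stated explicitly.
# pandas expects the archive to contain exactly one file in this mode.
frame = pd.read_csv(io.BytesIO(zip_bytes), compression="zip")
print(frame)
```

Note that the raw bytes are wrapped in `io.BytesIO` first; passing a bytes object straight to `read_csv` is not expected to work.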

Previous version

I will leave the previous version of my answer below for reference.


# step 1: downloading the zip file
# `response` is the requests.Response from the API call (ideally made with stream=True)
zip_filename = 'response.zip'
with open(zip_filename, 'wb') as zip_file:
    for chunk in response.iter_content(chunk_size=255):
        if chunk:  # skip empty chunks
            zip_file.write(chunk)

# step 2: turns out pandas can read zipped csv files!
some_dataframe = pandas.read_csv(zip_filename)
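For the second part of the question (writing the zip to a path as a zip file), the chunked write above can be sketched without a network call. This is only an illustration: `save_chunks` is a hypothetical helper, the list of chunks stands in for what `response.iter_content(...)` would yield, and the archive contents are made-up sample data.

```python
import io
import os
import tempfile
import zipfile
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> str:
    """Write an iterable of byte chunks to path, skipping empty chunks."""
    with open(path, "wb") as zip_file:
        for chunk in chunks:
            if chunk:
                zip_file.write(chunk)
    return path


# Sample archive standing in for the body of the API response.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("export.csv", "id,name\n1,alpha\n")
payload = buffer.getvalue()

# Simulate a stream: split the payload into pieces, with one empty chunk.
middle = len(payload) // 2
fake_stream = [payload[:middle], b"", payload[middle:]]

saved_path = save_chunks(fake_stream, os.path.join(tempfile.mkdtemp(), "response.zip"))

# Check that what landed on disk is a readable zip archive.
with zipfile.ZipFile(saved_path) as saved:
    names = saved.namelist()
print(zipfile.is_zipfile(saved_path), names)
```

Opening the file in binary mode (`'wb'`) is what keeps the archive intact; the bytes are written exactly as received.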
Maurits
  • Thank you so much, Maurits. Could you please explain the `chunk_size=255` and why there is an `if` inside the `for` loop? – Jihjohn Jan 31 '22 at 12:43
  • The general concept when reading streams is that we do not want to read the entire stream into memory at once. If working with small files only you could set `chunk_size=None` or just omit the parameter. The `if` is interesting; the guy who I got the `iter_content` snippet from noted that it was to avoid corrupting the bytestream due to heartbeat (keep-alive) requests. [I cannot really confirm nor deny glancing at the code.](https://2.python-requests.org/en/master/_modules/requests/models/#Response.iter_content) It may be safe to omit. I leave it in because I'm paranoid. – Maurits Jan 31 '22 at 13:23
  • I was asking about why the number 255 was chosen. It really looks interesting. Thank you! – Jihjohn Jan 31 '22 at 16:04
  • Hey Maurits. The updated answer works perfectly fine when we are just calling a url. I have an api call with headers. I have checked how to add url headers in pandas, but couldn't figure out how to add headers – Jihjohn Feb 02 '22 at 07:45
  • My idea would be to use the requests module to add headers to your request then plug the response content into pandas. This could help with that: https://stackoverflow.com/questions/39213597 – Maurits Feb 02 '22 at 08:19
  • Plugging content into read_csv didn't work for me and the answer you linked looks similar to my solution. I used zipfile extra – Jihjohn Feb 02 '22 at 08:30
  • Thank you so much for putting in the effort. – Jihjohn Feb 02 '22 at 10:31
0
import io
import zipfile

import pandas as pd
import requests

# url and headers are the same as in the question
resp = requests.get(url, headers=headers)
resp.raise_for_status()
zfile = zipfile.ZipFile(io.BytesIO(resp.content))
# The archive held only one file, so take the last (only) name from zfile.namelist()
export_file = pd.read_csv(zfile.open(zfile.namelist()[-1]))
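Taking the last entry of `namelist()` works when the archive holds a single file. If an archive could contain other members, one possible variation is to pick the CSV by extension. A rough sketch using only the standard library; `single_csv_name` is a hypothetical helper and the archive contents are made-up sample data:

```python
import io
import zipfile


def single_csv_name(archive: zipfile.ZipFile) -> str:
    """Return the name of the only .csv member, or raise if there is not exactly one."""
    csv_names = [name for name in archive.namelist() if name.lower().endswith(".csv")]
    if len(csv_names) != 1:
        raise ValueError(f"expected exactly one CSV member, found {csv_names}")
    return csv_names[0]


# Sample archive with an extra non-CSV member, standing in for resp.content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "not a csv")
    archive.writestr("data/export.csv", "id,name\n1,alpha\n")

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    member = single_csv_name(archive)
    with archive.open(member) as handle:
        first_line = handle.readline().decode("utf-8").strip()
print(member, first_line)
```

The handle returned by `archive.open(member)` is the same kind of file-like object that is passed to `pd.read_csv` in the snippet above.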
Jihjohn