I'm having a lot of trouble decompressing gzipped data in Django. I've tried several of the solutions proposed in "Download and decompress gzipped file in memory?", but I think I'm running into difficulty with how they interact with Django.
I'd like to be able to upload data.csv.gz and then, if it is a gzip, extract the compressed data into a Django File so it can continue along its usual routine (saving to a FileField).
Here is what I have so far in my serializer:
def create(self, validated_data):
    file: File = validated_data.get("file")
    ext = file.name.split(".")[-1].lower()
    if ext == "gz":
        compressedFile = io.BytesIO()
        compressedFile.write(file.read())
        decompressed_fname = file.name[:-3]
        decompressedFile = gzip.GzipFile(fileobj=compressedFile)
        with open(decompressed_fname, "wb") as outfile:
            outfile.write(decompressedFile.read())
        with open(decompressed_fname, "rb") as outfile:
            file = File(outfile)
        ext = decompressed_fname.split(".")[-1].lower()
    ...
When I do this, the decompressed file is empty when I check its contents on disk, and later routines throw an error:
f.seek(0)
ValueError: seek of closed file
I get a similar error if I use shutil instead:
if ext == "gz":
    compressedFile = io.BytesIO()
    compressedFile.write(file.read())
    decompressed_fname = file.name[:-3]
    import shutil
    shutil.copyfileobj(gzip.GzipFile(fileobj=file), open(decompressed_fname, "wb"))
    with open(decompressed_fname, "rb") as outfile:
        file = File(outfile)
    ext = decompressed_fname.split(".")[-1].lower()
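A possible reading of both attempts, offered as a sketch rather than a confirmed fix: GzipFile starts reading at the stream's current position, so a buffer that has just been written to, or an upload already consumed by file.read(), may yield no data. Likewise, a File wrapped around a handle opened in a with block is closed once that block exits. The standard-library-only example below rewinds the stream and returns an in-memory buffer that stays open. The helper name gunzip_to_buffer and its parameters are hypothetical. In Django, the resulting bytes could presumably be wrapped with django.core.files.base.ContentFile; that step is left out so the snippet runs without Django.

```python
import gzip
import io


def gunzip_to_buffer(uploaded, name):
    """Decompress a gzipped file-like object entirely in memory.

    `uploaded` is any seekable binary stream (hypothetical stand-in for the
    uploaded file); `name` is its original filename.
    Returns (decompressed_name, BytesIO positioned at the start).
    """
    # GzipFile reads from the current position, so rewind first in case
    # the stream was already read or written to.
    uploaded.seek(0)
    with gzip.GzipFile(fileobj=uploaded) as gz:
        data = gz.read()

    # Strip a trailing ".gz" so "data.csv.gz" becomes "data.csv".
    new_name = name[:-3] if name.lower().endswith(".gz") else name

    # A BytesIO buffer is not tied to a `with` block, so it is still open
    # (and seekable) when later code calls seek(0) on it.
    return new_name, io.BytesIO(data)


if __name__ == "__main__":
    # Simulate the situation in the question: bytes written into a BytesIO,
    # leaving the position at the end of the buffer.
    upload = io.BytesIO()
    upload.write(gzip.compress(b"a,b\n1,2\n"))

    new_name, buffer = gunzip_to_buffer(upload, "data.csv.gz")
    print(new_name, buffer.read())
```

Keeping everything in memory avoids writing a temporary file to the working directory, at the cost of holding the whole decompressed payload in RAM, which may not suit very large uploads.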
The curl command I'm using:
curl http://0.0.0.0:8000/upload/ -X 'POST' -H "Content-Encoding: gzip" -F "input_type=data" -F "file=@data.csv.gz"