I'm trying to open a .gz file (it is just .gz, not .log.gz or anything else), but reading the file line by line only displays gibberish. It should contain sensor data.

This is my code:

import gzip

with gzip.open('data.gz', 'r') as fin:
    for line in fin:
        print('got line', line)

Here is an example of the output:

b'\x80\x04\x95\x90\x01\x00\x00\x00\x00\x00\x00\x8c\x1bsklearn.preprocessing._data\x94\x8c\x0eStandardScaler\x94\x93\x94)\x81\x94}\x94(\x8c\twith_mean\x94\x88\x8c\x08with_std\x94\x88\x8c\x04copy\x94\x88\x8c\x0en_features_in_\x94K\x08\x8c\x0fn_samples_seen_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x06scalar\x94\x93\x94\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94\x8c\x02i8\x94K\x00K\x01\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bC\x08\xae\x0b\x00\x00\x00\x00\x00\x00\x94\x86\x94R\x94\x8c\x05mean_\x94\x8c\x13joblib.numpy_pickle\x94\x8c\x11NumpyArrayWrapper\x94\x93\x94)\x81\x94}\x94(\x8c\x08subclass\x94h\r\x8c\x07ndarray\x94\x93\x94\x8c\x05shape\x94K\x08\x85\x94\x8c\x05order\x94\x8c\x01C\x94\x8c\x05dtype\x94h\x0f\x8c\x02f8\x94K\x00K\x01\x87\x94R\x94(K\x03h\x13NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x8c\n'
b'allow_mmap\x94\x88ub*\xd4\xff\xe6\xbd\rE@\x15\xd1=A5\xdfg@\x0f\x7fv\x94\xabjg@\x04?\xf0\x03?0^@v\x94\xab\xda\xb5\xb6E@\x9bz\xec\xdd\x90\x9e?@\x13\xd7\xa8J\x94\x92;@*\x1f\xc6\xfd\xba$7@\x95&\x00\x00\x00\x00\x00\x00\x00\x8c\x04var_\x94h\x1b)\x81\x94}\x94(h\x1eh h!K\x08\x85\x94h#h$h%h(h*\x88ub\x8f\x16F\x8b\xed\n'
b'T@\x01^\xd6u\xfd\x94\xa9@Z\xae\xc2d\xb0\x05\x9a@\xe7\xbb\xeeu.$\x85@~\xd7\xe2\xe7_@F@\xb0$HdS\x17F@*\xa8\x07o\x16]}@\x1e\xba\xc0\xc8r]<@\x95(\x00\x00\x00\x00\x00\x00\x00\x8c\x06scale_\x94h\x1b)\x81\x94}\x94(h\x1eh h!K\x08\x85\x94h#h$h%h(h*\x88ub\x8fDh\n'
b'Z\xe8!@E\xaa1\xf0\x91\x9cL@Q\xecq~\xa0gD@\x13m$\x9e\x92\x02:@\xe9>L\x19(\xaf\x1a@DZ\xf9\xbd\x7f\x96\x1a@~\x014y\xdf\xac5@y\xa3\xc0X\xb4M\x15@\x95\x1f\x00\x00\x00\x00\x00\x00\x00\x8c\x10_sklearn_version\x94\x8c\x060.24.1\x94ub.'
Mark Adler
  • `gz` is a type of zipped file (gzip). You will have to [gunzip it to read it](https://docs.python.org/3/library/gzip.html). – JNevill Jan 03 '22 at 21:41
  • [`gzip.open()`](https://docs.python.org/3/library/gzip.html#gzip.open)? – Olvin Roght Jan 03 '22 at 21:42
  • Does this answer your question? [python: read lines from compressed text files](https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files) – Andrei Miculiță Jan 03 '22 at 21:46
  • How are you "trying to open" it? Can you post an [example] of your script / code? – hc_dev Jan 04 '22 at 00:28
  • We'd need to inspect the data to reproduce this. The most obvious-on-its-face explanation is that the data file doesn't really include gzipped text. Can you build a [mre] that works with known-good data (say, that includes instructions for how to create your data file with an obviously-correct process like `printf '%s\n' "first line" "second line" | gzip >data.gz`)? – Charles Duffy Jan 04 '22 at 21:19
  • See https://replit.com/@CharlesDuffy2/OblongScaredClient#main.sh -- the code works fine when we create a known-good input file – Charles Duffy Jan 04 '22 at 21:22
  • You forgot to call the decode function. Here is an [example](https://stackoverflow.com/a/58693033/17766295) – Nabil Jan 04 '22 at 21:42
  • @Nabil, if the OP was just saying they got `b'something'` instead of `something`, that's hardly "gibberish" as the question describes. I doubt very much that it's just a problem of failing to convert between Unicode and bytes (or, rather, if it is, this is a very poorly-asked question). – Charles Duffy Jan 04 '22 at 21:49
  • @AlwaysWonder, ...keep in mind that anything can be gzipped -- that includes binary data, so unless you have reason to know that your sensor data is text instead of packed binary data, there's no particular reason to _expect_ it to be human-readable after the compression is undone. – Charles Duffy Jan 04 '22 at 22:28
  • Much as a foreign language you do not know would appear to be gibberish. – Mark Adler Jan 04 '22 at 22:58

1 Answer

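The decompressed data is not text. It is a Python pickle: the output names `sklearn.preprocessing._data`, `StandardScaler` and `joblib.numpy_pickle`. You need to unpickle the result of the decompression to reconstruct the saved data.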
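
As a minimal sketch of that idea, the snippet below writes a small gzip-compressed pickle and reads it back with `gzip.open` plus `pickle.load`. The helper name `load_gzipped_pickle`, the file name `demo.gz`, and the sample dictionary are made up for illustration and are not the file from the question. Because the output in the question mentions `joblib.numpy_pickle`, that particular file was probably written with `joblib.dump`; if so, `joblib.load('data.gz')` (which handles the compression itself) is likely the call to use instead of plain `pickle`, but that is an inference from the output, not something confirmed here.

```python
import gzip
import os
import pickle
import tempfile


def load_gzipped_pickle(path):
    """Decompress a .gz file and unpickle the object stored inside it."""
    # Hypothetical helper; open in binary mode because a pickle is bytes, not text.
    with gzip.open(path, "rb") as fin:
        return pickle.load(fin)


# Build a known-good input file so the round trip can be checked.
# The dictionary is stand-in data, not the contents of the asker's file.
sample = {"with_mean": True, "n_features_in_": 8}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gz")
with gzip.open(demo_path, "wb") as fout:
    pickle.dump(sample, fout, protocol=4)

restored = load_gzipped_pickle(demo_path)
print(restored)
```

Iterating over the file line by line, as in the question, splits the binary pickle stream at every `\n` byte, which is why the output looks like fragments; loading the whole stream as one object avoids that.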

Mark Adler