
I am trying to open a .csv compressed to a .lzma file in Linux using the following code:

import lzma
import pandas as pd

myfile= '/home/stacey/work/roll_158_oe_2018-03-02/BBG.XTKS.8219.S/inst.BBG.XTKS.8219.S.csv.lzma'

with lzma.open(myfile,'rt') as f:
   pair_info=pd.read_csv(f,engine='c',header=0,index_col=0)

Where myfile is a path that exists in Linux.

However I get the error:

with lzma.open(stock,'rt') as f:
AttributeError: 'module' object has no attribute 'open'

I have tried adding the following:

import lzma
import pandas as pd

myfile = '/home/stacey/work/roll_158_oe_2018-03-02/BBG.XTKS.8219.S/inst.BBG.XTKS.8219.S.csv.lzma'

with open(myfile) as compressed:
    with lzma.LZMAFile(compressed, 'r') as uncompressed:
        for line in uncompressed:
            print(line)

but I get the error:

    with lzma.LZMAFile(compressed,'r') as uncompressed:
TypeError: coercing to Unicode: need string or buffer, file found

I have also tried:

import pandas as pd
import lzma
import pickle

def pickle_load(filePath, compression=None):
    # Use the built-in open, or the open() of the named compression module
    open_cmd = open if compression is None else __import__(compression).open
    with open_cmd(filePath, 'r') as f:
        output = pickle.load(f)
    return output

myfile = '/home/stacey/work/roll_158_oe_2018-03-02/BBG.XTKS.8219.S/inst.BBG.XTKS.8219.S.csv.lzma'

myoutput = pickle_load(myfile, 'lzma')
print(myoutput)

But again I get the error:

open_cmd=open if compression is None else __import__(compression).open
AttributeError: 'module' object has no attribute 'open'

When I run python -v on the command line I get the output:

[scoleman@ip-192-168-9-132 port_1m]$ python -v

# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /usr/lib64/python2.7/site.pyc matches /usr/lib64/python2.7/site.py
import site # precompiled from /usr/lib64/python2.7/site.pyc
# /usr/lib64/python2.7/os.pyc matches /usr/lib64/python2.7/os.py
import os # precompiled from /usr/lib64/python2.7/os.pyc
import errno # builtin
import posix # builtin
# /usr/lib64/python2.7/posixpath.pyc matches /usr/lib64/python2.7/posixpath.py
import posixpath # precompiled from /usr/lib64/python2.7/posixpath.pyc
# /usr/lib64/python2.7/stat.pyc matches /usr/lib64/python2.7/stat.py
import stat # precompiled from /usr/lib64/python2.7/stat.pyc
# /usr/lib64/python2.7/genericpath.pyc matches /usr/lib64/python2.7/genericpath.py
import genericpath # precompiled from /usr/lib64/python2.7/genericpath.pyc
# /usr/lib64/python2.7/warnings.pyc matches /usr/lib64/python2.7/warnings.py
import warnings # precompiled from /usr/lib64/python2.7/warnings.pyc
# /usr/lib64/python2.7/linecache.pyc matches /usr/lib64/python2.7/linecache.py
import linecache # precompiled from /usr/lib64/python2.7/linecache.pyc
# /usr/lib64/python2.7/types.pyc matches /usr/lib64/python2.7/types.py
import types # precompiled from /usr/lib64/python2.7/types.pyc
# /usr/lib64/python2.7/UserDict.pyc matches /usr/lib64/python2.7/UserDict.py
import UserDict # precompiled from /usr/lib64/python2.7/UserDict.pyc
# /usr/lib64/python2.7/_abcoll.pyc matches /usr/lib64/python2.7/_abcoll.py
import _abcoll # precompiled from /usr/lib64/python2.7/_abcoll.pyc
# /usr/lib64/python2.7/abc.pyc matches /usr/lib64/python2.7/abc.py
import abc # precompiled from /usr/lib64/python2.7/abc.pyc
# /usr/lib64/python2.7/_weakrefset.pyc matches /usr/lib64/python2.7/_weakrefset.py
import _weakrefset # precompiled from /usr/lib64/python2.7/_weakrefset.pyc
import _weakref # builtin
# /usr/lib64/python2.7/copy_reg.pyc matches /usr/lib64/python2.7/copy_reg.py
import copy_reg # precompiled from /usr/lib64/python2.7/copy_reg.pyc
# /usr/lib64/python2.7/traceback.pyc matches /usr/lib64/python2.7/traceback.py
import traceback # precompiled from /usr/lib64/python2.7/traceback.pyc
# /usr/lib64/python2.7/sysconfig.pyc matches /usr/lib64/python2.7/sysconfig.py
import sysconfig # precompiled from /usr/lib64/python2.7/sysconfig.pyc
# /usr/lib64/python2.7/re.pyc matches /usr/lib64/python2.7/re.py
import re # precompiled from /usr/lib64/python2.7/re.pyc
# /usr/lib64/python2.7/sre_compile.pyc matches /usr/lib64/python2.7/sre_compile.py
import sre_compile # precompiled from /usr/lib64/python2.7/sre_compile.pyc
import _sre # builtin
# /usr/lib64/python2.7/sre_parse.pyc matches /usr/lib64/python2.7/sre_parse.py
import sre_parse # precompiled from /usr/lib64/python2.7/sre_parse.pyc
# /usr/lib64/python2.7/sre_constants.pyc matches /usr/lib64/python2.7/sre_constants.py
import sre_constants # precompiled from /usr/lib64/python2.7/sre_constants.pyc
dlopen("/usr/lib64/python2.7/lib-dynload/_localemodule.so", 2);
import _locale # dynamically loaded from /usr/lib64/python2.7/lib-dynload/_localemodule.so
# /usr/lib64/python2.7/_sysconfigdata.pyc matches /usr/lib64/python2.7/_sysconfigdata.py
import _sysconfigdata # precompiled from /usr/lib64/python2.7/_sysconfigdata.pyc
import encodings # directory /usr/lib64/python2.7/encodings
# /usr/lib64/python2.7/encodings/__init__.pyc matches /usr/lib64/python2.7/encodings/__init__.py
import encodings # precompiled from /usr/lib64/python2.7/encodings/__init__.pyc
# /usr/lib64/python2.7/codecs.pyc matches /usr/lib64/python2.7/codecs.py
import codecs # precompiled from /usr/lib64/python2.7/codecs.pyc
import _codecs # builtin
# /usr/lib64/python2.7/encodings/aliases.pyc matches /usr/lib64/python2.7/encodings/aliases.py
import encodings.aliases # precompiled from /usr/lib64/python2.7/encodings/aliases.pyc
# /usr/lib64/python2.7/encodings/utf_8.pyc matches /usr/lib64/python2.7/encodings/utf_8.py
import encodings.utf_8 # precompiled from /usr/lib64/python2.7/encodings/utf_8.pyc
Python 2.7.12 (default, Sep  1 2016, 22:14:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
dlopen("/usr/lib64/python2.7/lib-dynload/readline.so", 2);
import readline # dynamically loaded from /usr/lib64/python2.7/lib-dynload/readline.so

When I then run import lzma I get the output:

>>> import lzma
dlopen("/usr/lib64/python2.7/dist-packages/lzma.so", 2);
import lzma # dynamically loaded from /usr/lib64/python2.7/dist-packages/lzma.so

What have I done wrong, and how can I get this working? I've had a look around but can't find any other solution.

halfer
Stacey

2 Answers


Apparently you need to call a class from the lzma module to open the file:

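import lzma  # Python 3; try lzmaffi on Python 2
with open('one-csv-file.xz', 'rb') as compressed:  # binary mode: LZMAFile reads bytes
    with lzma.LZMAFile(compressed) as uncompressed:
        for line in uncompressed:  # each line is a bytes object
            do_stuff_with(line)  # placeholder for your own processing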

Extracted from How to open and read LZMA file in-memory
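
For reference, here is a minimal Python 3 sketch of both approaches: lzma.open with a path in text mode, and lzma.LZMAFile wrapped around a file object opened in binary mode. The file name sample.csv.lzma and its contents are made up for illustration, and the standard csv module stands in for pandas so the snippet has no third-party dependencies.

```python
import csv
import lzma
import os
import tempfile

# Hypothetical sample data, written in the legacy .lzma container (FORMAT_ALONE).
path = os.path.join(tempfile.mkdtemp(), "sample.csv.lzma")
with lzma.open(path, "wt", format=lzma.FORMAT_ALONE, newline="") as f:
    csv.writer(f).writerows([["id", "price"], ["a", "1.5"], ["b", "2.5"]])

# Approach 1: lzma.open with a path; "rt" decodes the bytes to text.
with lzma.open(path, "rt", newline="") as f:
    rows_from_open = list(csv.reader(f))

# Approach 2: LZMAFile around a file object, which must be opened in binary mode.
with open(path, "rb") as compressed:
    with lzma.LZMAFile(compressed) as uncompressed:
        lines = [line.decode("utf-8").rstrip("\r\n") for line in uncompressed]

print(rows_from_open)
print(lines)
```

When reading, both calls detect the container format automatically, so the same code should handle .xz and legacy .lzma files.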

Luca Bezerra
  • [The docs](https://docs.python.org/3/library/lzma.html#reading-and-writing-compressed-files) list it as a method – roganjosh May 11 '19 at 11:27
  • Thanks @Luca Bezerra. I've tried to implement your solution (please see the edit to the question) but I get a different error: `with lzma.LZMAFile(compressed,'r') as uncompressed: TypeError: coercing to Unicode: need string or buffer, file found`. Could you take a look, please? – Stacey May 11 '19 at 12:14
  • @Stacey 1. What version of python are you using? 2. Do you know if the imported `lzma` is the correct one? Try `python -v` and then `import lzma` and check if the path corresponds to what you're expecting. There might be another package or version of lzma installed in your PATH. – Luca Bezerra May 12 '19 at 04:51
  • @LucaBezerra thanks, I'm running Python 2.7.12. I can import lzma but I'm not sure how to tell what version I have (`lzma.__version__` doesn't seem to work for me). Does that help? – Stacey May 12 '19 at 09:18
  • @Stacey From [the docs](https://docs.python.org/3/library/lzma.html#reading-and-writing-compressed-files), it seems `lzma` was added to Python in version 3.3, so it's likely you're importing an older, different module with the same name, which might work differently. Is it possible for you to use Python 3 instead? – Luca Bezerra May 13 '19 at 02:54
  • @LucaBezerra sorry for the confusion. I just ran `>>> from platform import python_version` and `>>> print(python_version())` in my console and it's telling me I'm running version 3.6.2 (not sure why it's telling me 2.7.12 from the command line). I still have the no-attribute issue. – Stacey May 13 '19 at 05:02
  • @Stacey When you do `python -v` and then `import lzma`, do you get something like `/usr/lib/python3.6/__pycache__/lzma.cpython-36.pyc matches /usr/lib/python3.6/lzma.py`? – Luca Bezerra May 13 '19 at 16:01
  • Hi @LucaBezerra I've added the output when I run python -v and import lzma to the main question body. – Stacey May 13 '19 at 17:55
  • @Stacey What command are you using to run your script? The updated output you pasted seems to be using Python 2 as the default version when you run `python `. Are you running your script inside a virtual env of some sort? If not, can you try running `python3 .py`? – Luca Bezerra May 13 '19 at 18:38
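
The question running through these comments, namely which interpreter and which lzma module the script is actually using, can be checked from inside the script itself. The following is a small sketch; the helper name describe_environment is made up for illustration, and it only uses sys attributes that exist on both Python 2.7 and Python 3.

```python
import sys

def describe_environment():
    """Report which interpreter is running and which lzma module it imports."""
    info = {"executable": sys.executable, "version": tuple(sys.version_info[:3])}
    try:
        import lzma
    except ImportError:
        info["lzma_file"] = None
        info["has_lzma_open"] = False
    else:
        # Built-in or frozen modules may have no __file__ attribute.
        info["lzma_file"] = getattr(lzma, "__file__", None)
        info["has_lzma_open"] = hasattr(lzma, "open")
    return info

for key, value in sorted(describe_environment().items()):
    print("%s: %s" % (key, value))
```

If has_lzma_open prints False, or lzma_file points into a python2.7 directory as in the question's `python -v` output, the script is not picking up the Python 3 standard-library module.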

There are some differences between the lzma module available on Python 2.7.x and the one in Python 3.3+.

On Python 2.7.x, the module has no lzma.open, and lzma.LZMAFile does not accept a file-like object. Here is a function that opens an lzma file in a Python-version-independent way.

def open_lzma_file(f, *args, **kwargs):
    import os
    try:
        import lzma
    except ImportError:
        raise NotImplementedError('''This version of Python doesn't have an "lzma" module''')
    if hasattr(lzma, 'open'):
        # Python 3.3+
        # lzma.open supports `str`, `bytes` and file-like objects
        return lzma.open(f, *args, **kwargs)
    # Python 2.7.x
    # This version only has LZMAFile,
    # and LZMAFile doesn't take a file-like object in Python 2.7
    if not isinstance(f, basestring):
        # Probably a file-like object: fall back to its path on disk
        if hasattr(f, 'name') and os.path.exists(f.name):
            f = f.name
        else:
            raise TypeError('Expected `str`, `bytes`, `unicode` or file-like object with valid `name` attribute pointing to a valid path')
    return lzma.LZMAFile(f, *args, **kwargs)

Usage:
Simply pass a str, bytes or file-like object.

with open_lzma_file(myfile,'rt') as f:
   pair_info=pd.read_csv(f,engine='c',header=0,index_col=0)
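
An alternative to wrapping the old module, assuming the third-party backports.lzma package can be installed for the Python 2 interpreter, is to fall back to it whenever the imported lzma lacks open. This is only a sketch: the helper name load_lzma and the sample payload are made up for illustration, and on Python 3.3+ the fallback branch is never reached.

```python
def load_lzma():
    """Return a module that offers the Python 3 style lzma API."""
    try:
        import lzma
    except ImportError:
        lzma = None
    if lzma is not None and hasattr(lzma, "open"):
        return lzma  # standard-library module on Python 3.3+
    # Assumption: backports.lzma is installed (e.g. via pip) on Python 2.
    from backports import lzma as backported
    return backported

lzma_mod = load_lzma()

# Round-trip a small, made-up payload to confirm the module works.
payload = b"id,price\na,1.5\n"
restored = lzma_mod.decompress(lzma_mod.compress(payload))
print(restored == payload)
```

With the module obtained this way, the original `with lzma_mod.open(myfile, 'rt') as f:` form from the question can be kept on either interpreter.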
Nizam Mohamed