I am importing a huge SAS dataset (about 7 GB) in Anaconda Spyder (Python 3.5) using pandas.read_sas. The code looks roughly like this:

import pandas as pd
hugedata = pd.read_sas('K:/HugeData.sas7bdat')

but I received the following error:

Traceback (most recent call last):

  File "<ipython-input-46-31acb10b0e92>", line 1, in <module>
    hugedata = pd.read_sas('K:/ERA/Credit Risk Estimates/PRAM/NW_RM_SUB_FCLY_M_HIST.sas7bdat')

  File "C:\Users\l086276\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\sas\sasreader.py", line 61, in read_sas
    return reader.read()

  File "C:\Users\l086276\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\sas\sas7bdat.py", line 579, in read
    nd = (self.column_types == b'd').sum()

AttributeError: 'bool' object has no attribute 'sum'

I'm wondering why the internal call in sas7bdat.py raises an error when importing this dataset, while it works absolutely fine with other SAS datasets. What could be wrong with this dataset? Any help would be appreciated.
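
The message itself can be reproduced without pandas. A minimal sketch, assuming (the traceback does not confirm this) that column_types was still a plain Python list rather than a NumPy array when line 579 ran: comparing a list with a bytes value yields one bool instead of an element-wise result, and a bool has no .sum(). The sample values are made up for illustration.

```python
# Hypothetical column types; b'd' is assumed to mark a numeric column
# and b's' a string column.
column_types = [b'd', b's', b'd']

# list == bytes is not element-wise: the result is a single bool.
comparison = (column_types == b'd')
print(type(comparison).__name__, comparison)

try:
    comparison.sum()
    message = None
except AttributeError as exc:
    message = str(exc)
print(message)

# Counting matches element by element works on a plain list.
numeric_count = sum(1 for t in column_types if t == b'd')
print(numeric_count)
```

If this is what happens, the column metadata of this particular file was probably not parsed the way it was for the other datasets.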

FarrukhJ
  • If you have SAS, try rewriting part of the table and running pandas again. I have a table created using SAS 9.1 that pandas isn't able to read, although I got a different error. – Robert Soszyński Aug 12 '16 at 08:08
  • The table comprises 250 fields and 70 million records, and I need the whole table for my analysis. I reran the job and regenerated the table, but I still face the same problem. The table resides on a Teradata server, but connecting my code to the server is inconvenient because of the table size and server speed issues. – FarrukhJ Aug 14 '16 at 23:00

1 Answer

I've found that the sas7bdat package works where pandas fails with the above message.

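import pandas as pd
from sas7bdat import SAS7BDAT

def load_sas(sasfile,
             encoding="utf8",
             encoding_errors="replace"):
    """Read a .sas7bdat file into a DataFrame using the sas7bdat package."""
    with SAS7BDAT(sasfile, encoding=encoding,
                  encoding_errors=encoding_errors) as sas:
        rows = iter(sas)
        # The first row yielded is the header (column names).
        columns = list(next(rows))
        # The remaining rows are the data records.
        return pd.DataFrame(rows, columns=columns)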
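
For a file as large as the one in the question, building one DataFrame from the whole iterator may need more memory than is available. A minimal sketch of consuming a header-first row iterator in fixed-size batches instead. Note that iter_batches is a hypothetical helper, not part of sas7bdat or pandas, and the sample rows are made up.

```python
from itertools import islice

def iter_batches(rows, batch_size):
    """Yield (columns, batch) pairs from an iterable whose first item is the header."""
    rows = iter(rows)
    columns = next(rows, None)          # first row holds the column names
    if columns is None:                 # empty input: nothing to yield
        return
    columns = list(columns)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        yield columns, batch

# Stand-in for iterating over an open SAS7BDAT reader.
sample = [["id", "amount"], [1, 10.0], [2, 20.5], [3, 7.25]]
batches = list(iter_batches(sample, batch_size=2))
print(batches)
```

Each batch could then be passed to pd.DataFrame(batch, columns=columns) and processed or written out before the next one is read.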
user48956
  • Note this is for Python 2. For Python 3, you need to change .next() to .__next__() https://stackoverflow.com/questions/1073396/is-generator-next-visible-in-python-3-0 – wordsforthewise Aug 27 '17 at 20:19
  • If this is the recommended way of doing this, it seems a little broken. I've always understood __ as 'private' in Python, so you shouldn't call it directly and should expect it to fail in some cases. https://stackoverflow.com/questions/1301346/what-is-the-meaning-of-a-single-and-a-double-underscore-before-an-object-name – user48956 Oct 02 '17 at 17:53
  • Some of the answers further down on that page say things like "this is just a convention, a way for the Python system to use names that won't conflict with user names." and "is typically reserved for builtin methods or variables", so when would using .__next__() fail? I use .__file__ all the time to find out where the module code is located, and I think .__version__ tells you the version. – wordsforthewise Oct 02 '17 at 23:10
  • Wish I could edit that first comment. Looking at the answer I linked, it seems like Python3 should use next(sas) instead of sas.__next__() – wordsforthewise Oct 02 '17 at 23:11
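
The point the comments converge on can be checked directly: the built-in next() (available since Python 2.6) advances an iterator on both Python 2 and Python 3, so neither .next() nor .__next__() has to be called by name. A small sketch with made-up rows:

```python
rows = iter([["id", "amount"], [1, 10.0], [2, 20.5]])

header = next(rows)            # same call on Python 2.6+ and Python 3
first_record = next(rows)

# An optional default avoids StopIteration once the iterator is exhausted.
next(rows)                     # consumes the last record
after_end = next(rows, None)

print(header, first_record, after_end)
```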