
I want to read a .dta dataset using pandas read_stata():

import pandas as pd
df = pd.read_stata('data_chunk10.dta')

But I get a struct error saying that unpack requires a buffer of 4 bytes:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-5-8e0085ff8186> in <module>
----> 1 df=pd.read_stata('data_chunk10.dta')
      2 df

~/anaconda3/lib/python3.8/site-packages/pandas/io/stata.py in read_stata(filepath_or_buffer, convert_dates, convert_categoricals, index_col, convert_missing, preserve_dtypes, columns, order_categoricals, chunksize, iterator)
   1926 
   1927     try:
-> 1928         data = reader.read()
   1929     finally:
   1930         reader.close()

~/anaconda3/lib/python3.8/site-packages/pandas/io/stata.py in read(self, nrows, convert_dates, convert_categoricals, index_col, convert_missing, preserve_dtypes, columns, order_categoricals)
   1616 
   1617         if convert_categoricals:
-> 1618             self._read_value_labels()
   1619 
   1620         if len(data) == 0:

~/anaconda3/lib/python3.8/site-packages/pandas/io/stata.py in _read_value_labels(self)
   1468             self.path_or_buf.read(3)  # padding
   1469 
-> 1470             n = struct.unpack(self.byteorder + "I", self.path_or_buf.read(4))[0]
   1471             txtlen = struct.unpack(self.byteorder + "I", self.path_or_buf.read(4))[0]
   1472             off = np.frombuffer(

error: unpack requires a buffer of 4 bytes

I was previously able to read the same file this way in Google Colab, so I assumed the problem is not the data file itself, but I now get this error when I run the code on my local PC.

Can you please tell me what I am doing wrong?
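
For context, the last line of the traceback is what `struct.unpack` raises when `read(4)` returns fewer than 4 bytes, which typically happens when a stream ends earlier than the parser expects. Here is a minimal sketch that reproduces the same exception type without pandas. The helper name `read_uint32` and the in-memory buffers are made up for illustration and are not part of pandas:

```python
import io
import struct


def read_uint32(buf, byteorder="<"):
    """Read one unsigned 32-bit integer, mirroring the call shown in the traceback.

    `read_uint32` is a hypothetical helper for illustration only.
    """
    return struct.unpack(byteorder + "I", buf.read(4))[0]


# A complete 4-byte field decodes normally.
complete = io.BytesIO(struct.pack("<I", 7))
print(read_uint32(complete))

# A stream that ends after only 2 bytes triggers struct.error,
# because read(4) can return at most the 2 bytes that are left.
truncated = io.BytesIO(b"\x07\x00")
try:
    read_uint32(truncated)
except struct.error as exc:
    print("struct.error:", exc)
```

Judging only from the traceback, the failing call sits behind `if convert_categoricals:`, so passing `convert_categoricals=False` to `pd.read_stata` might get past this step. That is an untested assumption for this particular file.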

  • What version of pandas are you using? It may be different from the one on Colab. You can run `pd.__version__` to check. – AlexK May 12 '21 at 04:53
  • It is pandas 1.1.3 on my PC, but 1.1.5 on Colab. Is this a problem? – Ramin Forouzandeh May 12 '21 at 04:55
  • I actually went ahead and upgraded both to 1.2.4, but the problem persists. – Ramin Forouzandeh May 12 '21 at 04:59
  • Not an expert here, but it is very likely that this is caused by the file itself. pandas calls modules from the Python standard library (struct and io) to read the file, and the error suggests it ran out of bytes while parsing part of the file's contents. If you have not already, I would try reading a file that you were previously able to open, to help isolate the problem. – AlexK May 12 '21 at 05:40
  • It seems you are right. I was able to open some other files. But I still can't understand what is wrong with my data. Here is the file I am trying to read: https://drive.google.com/file/d/1bkpPHPVbZ2XuLqFryGclpuYDClPJUbyV/view?usp=sharing – Ramin Forouzandeh May 12 '21 at 15:30
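
Since the comments point at the file itself, one way to test that idea is to check whether the local copy is byte-for-byte identical to the one that opened in Colab. An incomplete download or copy would show up as a different size and hash. This is a rough sketch using only the standard library. The function name `file_fingerprint` is hypothetical, and the file path is whatever the `.dta` file is called on each machine:

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`.

    Hypothetical helper: run it on both machines and compare the results.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large .dta files do not need to fit in memory.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return os.path.getsize(path), digest.hexdigest()


# Example usage (the path is whatever the file is called locally):
# print(file_fingerprint("data_chunk10.dta"))
```

If the size or hash differs between the two environments, the local copy is probably damaged or truncated and re-downloading it would be the first thing to try. If they match, the file is the same and the difference lies elsewhere.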
