I am trying to import a large dataset from Stata 13 into pandas using StataReader. This worked fine with pandas 0.13.1, but after upgrading to 0.14.1, reading .dta files seems to have become drastically slower. Does anybody know what happened (I could not find any changes to StataReader in the "What's New" section of the pandas website), and/or how to work around this?
Steps to reproduce my issue:
Create a large dataset in Stata 13:
clear
set obs 11500
forvalues i = 1/8000 {
    gen var`i' = 1
}
saveold bigdataset, replace
Try to read it into pandas using StataReader:
from pandas.io.stata import StataReader
reader = StataReader('bigdataset.dta')
data = reader.data()
Using pandas 0.13.1, this takes around 220 seconds, which is acceptable, but using pandas 0.14.1, it had still not finished after around 20 minutes.
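For reference, here is how I am measuring the read time. This is a minimal, self-contained sketch: it generates a small .dta file with pandas' own to_stata instead of Stata's saveold, and uses pd.read_stata, which should be equivalent to StataReader(...).data() here; the file name and sizes are just for illustration.

```python
import time
import pandas as pd

# Generate a small stand-in .dta file (the real file comes from Stata's saveold).
df = pd.DataFrame({'var%d' % i: [1.0] * 100 for i in range(1, 11)})
df.to_stata('testdataset.dta', write_index=False)

# Time the read, the same way I timed bigdataset.dta.
start = time.time()
data = pd.read_stata('testdataset.dta')
elapsed = time.time() - start
print('read %d rows x %d cols in %.2f s' % (data.shape[0], data.shape[1], elapsed))
```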
When I test this issue with a smaller dataset:
Create a smaller dataset in Stata 13:
clear
set obs 11500
forvalues i = 1/1000 {
    gen var`i' = 1
}
saveold smalldataset, replace
Try to read it into pandas using StataReader:
from pandas.io.stata import StataReader
reader = StataReader('smalldataset.dta')
data = reader.data()
Using pandas 0.13.1, this takes around 20 seconds, but using pandas 0.14.1, this takes around 300 seconds.
I would really like to upgrade to the new pandas version and work with my data, which is around the size of bigdataset.dta. Does anybody know a way I could efficiently import my data?
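One workaround I am considering (untested against the versions above) is to pay the slow .dta read only once and cache the result in a faster format such as CSV, then load the cache on subsequent runs. A sketch of that caching pattern, with a hypothetical load_cached helper and a small generated file standing in for my real data:

```python
import os
import pandas as pd

def load_cached(dta_path, csv_path):
    # Hypothetical helper: on the first run, do the slow .dta read and
    # save a CSV copy; on later runs, read the fast CSV copy instead.
    if not os.path.exists(csv_path):
        df = pd.read_stata(dta_path)   # slow first read
        df.to_csv(csv_path, index=False)
    return pd.read_csv(csv_path)       # fast on later runs

# Demo with a small generated file (the real one comes from Stata's saveold).
df = pd.DataFrame({'var%d' % i: [1.0] * 50 for i in range(1, 6)})
df.to_stata('demo.dta', write_index=False)
cached = load_cached('demo.dta', 'demo.csv')
```

The same pattern would work with HDF5 (to_hdf/read_hdf) instead of CSV if PyTables is installed, but that does not explain why the .dta read itself got slower.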