Version of given Stata file is 44. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).

import pandas as pd
# pd.read_stata is the public entry point for the same reader
Citations2 = pd.read_stata('Citations_2000-2010 part 2.dta')

I want to convert this file to CSV.

1 Answer

Install pyreadstat

# pip install pyreadstat
import pyreadstat

df, meta = pyreadstat.read_dta('Citations_2000-2010 part 2.dta')

# index=False keeps the RangeIndex out of the CSV
df.to_csv('Citations_2000-2010 part 2.csv', index=False)

Details:

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13569764 entries, 0 to 13569763
Data columns (total 8 columns):
 #   Column       Dtype  
---  ------       -----  
 0   patent       int64  
 1   citation     float64
 2   cit_date     object 
 3   cit_name     object 
 4   cit_kind     object 
 5   cit_country  object 
 6   category     object 
 7   citseq       object 
dtypes: float64(1), int64(1), object(6)
memory usage: 828.2+ MB
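
With about 13.5 million rows and more than 800 MB in memory, the conversion can also be done in chunks instead of loading everything at once. The following is only a minimal sketch: it uses pandas' own reader, so it assumes pandas can open the file in the first place, and `dta_to_csv_in_chunks` and the default chunk size are illustrative choices, not part of any library.

```python
import pandas as pd


def dta_to_csv_in_chunks(dta_path, csv_path, chunksize=1_000_000):
    """Stream a .dta file to CSV chunk by chunk; return the number of rows written.

    Hypothetical helper for illustration; chunksize is an arbitrary default.
    """
    rows = 0
    # With chunksize set, read_stata returns an iterable reader
    # that can be used as a context manager.
    with pd.read_stata(dta_path, chunksize=chunksize) as reader:
        for i, chunk in enumerate(reader):
            first = i == 0
            # Write the header once, then append the remaining chunks.
            chunk.to_csv(
                csv_path,
                mode="w" if first else "a",
                header=first,
                index=False,
            )
            rows += len(chunk)
    return rows
```

Calling `dta_to_csv_in_chunks('Citations_2000-2010 part 2.dta', 'Citations_2000-2010 part 2.csv')` would then write the same CSV while holding only one chunk in memory at a time.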
Corralien
  • It gives `ReadstatError: This version of the file format is not supported` – 1729-Shivam-Maurya Apr 18 '23 at 07:19
  • I used your public link. What is your `pyreadstat` version? `print(pyreadstat.__version__)`. Mine is 1.2.1 (with pandas 1.5.3) – Corralien Apr 18 '23 at 07:23
  • My pyreadstat version is 1.2.1 and Pandas is 1.4.4 – 1729-Shivam-Maurya Apr 18 '23 at 07:25
  • Would you like me to convert it for you and send it back to you? – Corralien Apr 18 '23 at 07:26
  • Yes. It will help me – 1729-Shivam-Maurya Apr 18 '23 at 07:26
  • You can download the CSV file [here](https://wetransfer.com/downloads/957f39e9387924ac2057b78f0355fba820230418073103/648df425cabdf376ec661d475a91f24b20230418073117/5c872e?trk=TRN_TDL_01&utm_campaign=TRN_TDL_01&utm_medium=email&utm_source=sendgrid). To convert the file, I used the code above without any changes and just compressed the data with zip. There are 13,569,764 records. – Corralien Apr 18 '23 at 07:33
  • Thanks a lot. I have to convert other files as well. Do you have any idea what the probable cause is? – 1729-Shivam-Maurya Apr 18 '23 at 07:59
  • Here are the links to two other files: https://drive.google.com/file/d/1jThS9yZYoo7aPvaqjaw7r6JAlWSa8vyE/view?usp=sharing and https://drive.google.com/file/d/1bowX7ofl33UxJN74QhC0A0ZbbDeLyZgU/view?usp=sharing – 1729-Shivam-Maurya Apr 18 '23 at 08:11
  • I don't understand, I can read all the files with `pd.read_stata`. If I open the files in binary mode, I see the version is 118. Converted files [here](https://wetransfer.com/downloads/37ca151331d0744fbf5b7b0473ae2d2520230418082447/fea993676ae30c6323761531d1a573e820230418082512/1b14d4?trk=TRN_TDL_01&utm_campaign=TRN_TDL_01&utm_medium=email&utm_source=sendgrid) – Corralien Apr 18 '23 at 08:28
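
The version check mentioned in the last comment (opening the file in binary mode and looking at the header) can be sketched as below. This is a rough illustration rather than a full parser: it assumes the documented .dta layout, where formats 117 and later begin with a text header of the form `<stata_dta><header><release>NNN</release>` and older formats store the version number in the first byte. The function name `detect_dta_version` is made up for this example.

```python
import re


def detect_dta_version(path):
    """Return the format version found at the start of a .dta file.

    Hypothetical helper: it inspects only the first bytes and does not
    validate the rest of the file.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    if not head:
        raise ValueError("file is empty")
    # Formats 117+ start with an XML-like header that names the release.
    match = re.match(rb"<stata_dta><header><release>(\d{3})</release>", head)
    if match:
        return int(match.group(1))
    # Older formats keep the version number in the very first byte.
    return head[0]
```

Running `detect_dta_version('Citations_2000-2010 part 2.dta')` on both machines would show whether the local copy really reports 118. A result such as 44, which is not among the versions pandas lists, would suggest that the local copy differs from the one that was read successfully.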