
I am trying to convert a big (~2 GB) SPSS (.sav) file into CSV using Python.

For a file smaller than ~500 MB, the following works without problems:

import pandas as pd
df = pd.read_spss('stdFile.sav')
df.to_csv("stdFile.csv", encoding = "utf-8-sig")

but with the 2 GB file I get a MemoryError...

I am open to solutions that aren't necessarily in Python. But I don't have an SPSS license, so I must convert the file with another tool.

  • Can you share some lines from your `.sav` file? – Chiheb Nexus May 22 '20 at 23:52
  • This may be helpful: https://www.youtube.com/watch?v=yPaPZDg6JAA [YouTube - Convert SPSS (.sav) to Text (.csv)] – drkstr101 May 23 '20 at 01:21
  • If you're comfortable with `R`, you can use the packages `haven` or `foreign` to read in the .sav file, and then you can use `base R` or the `xlsx` package to write out a .csv – Matt May 23 '20 at 04:11
  • I just tried your options, using R: `library(foreign); write.table(read.spss("inputfile.sav"), file="outputfile.csv", quote = FALSE, sep = ",")`. The problem now is that I get a lot of whitespace! The original file was a 2 GB .sav and the result is a 6 GB .csv. I need to read a little more about R and then report back; it's my first time using R-Gui. Anyway, I am now able to clean the CSV with Python working in chunks, but I will give R another try. – Manuel Quintana May 24 '20 at 01:01
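The chunked CSV cleanup mentioned in the last comment can be done with pandas' own `chunksize` option on `read_csv`. Below is a minimal sketch that strips stray whitespace from string columns chunk by chunk; the file names and the tiny sample data are made up for illustration:

```python
import pandas as pd

# stand-in for the whitespace-padded CSV produced by the R export (made-up sample)
with open("outputfile.csv", "w") as f:
    f.write("name,score\n  Alice  ,1\n  Bob ,2\n")

first = True
# chunksize=1 here only to demonstrate the loop; use something like 100_000 for real data
for chunk in pd.read_csv("outputfile.csv", chunksize=1):
    # trim leading/trailing whitespace in every string column
    for col in chunk.select_dtypes(include="object"):
        chunk[col] = chunk[col].str.strip()
    # write with header on the first chunk, append without header afterwards
    chunk.to_csv("cleaned.csv", mode="w" if first else "a",
                 header=first, index=False)
    first = False
```

Because each chunk is processed and written out immediately, memory use stays bounded regardless of the total file size.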

2 Answers


You can use Python's pyreadstat package to read the SPSS file in chunks and append each chunk to the CSV:

import pyreadstat

fpath = "path/to/stdFile.sav"
outpath = "stdFile.csv"

# chunksize determines how many rows are read per chunk
reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sav, fpath, chunksize=10000)

for i, (df, meta) in enumerate(reader):
    # write (with header) on the first iteration, append without header afterwards
    df.to_csv(outpath, mode="w" if i == 0 else "a", header=(i == 0))


more information here: https://github.com/Roche/pyreadstat#reading-rows-in-chunks
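If disk size is also a concern (the comments mention a 2 GB .sav turning into a 6 GB .csv), `to_csv` can write gzip-compressed output directly via its `compression` parameter. A minimal sketch with a toy DataFrame standing in for one chunk (the values are made up):

```python
import pandas as pd

# toy DataFrame standing in for one chunk of the .sav data
df = pd.DataFrame({"id": [1, 2, 3], "score": [4.5, 3.2, 5.0]})

# write gzip-compressed CSV; pandas also infers the codec from a .gz suffix
df.to_csv("stdFile.csv.gz", index=False, compression="gzip")

# read it back to confirm the round-trip
back = pd.read_csv("stdFile.csv.gz")
print(len(back))  # 3
```

Note that appending compressed chunks in a loop is less straightforward than with plain text, so for the chunked approach above it may be simpler to write an uncompressed CSV first and gzip it afterwards.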

Otto Fajardo

First install the savReaderWriter module to read the .sav file into a structured array, then use numpy to write the array out as CSV:

pip install savReaderWriter


import savReaderWriter
import numpy as np

reader_np = savReaderWriter.SavReaderNp("stdFile.sav")
# to_structured_array memory-maps the data to the given file,
# so the whole dataset does not need to fit in RAM
array = reader_np.to_structured_array("outfile.dat")
# note: savetxt's default format only works for numeric columns;
# if the file contains string variables, pass e.g. fmt="%s"
np.savetxt("stdFile.csv", array, delimiter=",")
reader_np.close()
Mahsa Hassankashi