
I'm looking to work with SPSS files (.sav) using pandas. In the absence of the SPSS program, here's what a typical file looks like when converted to .csv:

[Screenshot: a typical .sav file converted to .csv]

On investigation into what the first two rows signify (I don't know SPSS), it seems that the first row contains the Labels, while the second row contains the VarNames.

[Screenshot: the first two rows, Labels and VarNames]

When I bring the file into pandas thus:

import pandas.rpy.common as com

def savtocsv(filename):
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    w = com.convert_robj(w)
    return w

and then do a head(), the first row (Label) is missing:

[Screenshot: output of head(), without the Label row]

How can labels be maintained?

Pyderman

1 Answer


Labels in a .sav file are stored in the variable.labels attribute of the object returned by the read.spss function.

You can get the variable labels with the following:

import pandas.rpy.common as com

def get_labels(filename):
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    w = com.convert_robj(w)
    return w
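Once the labels are on the Python side, pairing them with the variable names is plain Python. A minimal sketch, assuming the labels arrive as a list in the same order as the columns and that an unlabelled variable shows up as an empty string (both are assumptions; `label_map` and the sample values are made up for illustration):

```python
def label_map(var_names, labels):
    """Map each variable name to its label, falling back to the name itself."""
    if len(var_names) != len(labels):
        raise ValueError("expected one label per variable")
    # An empty label is falsy, so `label or name` keeps the variable name.
    return {name: (label or name) for name, label in zip(var_names, labels)}

# Hypothetical values, not taken from a real .sav file
mapping = label_map(["Q1", "Q2", "AGE"], ["Gender", "", "Age in years"])
print(mapping)
```

A dict like this can be passed to `DataFrame.rename(columns=mapping)` if only some columns should be relabelled.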

If you want to set the labels as the column names of your dataframe:

import pandas.rpy.common as com

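def savtocsv(filename):
    # Read the .sav file once, as an R data.frame
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Pull the labels off the R object before converting it
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    # Convert to a pandas DataFrame (columns are the VarNames at this point)
    w = com.convert_robj(w)
    # Replace the VarNames with the labels
    w.columns = cols
    return w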
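Overwriting the columns discards the VarNames. If both rows of the original header should be kept, one option is a two-level column index. This is only a sketch in plain pandas, assuming the labels are already available as a list in column order; `with_label_header` and the sample data are hypothetical:

```python
import pandas as pd

def with_label_header(df, labels):
    """Return a copy of df whose columns carry (label, varname) pairs."""
    if len(labels) != len(df.columns):
        raise ValueError("expected one label per column")
    out = df.copy()
    # Level 0 holds the labels, level 1 the original variable names.
    out.columns = pd.MultiIndex.from_arrays(
        [list(labels), list(df.columns)], names=["Label", "VarName"]
    )
    return out

# Hypothetical data standing in for a converted .sav file
df = pd.DataFrame({"Q1": [1, 2], "AGE": [34, 51]})
labelled = with_label_header(df, ["Gender", "Age in years"])
print(labelled.head())
```

The variable names remain reachable through `labelled.columns.get_level_values("VarName")`.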
ayhan
  • Great, that seems to do what I need, thanks. I guess I can then somehow wedge these in with pandas to be the column headers, replacing the varName values. But is it possible to do the conversion **and** include the labels in one go (one call to `com.robj.r()`), to save having to do further manipulation in pandas? – Pyderman Mar 29 '16 at 22:29
  • It is possible to read the file once and get the attributes from the returned object, but I think it will require another R call. Please see the update. – ayhan Mar 29 '16 at 23:25