I have two versions of SPSS at work: SPSS 11 running on Windows XP and SPSS 20 running on Linux. Both copies work fine, and files created with either version open without incident on the other. For example, I can create a .sav file with SPSS 20 on Linux and open it in SPSS 11 on Windows without incident.

But if I create a .sav file with SPSS 20 and import the data into either R or PSPP (on Linux), I get a bunch of warnings. The data appears to import correctly, but I am concerned by the warnings. I do not see any warning when importing a .sav from SPSS 11 or other .sav files I have been sent. Many of the analysts at my company use SPSS, so I've gotten SPSS files from different versions of SPSS, and I have never before seen this warning. The warning messages are nearly identical between PSPP and R, which makes sense: AFAIK, they use the same underlying libs to import the data. These are the R warnings:

Warning messages:
1: In read.spss("test.sav") :
test.sav: File-indicated value is different from internal value for at least one of  the three system values.  SYSMIS: indicated -1.79769e+308, expected -1.79769e+308; HIGHEST: 1.79769e+308, 1.79769e+308; LOWEST: -1.79769e+308, -1.79769e+308   

2: In read.spss("test.sav") :
test.sav: Unrecognized record type 7, subtype 18 encountered in system file
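
The "indicated" and "expected" numbers in the first warning print the same because the message shows only six significant digits. A minimal sketch in R of how two distinct doubles near the largest finite magnitude can look identical at that precision. The variable names are illustrative, and the only assumption (taken from the PSPP format notes) is that SYSMIS is normally the most negative finite double. It does not reproduce the comparison the reader actually performs.

```r
# Most negative finite double; assumed here to be the usual SYSMIS value.
sysmis <- -.Machine$double.xmax

# A different double that is only a couple of representable steps closer to zero.
nearby <- sysmis * (1 - .Machine$double.eps)

# At six significant digits, as in the warning text, the two are indistinguishable.
same_when_printed <- format(sysmis, digits = 6) == format(nearby, digits = 6)

# At the bit level they are not the same number.
same_value <- sysmis == nearby

print(same_when_printed)  # TRUE
print(same_value)         # FALSE

# Seventeen significant digits are enough to tell the two apart.
print(sprintf("%.17g", c(sysmis, nearby)))
```

So the warning can be triggered by a difference that the printed message is too coarse to show.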

The .sav file is really simple. It has two columns, dumb and dumber. Both are integers. The first row contains two values of 1.0. The second row contains two values of 2.0. I can provide the file on request (I don't see any way to upload it to SO). If anyone would like to see the actual file, PM me and I'll send it to you.

dumb  dumber
1.0   1.0
2.0   2.0

Thoughts? Anyone know the best way to file a bug against R without getting roasted alive on the mailing list? :-)

EDIT: I used the term "Error" in the title line. I'll leave it, but I should not have used this word. The comments below are correct in pointing out that the messages I am seeing are warnings, not errors. I do however feel that this is made clear in the body of the question above. Clearly, the SPSS data format has changed over time and SPSS/IBM have failed to document these changes which is the root of the problem.

Choens
  • No real insight, but can echo the sentiment of getting a litany of these warnings every time I import from SPSS into R. If it makes you feel any better, my unscientific manual checks b/t R and SPSS have always shown that the data imported without error. I hope we can get some good insight into this! – Chase Oct 07 '11 at 19:37
  • I'm glad to hear that the data you have seen appears to have imported correctly. My problem is that I can't afford to have errors, and dealing with the dates stuff is tricky enough without wanting to run the risk of any errors because of whatever this warning may be telling us. I can't tell my boss that my cross-tabs are a little off because I used R rather than SPSS. It's too hard to get another job these days. :-) – Choens Oct 07 '11 at 20:14
  • While I sympathize with your comments about the snarkiness of the R list, I also agree with the other commenters that it's not fair to count this as a bug in R. R is trying as hard as it can, and warning you that something *might* be wrong. I think if you want to try to fix/diagnose this yourself, you're going to have to get very familiar with debugging of C components of R code. Start by tracking down the specific line in the C code (i.e., line 585 of sfm-read.c). Figure out what function it is (read_machine_flt64_info), then do source-level debugging of ... – Ben Bolker Oct 07 '11 at 22:13
  • (to) set a breakpoint in that function and step through it while reading in the relevant file. (I think you need the R extensions manual for this info.) If you're not set up to do this (i.e. have a debugging environment set up and be comfortable with source-level debugging of C code) this is going to be a hard slog. However, I don't see that you have much choice -- you can (1) dig in and try to figure it out yourself [and I do think that if you encounter trouble as you work your way through it that you would encounter a positive reception on the R development list ...]; (2) hire a consultant: – Ben Bolker Oct 07 '11 at 22:17
  • (3) learn to live with the warnings. – Ben Bolker Oct 07 '11 at 22:19
  • In your efforts at identifying the offending subtypes and how they are being detected and flagged you should remember that it is possible to force R to stop execution at a warning and if you have set your debugging apparatus up to drop into the `browser()`, you can then inspect the environment. You should also be able to run `traceback()` to identify the function called when an (upgraded) warning occurs. – IRTFM Oct 07 '11 at 23:44
  • I've changed my mind about suggesting that this would be potentially useful. In order to be confident that you knew what the record type 7, subtype 18 is supposed to mean, you would need to see something from IBM. I registered there and agreed not to distribute anything, in hopes they might have documented their file format now that they claim to be encouraging "wider use". Their claims are meaningless. They have not documented either the record layout or the meaning of their types and subtypes. They instead insist that you only use their code and then not distribute it. – IRTFM Oct 08 '11 at 01:18
  • by the way, do you mean "nearly identical between [PSPP] and R" above? – Ben Bolker Oct 08 '11 at 16:25
  • Yes, I meant PSPP and not SPSS in the original post. I have corrected the typo. Thanks for pointing it out. Things make a bit more sense now. – Choens Oct 11 '11 at 16:52
  • Option 3 is clearly the easiest option, but I am uncomfortable with it because of the environment in which I use R. If I lack confidence in the fidelity of the data import, I have to either implement a workaround, such as exporting SPSS data to CSV before importing into R, or prove to myself that the warnings have not affected the integrity of the information. I'm getting paid to crunch other people's data. I'd rather use R, but I can't afford for there to be any errors in importing the data, and right now I'm just not 100% sure that everything is importing accurately. – Choens Oct 11 '11 at 17:29
  • Possible duplicate of [Read SPSS file into R](http://stackoverflow.com/questions/3136293/read-spss-file-into-r) – Waldir Leoncio Feb 28 '17 at 08:15
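
The suggestion in the comments above about stopping at a warning can be sketched roughly as follows. `noisy_import()` is a hypothetical stand-in for `read.spss()` so that the snippet runs without a .sav file; `withCallingHandlers()`, `options(warn = 2)`, and `try()` are base R.

```r
# Hypothetical stand-in for read.spss(): returns data but emits a warning first.
noisy_import <- function() {
  warning("Unrecognized record type 7, subtype 18 encountered in system file")
  data.frame(dumb = c(1, 2), dumber = c(1, 2))
}

# Option A: record every warning message, but let the import finish.
caught <- character(0)
dat <- withCallingHandlers(
  noisy_import(),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")  # continue after logging the warning
  }
)

# Option B: promote warnings to errors, so that traceback() or
# options(error = recover) can be used interactively at the failing call.
old_opts <- options(warn = 2)
res <- try(noisy_import(), silent = TRUE)
options(old_opts)  # restore the previous warning level

print(caught)
print(inherits(res, "try-error"))
```

Option A is the safer choice in a script, since the data still arrives and the messages can be logged next to it. Option B is meant for an interactive session where you want to land in the debugger.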

2 Answers

It's not an error message. It is only a warning. SPSS refuses to document their file formats, so people have not been motivated to reverse engineer the structure of new "subtypes". There is no way to file a bug report without getting roasted because there is no bug ... other than a closed format, and that bug complaint should be filed with the owners of SPSS!

EDIT: The R-Core is a volunteer group and takes its responsibilities very seriously. It exerts major efforts to track down anything that affects the stability of systems or produces erroneous calculations. If you were willing to be a bit more respectful of the authors of R and suggest the possibility of collaboration on the R-devel mailing list to identify solutions to this problem without using the term "bug", you would arouse much less hostility. There might be someone who would be willing to see if a simple .sav file such as the one you constructed could be examined under a hexadecimal microscope to identify whatever infinite negative value is being mistaken for another infinite negative value. Most of the R-Core is not in possession of working copies of SPSS.

You could offer this link as an example of the product of others who have attempted the reverse engineering of SPSS .sav formats:

http://svn.opendatafoundation.org/ddidext/org.opendatafoundation.data/references/pspp_source/sfm-read.c

Edit: 4/2015; I have seen a recent addition to the ?read.spss help file that refers one to pkg:memisc: "A different interface also based on the PSPP codebase is available in package memisc: see its help for spss.system.file." I have used that package's function successfully (once) on files created by more recent versions of SPSS.
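
Since memisc offers a second reader, one way to build confidence in an import is to read the same file both ways and compare the columns. A minimal sketch: `compare_imports()` is a hypothetical helper, not part of either package, and the file name `test.sav` is an assumption. The in-memory data frames only stand in for the two-row example file from the question.

```r
# Hypothetical helper: TRUE per shared column if the values agree numerically.
# Note: as.numeric() on a factor compares level codes, not labels.
compare_imports <- function(a, b) {
  stopifnot(is.data.frame(a), is.data.frame(b))
  common <- intersect(names(a), names(b))
  vapply(common, function(nm) {
    isTRUE(all.equal(as.numeric(a[[nm]]), as.numeric(b[[nm]])))
  }, logical(1))
}

# Stand-ins for two imports of the question's file (doubles vs. integers).
import_one <- data.frame(dumb = c(1, 2), dumber = c(1, 2))
import_two <- data.frame(dumb = c(1L, 2L), dumber = c(1L, 2L))
check <- compare_imports(import_one, import_two)
print(check)

# With a real file, if both packages are installed (the path is an assumption).
if (requireNamespace("foreign", quietly = TRUE) &&
    requireNamespace("memisc", quietly = TRUE) &&
    file.exists("test.sav")) {
  a <- suppressWarnings(foreign::read.spss("test.sav", to.data.frame = TRUE))
  b <- as.data.frame(memisc::as.data.set(memisc::spss.system.file("test.sav")))
  print(compare_imports(a, b))
}
```

Both readers derive from the PSPP codebase, so agreement between them is weaker evidence than agreement with a CSV exported from SPSS itself. The same helper works for that comparison with `read.csv()`.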

IRTFM
  • I understand that the fundamental problem is SPSS' file format. I just want to be absolutely sure that the data will import without error or, if there is a possibility for error, what that is. And, by error, I mean any situation where the data imported may differ from what I would intuitively expect based on what the data looks like in SPSS and the settings used in read.spss(). I'm using R in a corporate environment. Discussing the intricacies of the GPL and why it's not R's fault if an analysis goes haywire with my clients is not something I want to do. – Choens Oct 07 '11 at 20:32
  • My reluctance to discuss this on the R mailing list is because of comments like the ones from DWin and Andrie. Many of the participants on the R mailing list share your attitude, but tend to be less polite about sharing them. I find such elitism to be nearly insufferable. I spread FOSS by using it and showing people that it works, not by standing on a soap box. – Choens Oct 07 '11 at 20:39
  • Darned character limits . . . . The copyright date of spss.c in the foreign package is 2000. Without going through all of the changelogs, it is possible this code hasn't been substantially touched in a long time. I am going to email the owners listed in the copyright file, but they may not be actively developing the code anymore. I don't know anything about reverse engineering binary file formats, but I'm willing to learn/help if there is someone here who could point me in the right direction. – Choens Oct 07 '11 at 20:46
  • Send me the file. It shouldn't be that difficult to unmask me using the Dwin name and the bit of credit in a comment by Gavin Simpson in this SO question; http://stackoverflow.com/questions/6959862/r-going-from-a-data-frame-with-weight-variable-to-a-regular-data-frame . My real address is not hidden on any of the postings I made to R-help: https://stat.ethz.ch/pipermail/r-help/2011-October/thread.html – IRTFM Oct 07 '11 at 21:14
  • Thanks. Not trying to pick a fight, I just got tired of that mailing list a long time ago. I can send example files to anyone who wants to take a look. I created an identical file using SPSS 11. This one imports without error. I looked at them both in okteta, but I'm a bit out of my league working with binary files. But having two files should make it easier to ID what has changed. There are some differences, but without first going through the import code, I don't have any particular notion of why it is throwing the warning. – Choens Oct 07 '11 at 21:36
  • I think you're probably going to end up having to source-level debug step by step for each file to understand the differences; I don't think staring at the files themselves is going to do it, although http://czep.net/data/spssread/ and http://cvs.savannah.gnu.org/viewvc/*checkout*/pspp/doc/dev/system-file-format.texi?root=pspp&revision=1.2&content-type=text%2Fplain may be useful starting points ... – Ben Bolker Oct 07 '11 at 22:34
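
For anyone who does want a first look at how the SPSS 11 and SPSS 20 files differ before stepping through the C code, a byte-level comparison can be done from R itself. A minimal sketch: `diff_offsets()` is a hypothetical helper, the two raw vectors are made-up stand-ins that merely begin with the "$FL2" bytes a system file starts with, and the file names in the guarded part are assumptions.

```r
# Hypothetical helper: zero-based offsets (as a hex editor shows them)
# at which two byte sequences differ, over their common length.
diff_offsets <- function(x, y) {
  n <- min(length(x), length(y))
  which(x[seq_len(n)] != y[seq_len(n)]) - 1L
}

# Made-up stand-ins: "$FL2" followed by two arbitrary bytes.
old_bytes <- as.raw(c(0x24, 0x46, 0x4c, 0x32, 0x01, 0x02))
new_bytes <- as.raw(c(0x24, 0x46, 0x4c, 0x32, 0x01, 0x03))
offsets <- diff_offsets(old_bytes, new_bytes)
print(offsets)  # 5

# With real files (the names are assumptions):
if (file.exists("test_spss11.sav") && file.exists("test_spss20.sav")) {
  a <- readBin("test_spss11.sav", what = "raw", n = file.size("test_spss11.sav"))
  b <- readBin("test_spss20.sav", what = "raw", n = file.size("test_spss20.sav"))
  print(head(diff_offsets(a, b), 20))
}
```

This only compares position by position. If the newer file has extra records inserted, everything after the first insertion will show as different, so the first few offsets are the informative ones.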

The SPSS file format is not publicly documented and can change, but IBM SPSS does provide free libraries that can read and write the SAV file format. These mask any changes to the format. You can get them from the SPSS Community website (along with many other free goodies including the SPSS integration with R). Go to www.ibm.com/developerworks/spssdevcentral and look around. BTW, there have been substantial additions/changes to the sav file since year 2000, although the core data can still be read by old versions.

HTH, Jon Peck

JKP