5

I have a 60gb sas7bdat file I am trying to subset in R. Does anyone know a method to do this?

I have SAS and it takes about 14 minutes to complete, but I want a way to do this in R.

thank you

Joe
  • 62,789
  • 6
  • 49
  • 67
user295944
  • 273
  • 4
  • 17
  • sas 9.4 is actually pretty fast with data that large – rawr Mar 06 '14 at 01:57
  • 4
    Yeah, I was hoping to find a way around SAS – user295944 Mar 06 '14 at 02:57
  • `Hmisc` has functions for working with sas7bdat files. not sure how helpful that will be with a 60gig file. I don't think `data.table` has sas support. Maybe write it to a csv or text file and `fread` it – rawr Mar 06 '14 at 04:35
  • I just uploaded a packages and saw: http://cran.r-project.org/web/packages/sas7bdat/index.html I haven't looked it over but it looks promising. – Tyler Rinker Mar 06 '14 at 05:54
  • Are you sure you need the whole file? Not just selected variables? If it is just the subset operation that you need, doing it in SAS would be much easier than working around R memory limit issues. – Maxim.K Mar 06 '14 at 10:41
  • In actuality I was trying to prove to my wife that one can get around using SAS, I guess she and SAS won this battle. – user295944 Mar 06 '14 at 15:18
  • Seems like the package `haven` would be worth a look, based on the answer here: http://stackoverflow.com/a/30043226/143319 Not sure how big that dataset would end up being in R - you might run out of memory - but `haven` seems to be the fastest. – Matt Parker May 21 '15 at 02:00

2 Answers2

2

I downloaded the wonderfull dsread http://www.oview.co.uk/dsread/ that converts Sas to csv.

You should use it from the CMD but I choose to integrate this in R. I created a function that migrate Sas to csv:

esp_Sas_Csv<-function(dir,file_input,dir_output,device="c:") {
p1<-paste("cd",dir)
if(tolower(device)!="c:") {p1<-paste(device,"\n",p1)}

#"open" the program

p2<-"SET dsread=C:\\dsread\\dsread.exe"
cmd2<-paste('"%dsread%" /v /t ',file_input,' >  "',dir_output,file_input,'.csv"',sep="")

cmd<-paste(p1,p2,cmd2, sep="\n")
setwd("c:\\temp")
write(cmd,"sas_to_csv.bat")
system("sas_to_csv.bat")

}

This code:

device<-"c:"
dir<-"C:\\temp\\"
file_input<-"my_sas_tab"
dir_output<-"C:\\temp\\"

esp_Sas_Csv(dir,file_input,dir_output,device)

generates and Runs a .bat like

cd C:\temp\
SET dsread=C:\dsread\dsread.exe
"%dsread%" /v /t my_sas_tab >  "C:\temp\my_sas_tab.csv"
dax90
  • 1,088
  • 14
  • 29
2

See my answer here--

I found the haven package to work reasonably quickly in reading sas7bdat files.

Anyway, at 60GB, your best bet is probably to convert it to .csv or something fread can handle, then data.table::fread it into R.

Community
  • 1
  • 1
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198