How would one write an R data frame to the SAS xpt format and specify the length of each column? For example, in a column of text values the longest string is 157 characters, but I'd like the field length attribute to be 200 characters.

The package haven does not seem to have this option and the package SASxport's documentation is less than clear on this issue.

Len Greski
CT Hall

1 Answer

The SASformat() and SASiformat() functions are used to set an attribute on an R object that sets its format when written to a SAS xport file. To set a data frame column to a 200 character format, use the following approach:

SASformat(mydata$var) <- 'CHAR200.'
SASiformat(mydata$var) <- 'CHAR200.'

Then use write.xport() to write the data frame to a SAS xport format.

See page 17 of the SASxport package documentation for details.

SASxport is an old package, so you'll need to load an older version of Hmisc to get it to work properly, per another SO question.

However, on reading the file into SAS it uses the length of the longest string in any observation to set the length of the column, regardless of the format and informat attributes. Therefore, one must write at least one observation containing trailing blanks to the desired length in order for SAS to set the length to the desired size. Ironically, this makes the format and informat superfluous.

This can be accomplished with the str_c() function from the stringr package.
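The padding step on its own can also be sketched in base R, without stringr. This is a minimal illustration, not part of the original answer: the helper name `pad_to_width` and the sample vector are hypothetical, and the width of 200 is taken from the question. It leaves missing values as `NA` and does not truncate values that are already longer than the target width.

```r
# Hypothetical helper: right-pad each string with blanks up to `width` characters.
# Values already at or beyond `width` are returned unchanged; NA stays NA.
pad_to_width <- function(x, width) {
  x <- as.character(x)
  short_by <- pmax(0, width - nchar(x))     # number of blanks to add (NA for NA input)
  out <- paste0(x, strrep(" ", short_by))   # strrep() is base R (3.3.0 or later)
  out[is.na(x)] <- NA_character_            # paste0() would otherwise turn NA into text
  out
}

# Illustrative data, not from the original post
z <- c("this is a test", "line 2", NA, "bottom line")
z_padded <- pad_to_width(z, 200)
nchar(z_padded)
```

Padding every value is the simplest option; per the reasoning above, padding a single observation to the target width should be enough to drive the column length.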

Putting it all together...

library("devtools")
# SASxport needs an older Hmisc release
install_version("Hmisc", version = "3.17-2")
library(SASxport)
library(Hmisc)
library(stringr)

## manually create a data set; keep the strings as character rather than factor
data <- data.frame(x = c(1, 2, NA, NA),
                   y = c('a', 'B', NA, '*'),
                   z = c("this is a test", "line 2", "another text string", "bottom line"),
                   stringsAsFactors = FALSE)

# workaround - extend the string variable to the desired length (30 characters)
# by adding trailing blanks, using stringr::str_c() and stringr::str_dup()
data$z <- str_c(data$z, str_dup(" ", 30 - nchar(data$z)))
nchar(data$z)  # check that every value is now 30 characters long

# write to SAS XPORT file
tmp <- tempfile(fileext = ".dat")
write.xport(data, file = tmp)

We'll read the file into SAS and use lengthc() to check the size of the z column.

libname testlib xport '/folders/myfolders/xport.dat';
proc copy in=testlib out=work;
run;

data data;
   set data;
   lenZ = lengthc(z);
   run;

...and the output:

[Screenshot of the SAS output; image not available]

Len Greski
  • That doesn't work. It just sets the format that is attached to the variable, but does not change the length used to define the variable. I tried your code and it generated Z as having length set to only 19 bytes but with an attached format with a width of 30 bytes. – Tom Jan 22 '18 at 23:49
  • @Tom - thanks for the feedback. I had to reinstall SAS on my laptop to debug this. It turns out that if the data in R does not contain trailing blanks, SAS reads the length as the largest character string read in `PROC COPY`. I've adjusted my answer to account for this, and included the relevant SAS code & output. – Len Greski Jan 23 '18 at 12:27
  • The update doesn't really address the underlying issue either. The original problem is that `write.xport()` is defining the length of the SAS character variables to match the maximum length of any value. The OP wanted to force the variables to be defined with a specific length without having to modify the actual values. – Tom Jan 23 '18 at 15:07
  • A workaround that might make sense would be to show R code that pads one or more values of a specific variable in the data frame to the desired length, and then export the modified data frame. – Tom Jan 23 '18 at 16:51
  • @Tom - Yes, I know that my response highlights the problem, will add code to pad the strings later this evening. Didn’t have time to do that early this morning. – Len Greski Jan 23 '18 at 17:01