I checked this link, but I am still struggling to import a formatted SAS file into R. I have a formatted .sas7bdat file (attached here), but when I imported it into R, all of the formats were lost. I tried two different approaches:

## Code 1:
##========
library(haven)
data <- read_sas("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", NULL)

## Code 2:
##========
library(sas7bdat)
data("sas7bdat.sources")

data <- read.sas7bdat("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", debug = FALSE)

table(data$SEX) # gives me 1 and 2 instead of males and females
#     1      2 
#880916 799960 

# Then, because I have a SAS format catalog (formats.sas7bcat), I added it to my earlier haven code via catalog_file:
#===============================================================================
data<- read_sas("C:/Users/mmr2011/OneDrive/OneDrive/R codes/df_nsclc1.sas7bdat", catalog_file = "C:/Users/mmr2011/OneDrive/OneDrive/R codes/formats.sas7bcat") 

# table(data$SEX)
#   1     2 
#50190 66064 

Screenshots: haven-imported data; non-haven-imported data

However, I need the values to appear as they do in SAS (screenshot).

I am using a SAS catalog folder on Windows, shown in screenshot No. 5. It is also available here.

Any advice will be greatly appreciated.

Mohamed Rahouma
  • The answer to the link you posted shows how to tell `read_sas()` the name of the format catalog, but in your posted code it does not appear you have made any attempt to do that. – Tom Dec 26 '20 at 15:37
  • @Tom I will amend that as I tried but it is not solved yet. – Mohamed Rahouma Dec 26 '20 at 15:44
  • Are you sure the formats referenced in the dataset you are reading are actually defined in the format catalog you provided to `read_sas()`? – Tom Dec 26 '20 at 16:20
  • My sas catalog name is formats and it is located in the same folder "C:/Users/mmr2011/OneDrive/R codes". When I tried `fmt <- readLines('C:/Users/mmr2011/OneDrive - med.cornell.edu/OneDrive/R codes including sts/formats.sas')` it gives me error `No such file or directory`. Am I missing something? – Mohamed Rahouma Dec 26 '20 at 17:00
  • A format catalog has an extension of `sas7bcat`. If the file you have has an extension of `sas`, then it is a program file and not a format catalog. If so, you might be able to read the text of the code and convert it to R syntax for defining value labels in R, but that would depend on you being able to understand the style the programmer used when writing the SAS code. If you have trouble, post examples of the content of the file as a new question. – Tom Dec 26 '20 at 17:38
  • When you provide new information, it's best to edit your question rather than just leave it in the comments. Do you have access to SAS to convert your .sas file to a format catalog? – Reeza Dec 26 '20 at 18:27
  • @Reeza Thanks for your efforts you and @Tom. I have a catalog folder named formats and I added catalog_file = to my `read_sas()` code as you can see in my edited code above but it doesn't solve the issue. Appreciate your help. upvoted. – Mohamed Rahouma Dec 26 '20 at 19:14
  • A .sas file is not a SAS format catalog; it is SAS code to generate the catalog. Catalogs are system dependent (i.e., different on Windows and Unix), so shipping the code instead is common when transferring catalogs. The `read_sas()` function requires the catalog itself, so your current approach CANNOT work. You need to convert the .sas file to a catalog using SAS or manually recode your data. The .sas file is just text; you can open it in any editor. There are free versions of SAS available, especially for academics, but they're usually Unix, so I'm not sure that will work for you. – Reeza Dec 29 '20 at 01:51
  • @Reeza Thx for your input. I edited my question. I am using SAS catalog folder in Windows that is shown in (screenshot No. 5) – Mohamed Rahouma Dec 29 '20 at 19:44
  • Your last code should have worked from what I understand. Can you share the catalog file as well for someone else to reproduce the issue? Have you ensured you have the latest version of haven? – Reeza Dec 29 '20 at 23:52
  • @Reeza Thx for your efforts. Upvoted. I uploaded my sas catalog and added a hyperlink to that. – Mohamed Rahouma Dec 30 '20 at 16:45

1 Answer

I think the most likely issue is a misunderstanding of how R labels work.

When I use the following SAS code:

libname temp 'h:\temp\';
proc format lib=temp;
  value sexf
  1='Female'
  2='Male'
  ;
  value racef
  1='Black'
  2='Asian'
  3='White'
  4='Other'
  ;
  value hispf
  1='Of Hispanic Origin'
  2='Not of Hispanic Origin'
  ;
quit;
options fmtsearch=(temp);
data temp.rtest;
  input sex race hisp;
  format sex sexf. race racef. hisp hispf.;
datalines;
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 4 1
2 4 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
1 4 2
2 4 2
;;;;
run;

And then use the following R code:

library(haven)
data <- read_sas("H:/temp/rtest.sas7bdat", catalog_file="H:/temp/formats.sas7bcat")   
print(data)

It works as expected - the console prints the labelled text.

# A tibble: 16 x 3
          sex      race                       hisp
    <dbl+lbl> <dbl+lbl>                  <dbl+lbl>
 1 1 [Female] 1 [Black] 1 [Of Hispanic Origin]    
 2 2 [Male]   1 [Black] 1 [Of Hispanic Origin]    
 3 1 [Female] 2 [Asian] 1 [Of Hispanic Origin]    
 4 2 [Male]   2 [Asian] 1 [Of Hispanic Origin]    
 5 1 [Female] 3 [White] 1 [Of Hispanic Origin]    
 6 2 [Male]   3 [White] 1 [Of Hispanic Origin]    
 7 1 [Female] 4 [Other] 1 [Of Hispanic Origin]    
 8 2 [Male]   4 [Other] 1 [Of Hispanic Origin]    
 9 1 [Female] 1 [Black] 2 [Not of Hispanic Origin]
10 2 [Male]   1 [Black] 2 [Not of Hispanic Origin]
11 1 [Female] 2 [Asian] 2 [Not of Hispanic Origin]
12 2 [Male]   2 [Asian] 2 [Not of Hispanic Origin]
13 1 [Female] 3 [White] 2 [Not of Hispanic Origin]
14 2 [Male]   3 [White] 2 [Not of Hispanic Origin]
15 1 [Female] 4 [Other] 2 [Not of Hispanic Origin]
16 2 [Male]   4 [Other] 2 [Not of Hispanic Origin]

However, if I open the dataset in RStudio's viewer by double-clicking it in the Data pane, the value labels are not shown, and a picture of that view is what you pasted into the question. I don't believe the viewer supports value labels (it does show variable labels, meaning column header labels). If you want to verify that, you may want to ask a new question specifically about it, with the code here cleaned up (you're welcome to use my example code).
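One way to check whether the labels were imported, regardless of what the viewer displays, is to inspect the column's metadata directly. The sketch below builds a small stand-in for the `sex` column with `haven::labelled()` (an assumption for illustration only; with the real data you would inspect the column returned by `read_sas()` instead):

```r
library(haven)

# Hypothetical stand-in for a column imported with read_sas() and a catalog
sex <- labelled(c(1, 2, 1, 2), labels = c(Female = 1, Male = 2))

is.labelled(sex)     # is this a haven labelled vector?
attr(sex, "labels")  # named vector mapping label text to the stored codes
print_labels(sex)    # prints the value/label pairs
```

If `attr(data$SEX, "labels")` comes back `NULL` on the real data, that would suggest no matching format from the catalog was attached to that column.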

What you will probably want to do is convert the value labels to factors, which is how R typically handles categorical variables. There are several ways to do this; the labelled package is one option, and its documentation includes some discussion of why you would convert. Again, this would be a good separate question if you can't figure it out on your own.
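As a minimal sketch of one such approach, haven's own `as_factor()` replaces each stored code with its label. The `sex` vector here is a hypothetical stand-in built with `haven::labelled()` to mirror the example formats above; on real data, calling `as_factor(data)` on the whole data frame should convert its labelled columns in one step:

```r
library(haven)

# Hypothetical stand-in for a labelled column returned by read_sas()
sex <- labelled(c(1, 2, 1, 2), labels = c(Female = 1, Male = 2))

# Convert the labelled vector to a factor whose levels are the label text
sex_f <- as_factor(sex)

levels(sex_f)
table(sex_f)
```

Once the column is a factor, `table()` tabulates by label text rather than by numeric code.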

Joe