
I have a project where I read a SAS dataset into R and, after some manipulation in R, use the write.table function to write the dataset out as a .txt file. My issue is that when I do this, extra line breaks appear in the output, apparently because of hidden newline characters in the SAS file.

How the data appears in SAS and in R

THIS IS FAKE DATA

How the data appears in the text file I create when I open it in Notepad++

THIS IS
FAKE DATA

How do I go about preventing this? I would accept an answer that removes the characters from the SAS dataset, one that removes them in R, or even a way to automatically correct the issue in the Notepad++ file.

    library(haven)

    data <- as.data.frame(read_sas("data.sas7bdat"))

    write.table(data, "data.txt", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)
astel
  • If you write to a text file a line that has the end of line character(s) in the middle of the line then of course the text file will show it as two lines instead of one line. You will have to replace those characters with something else (or just remove them) before writing to the text file. – Tom May 11 '23 at 18:31
  • The problem is that those characters aren't shown in the SAS file; they are hidden. The data in SAS looks exactly like my example – astel May 11 '23 at 18:45
  • It's not clear to me what you mean by "appears in SAS and in R". Do you mean after you read it into a frame or character vector? It would be helpful to have "real" raw R objects. Can you edit your question to add the output from `dput` on a sample of your real data? – r2evans May 11 '23 at 18:55
  • If you want to see the characters in SAS print the value use the $HEX format. A regular space is '20'x. What you have is some combination of linefeed, '0A'x, and carriage return, '0D'x. – Tom May 11 '23 at 19:06
  • When I say "appears in SAS" I mean that when I open the SAS dataset in SAS, the line appears as I wrote it. When I read that SAS dataset into R using the haven package and create a data frame, the line still appears normally. It isn't until I use write.table in R to create a .txt file from that data frame that the line appears on two separate lines. I can't output the actual data as it is sensitive. – astel May 11 '23 at 19:56
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What is the code you are actually running? – MrFlick May 11 '23 at 20:39
  • I included my code, I don't know how it helps you though – astel May 11 '23 at 23:58

1 Answer


Control characters within a string are called non-printables.

A string with embedded control characters, such as carriage return (\r), will appear as a split line in some viewers, such as Excel, while other viewers, such as SAS, do not show the split.

You will need to replace or remove those characters before writing out the string to a disk file with write.table.

In SAS you can use functions such as COMPRESS, TRANSLATE, PRXCHANGE, or TRANSTRN to remove or replace parts of a string.

In R you can use a function such as gsub.
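As a minimal sketch of the gsub approach, assuming the hidden characters are carriage returns and/or line feeds: the data frame below is a made-up stand-in for the result of read_sas(), and `out_file` is a hypothetical output path. Each run of those characters is replaced with a single space in the character columns before the file is written.

```r
# Hypothetical data frame standing in for the result of read_sas()
data <- data.frame(
  id   = c(1, 2),
  text = c("THIS IS\r\nFAKE DATA", "CLEAN VALUE"),
  stringsAsFactors = FALSE
)

# Only character columns can contain embedded line breaks
is_chr <- vapply(data, is.character, logical(1))

# Replace each run of carriage returns / line feeds with a single space
data[is_chr] <- lapply(data[is_chr], function(x) gsub("[\r\n]+", " ", x))

# Hypothetical output location; substitute your own path
out_file <- file.path(tempdir(), "data.txt")
write.table(data, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Use `""` instead of `" "` as the replacement if the characters should be dropped rather than turned into spaces.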

Richard
  • Is there a list of characters I have to remove? You say "such as \r", but this is difficult since I can't see the characters (they don't show up when viewing the data in SAS or R; the data just looks normal), so I don't know what to remove. – astel May 12 '23 at 19:19
  • In SAS you can use `COLLATE()` to construct a string containing each character in a range of ASCII codes. Use it as follows: `cleanString=TRANSLATE(dirtyString,' ', COLLATE(0,31));`. Every character in the string with code 0..31 will be translated to a space. – Richard May 13 '23 at 03:02
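
On the R side, a rough sketch of the same idea (not from the original thread) is to use the POSIX class `[[:cntrl:]]` to find and replace control characters without knowing in advance which ones are present. The sample vector `x` is made up for illustration. Note that `[[:cntrl:]]` typically covers codes 0..31 plus DEL (127), so it is slightly broader than `COLLATE(0,31)`.

```r
# Hypothetical sample values; "\r\n" stands in for the hidden characters
x <- c("THIS IS\r\nFAKE DATA", "CLEAN VALUE")

# Which elements contain at least one control character?
has_ctrl <- grepl("[[:cntrl:]]", x)

# Inspect the raw bytes of a suspect value:
# 0d is carriage return and 0a is line feed
bytes <- charToRaw(x[1])
print(bytes)

# Replace every control character with a space, one for one
clean <- gsub("[[:cntrl:]]", " ", x)
```

Because each control character becomes its own space, a `\r\n` pair turns into two spaces; use `"[[:cntrl:]]+"` as the pattern if a single space per run is preferred.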