I have several hundred text files that I'm scanning in as matrices. I have discovered that some of the files have ****** instead of a number in some locations. This causes an error: scan() expected 'a real', got '******'. Is there a way to scan these files in as a matrix and replace the ****** with a number such as 0?

Edit: with the help I received, I was able to solve the issue like this:

dat_test <- read.table("test.txt", na.strings = "******")
dat_trans <- matrix(dat_test[,3], 141, byrow=FALSE)
dat_trans[is.na(dat_trans)] <- -32768

My data is in one really long column, so the second line above reshapes it into the 141-row matrix I need for my analysis.
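
For anyone who wants to try the three steps without my files, here is a minimal sketch. The temporary file and its six rows are made up for illustration, and it fills 3 rows rather than my 141; only the read.table, matrix and is.na calls mirror what I actually ran.

```r
# Hypothetical stand-in for one data file: three columns, values of interest in the third
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 1 10.5", "1 2 ******", "1 3 7.25",
             "2 1 ******", "2 2 3", "2 3 4"), con = tmp)

# "******" is read as NA, so the third column stays numeric
dat_test <- read.table(tmp, na.strings = "******")

# Fill column by column; nrow = 3 here is an assumption standing in for 141
dat_trans <- matrix(dat_test[, 3], nrow = 3, byrow = FALSE)

# Replace the missing values with the sentinel
dat_trans[is.na(dat_trans)] <- -32768

unlink(tmp)
```

Because byrow = FALSE, the long column is cut into consecutive chunks of nrow values, and each chunk becomes one column of the matrix.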

Chris
  • Can you provide an example text file and show its corresponding output? – Ronak Shah Jun 05 '21 at 03:37
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jun 05 '21 at 03:38
  • Thank you both for the comments. Next time I post I'll be sure to add more detail to my question. – Chris Jun 05 '21 at 14:14

1 Answer

Just use read.table instead:

writeLines(c("1 2 ***** 4 5", "2 3 4 ***** 5"), con = "x.txt")
data <- read.table("x.txt", na.strings = "*****")

If you want to stick with scan, use its na.strings argument so the *****'s are turned into NAs as they are read into memory. scan applies na.strings to numeric fields as well, so the default what should work; if it still errors, read everything with what = "character" and convert afterwards. Either way scan returns a flat vector, so you'll need to reshape it back into a table format (for example with matrix(..., byrow = TRUE)). That said, I would use the friendlier base-R functions read.table and data.matrix to read in tabular data, or simply readLines to read line by line (if you are avoiding packages like readr and vroom).
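
As a rough sketch of the scan route, assuming every placeholder is exactly five asterisks and every line holds the same number of fields (the names vals and m are just for illustration):

```r
# na.strings is matched against each field as it is read, so the result stays numeric
vals <- scan(text = "1 2 ***** 4 5\n2 3 4 ***** 5",
             na.strings = "*****", quiet = TRUE)

# scan returns one flat vector in reading order; byrow = TRUE restores the two input rows
m <- matrix(vals, nrow = 2, byrow = TRUE)
```

For a real file you would pass the file name as the first argument instead of text =, and set nrow (or ncol) to match the layout of the file.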

If the number of *'s is not consistent, use gsub (or stringi) with a fixed pattern or a regular expression to strip all the *'s out before converting to numeric.

As an example:

dat <- scan(text = "1 2 ***** 4 5\n2 3 4 ***** 5", what = "character")
dat2 <- trimws(gsub("*", "", dat, fixed = TRUE))
# if you want zeroes
hold <- as.numeric(dat2)
dat3 <- ifelse(is.na(hold), 0, hold)
jimbrig
  • Thank you! I edited my question and added what I ended up doing. The read.table method worked great. I ended up using a different method to replace the NAs with -32768 as I didn't quite understand the syntax of your suggestion. I know I asked to replace NAs with 0 in my question, which I did to keep it simple, so that may have been part of the confusion on my part. – Chris Jun 05 '21 at 14:13
  • You're welcome. It is a confusing response, I know, but it was all I could muster up with the information I was given. Glad you ended up figuring it out and glad to help! – jimbrig Jun 08 '21 at 00:46
  • Your reply was great! Thanks again for all of the help! – Chris Jun 09 '21 at 17:55