
I am using a work machine that runs Windows 7, and I am using R version 3.5.1 (2018-07-02). This is my first post to Stack Exchange, and I am not an experienced programmer.

I have a .csv file that has many columns, so I am trying to read in only a few specific columns. I run into trouble when I try to read in some of the columns as numeric.

I have a work-around (specify all of the columns as character, and then convert the ones I need to numeric later), but I am very curious why my first way doesn't work.
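A minimal sketch of that work-around, assuming a hypothetical three-column sample in place of the real 46-column file (the column names and the second data row are made up for illustration):

```r
# Hypothetical sample standing in for the real file; every field is quoted,
# as in the original data. The second data row is invented to show what
# happens to a non-numeric entry.
csv_text <- c(
  '"id","name","rate"',
  '"010001","SOUTHEAST ALABAMA MEDICAL CENTER","14.3"',
  '"010005","HYPOTHETICAL HOSPITAL","Not Available"'
)

# Skip column 1 with "NULL" and read the other two as character.
col_to_read <- c("NULL", "character", "character")
data <- read.csv(text = csv_text, colClasses = col_to_read)

# Convert after reading. Strings that are not numbers become NA;
# suppressWarnings() hides the coercion warning as.numeric() would raise.
data$rate <- suppressWarnings(as.numeric(data$rate))

str(data)
```

With the real file, the same idea would apply to columns 11, 17 and 23 after reading them as character.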

If I use the code

col_to_read<-rep("NULL",46)
col_to_read[c(11,17,23)]<-"numeric"
col_to_read[2]<-"character"
col_to_read[5]<-"factor"

data<-read.csv("outcome-of-care-measures.csv",colClasses=col_to_read)

I get

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'a real', got '"14.3"'

I have looked for similar questions on Stack Exchange and Google, but the proposed solutions didn't work for me. This may be because my error is slightly different from the others. Usually they report something like

 scan() expected 'a real', got '14.3'

So the number doesn't have the additional set of quotes.

There are many columns in this data set, and the column names are very long, so it's hard to post what the data looks like in Notepad, but the first row goes something like this

"010001","SOUTHEAST ALABAMA MEDICAL CENTER","1108 ROSS CLARK CIRCLE","","","DOTHAN","AL","36301","HOUSTON","3347938701","14.3",

This isn't the full row of data; I stopped at the 14.3, which is the first column I want to specify as numeric.
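A small sketch that should reproduce the problem without the full file, assuming a hypothetical two-column sample in which every field is quoted like the row above (the column names are made up):

```r
# Hypothetical two-column sample; both fields are wrapped in double quotes,
# as in the real file.
csv_text <- c(
  '"id","rate"',
  '"010001","14.3"'
)

# Skip the first column and ask for the second as numeric.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  read.csv(text = csv_text, colClasses = c("NULL", "numeric")),
  error = function(e) conditionMessage(e)
)

print(result)
```

If this behaves like the real file, `result` holds the error message from scan() rather than a data frame.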

I have tried a number of read.csv and read.table permutations, one of which includes setting dec="," but I just get the same error. I do not live in a locale where commas are used for decimals. If I do not specify anything for colClasses, the fields I want to be numeric will by default be read as factor.

The output of sessionInfo() is

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] swirl_2.4.3

loaded via a namespace (and not attached):
 [1] httr_1.3.1      compiler_3.5.1  magrittr_1.5    R6_2.2.2        tools_3.5.1     RCurl_1.95-4.11
 [7] yaml_2.2.0      stringi_1.1.7   stringr_1.3.1   digest_0.6.17   testthat_2.0.0  rlang_0.2.2    
[13] bitops_1.0-6   
RamenZzz
  • It's not very easy to help without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) we can copy/paste to try. What does `sessionInfo()` show? Are you in a locale that usually uses a comma rather than a period for decimal places? – MrFlick Oct 09 '18 at 15:50
  • What happens if you don't specify any types? – alistaire Oct 09 '18 at 15:58
  • Sorry, I need some clarification on how to respond to your comments. I would like to paste the output of sessionInfo(), but that is too long for a comment. Should I just add that info by editing my question? Also, regarding alistaire's comment, I have an answer (the fields I want to convert to numeric become factors when I do not specify column classes). Should I add that to my question via an edit? – RamenZzz Oct 09 '18 at 16:02
  • This sounds like a type mismatch issue. I'm guessing one of your numeric columns has non-numeric characters in it, causing the numeric specification to fail. – Mako212 Oct 09 '18 at 16:27
