I have a somewhat general question about how read.csv(...) works.

When I read CSV datasets (created and exported from Excel), some columns are read into R as numeric (correctly), while others end up as either character (stringsAsFactors = FALSE) or factor (stringsAsFactors = TRUE). How does R determine whether a column is string or numeric during import? There is no discernible difference between the columns: e.g., two columns, both t scores, yet one was read in as character and the other as numeric. Can someone explain this to me? (Does my question even make sense?)

Thanks, Andrea

duckmayr
user1638567
  • Can you show a couple of rows of actual data from your CSV file? – Tim Biegeleisen Aug 11 '16 at 02:14
  • One of the columns almost certainly contains text somewhere. "Missing" or "#!" or something – thelatemail Aug 11 '16 at 02:16
  • There are a million possible reasons. It's silly to guess. Provide a small [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of a file you are having trouble with and we can tell you for sure. – MrFlick Aug 11 '16 at 02:37
  • The function that makes the decision is `type.convert()`, which is just a helper function for `read.table()`, which does most of the `read.csv()` work. Effectively it "attempts to convert it to logical, integer, numeric or complex, and failing that converts it to factor unless `as.is = TRUE`". Look at the documentation or source code for the specifics. – vincentmajor Aug 11 '16 at 03:21
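
The behaviour described in the comments can be reproduced with a small sketch. The data here is made up for illustration (the column names `t_score_a` and `t_score_b` and the stray "Missing" entry are assumptions, not taken from the asker's file); it only shows how a single non-numeric cell can turn an otherwise numeric column into character, and one possible way to locate such cells.

```r
# Hypothetical CSV content: two score columns, one with a stray text entry.
csv_text <- "id,t_score_a,t_score_b
1,52.1,48.3
2,47.9,Missing
3,55.0,50.2"

# Read from the string instead of a file; keep strings as character.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(sapply(df, class))  # t_score_a is numeric, t_score_b is character

# Locate the entries that prevent numeric conversion.
as_num <- suppressWarnings(as.numeric(df$t_score_b))
bad <- is.na(as_num) & !is.na(df$t_score_b)
print(df$t_score_b[bad])

# If those entries really mean "missing", declaring them via na.strings
# lets the column convert to numeric on import.
df2 <- read.csv(text = csv_text, na.strings = c("NA", "Missing"))
print(sapply(df2, class))

# type.convert() applies the same conversion rule to a character vector.
print(class(type.convert(c("1.5", "2", "x"), as.is = TRUE)))
print(class(type.convert(c("1.5", "2"), as.is = TRUE)))
```

On a real file, running the same `is.na(as.numeric(...))` check on the column that came in as character should point to the offending cells; whether treating them as `NA` is appropriate depends on what they mean in the data.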

0 Answers