I looked up the answer on these threads but none are working in my case:

R change all columns of type factor to numeric,

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f,

How to convert a data frame column to numeric type?

I am working with a data frame (8600 x 168) which I imported:

originaldf2<-read.csv("Occupanyrate_Train")

Apart from the first three columns, all are numeric values. Many of the columns are of class factor after importing. I need columns 4 to 168 to be of class numeric for analysis. There were a number of empty values and "-" in these columns, which I converted to NAs by doing this:

originaldf2[originaldf2=="-"]=NA
originaldf2[originaldf2==""]=NA

The columns contain nothing but decimal numbers, integers and NAs. I tried using the following command to convert all variables to numeric class:

originaldf2<-as.numeric(as.character(originaldf2[ , 4:168]))

and I get the warning:

Warning message: NAs introduced by coercion

and my data frame itself becomes strange:

str(originaldf2)
 num [1:165] NA NA NA NA NA NA NA NA NA NA ...

I also tried:

as.numeric(levels(originaldf2))[as.integer(originaldf2)]

to coerce the whole data frame, but I got the error:

Error: (list) object cannot be coerced to type 'integer'

Then I noticed that there are unused levels, which might be the reason, so I dropped the unused levels:

originaldf2<-str(drop.levels(originaldf2))

and tried to coerce again, but it still does not work. Here's a subset of the df (10 x 12):

   Property_ID  Month Zipcode Occupancy_Rate.Response.Variable. VAR_1 VAR_2     VAR_3
1     A3FF8CD6 13-Jan   30064                              0.93   468    10 0.7142857
2     A3FF8CD6 13-Feb   30064                              0.93   468    10 0.7142857
3     A3FF8CD6 13-Mar   30064                              0.94   468    10 0.7142857
4     A3FF8CD6 13-Apr   30064                              0.96   468    10 0.7142857
5     A3FF8CD6 13-May   30064                             0.953   468    10 0.7142857
6     A3FF8CD6 13-Jun   30064                              0.93   468    10 0.7142857
7     A3FF8CD6 13-Jul   30064                             0.925   468    10 0.7142857
8     A3FF8CD6 13-Aug   30064                             0.925   468    10 0.7142857
9     A3FF8CD6 13-Sep   30064                              0.95   468    10 0.7142857
10    A3FF8CD6 13-Oct   30064                             0.945   468    10 0.7142857
11    A3FF8CD6 13-Nov   30064                               0.9    NA  <NA>        NA
12    A3FF8CD6 13-Dec   30064                             0.945    NA  <NA>        NA
       VAR_4 VAR_5 VAR_6
1  0.5714286   0.8  0.75
2  0.5714286   0.8  0.75
3  0.5714286   0.8  0.75
4  0.5714286   0.8  0.75
5  0.5714286   0.8  0.75
6  0.5714286   0.8  0.75
7  0.5714286   0.8  0.75
8  0.5714286   0.8  0.75
9  0.5714286   0.8  0.75
10 0.5714286   0.8  0.75
11        NA    NA    NA
12        NA    NA    NA

vagabond
  • use `originaldf2 <- read.csv("Occupanyrate_Train.csv", stringsAsFactors = FALSE)` and try again – rawr Mar 24 '14 at 00:25
  • `stringAsFactors` is not an argument in `read.csv`, I think. I'm getting the error: `Error in read.table(file = file, header = header, sep = sep, quote = quote, : unused argument (stringAsFactors = FALSE)` – vagabond Mar 24 '14 at 00:31
  • `stringsAsFactors` is not an argument to `read.csv` directly, but it is to `read.table`, which is subsequently called. – thelatemail Mar 24 '14 at 00:34
  • I didn't know read.csv would work without a .csv extension – rawr Mar 24 '14 at 00:36
  • So I added the `stringsAsFactors` argument and tried: `originaldf<-as.numeric(as.character(originaldf[ ,c(sprintf("VAR_%.i", 1:164))]))` since I want all variable columns from 1 to 164 changed to numeric, but it's not working! Same result: `Warning message: NAs introduced by coercion` – vagabond Mar 24 '14 at 00:47

3 Answers

The advice to use stringsAsFactors will only get you so far. It appears that you probably want to use colClasses as well. It will both coerce the desired columns to numeric and create NA's that are appropriate.

originaldf <- read.csv( file_name, 
                        colClasses=c(rep( "character",3), rep("numeric", 6) ) )

This also makes input happen (much, much) faster for large dataframes since the logic that is used to guess at the classes is bypassed.
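
A minimal sketch of this approach on inline data, assuming a made-up three-row sample (the column names, values and the `sample_df` name are illustrative, not the asker's file). As far as I know, with `colClasses = "numeric"` a literal `-` in the data makes the read fail unless it is also listed in `na.strings`, so the sketch passes both:

```r
# Hypothetical sample data standing in for the real file
csv_text <- "Property_ID,Month,Zipcode,VAR_1,VAR_2
A3FF8CD6,13-Jan,30064,468,0.75
A3FF8CD6,13-Feb,30064,-,0.80
A3FF8CD6,13-Mar,30064,,"

sample_df <- read.csv(
  text = csv_text,
  # first three columns stay character, the rest are forced to numeric
  colClasses = c(rep("character", 3), rep("numeric", 2)),
  # treat "-" and empty fields as missing values
  na.strings = c("-", "")
)

# Inspect the resulting class of every column
sapply(sample_df, class)
```

For the full data set the same call would need `rep("numeric", 165)` for columns 4 to 168, assuming the layout described in the question.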

IRTFM

Use the na.strings argument to convert - to NA while reading:

x <- read.csv(na.strings=c('-'),
text="a,b,c
0,,
-,1,2")

 x
   a  b  c
1  0 NA NA
2 NA  1  2

Blank values are converted to NA automatically in numeric columns. It is the - values that are forcing the column to be interpreted as factor.
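
To see the effect on the column classes, here is a small sketch that reads the same inline data with and without `na.strings`. The names `without_na` and `with_na` are illustrative, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the default of the R version in use:

```r
csv_text <- "a,b,c
0,,
-,1,2"

# Without na.strings, the "-" prevents column a from being parsed as a number
without_na <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings, "-" becomes NA and column a can be parsed as a number
with_na <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                    na.strings = "-")

sapply(without_na, class)  # column a is character here
sapply(with_na, class)     # column a is integer here
```

With `stringsAsFactors = TRUE`, the first call would give a factor instead of a character column, which matches what the question describes.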

Matthew Lundberg
  • oh man! what a mountain molehill story. thanks. Out of interest, if I don't use the `na.strings` to change "-" to NA while reading the file, but replace the "-" after importing with `NA` and then drop unused levels, is there no way of changing class from factor to numeric? – vagabond Mar 24 '14 at 00:57
  • @vagabond As far as I'm aware (and I'm sure that someone will correct me if I'm wrong), you would have to do it column-by-column, for example with `lapply`. – Matthew Lundberg Mar 24 '14 at 01:00
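
A sketch of that column-by-column conversion, assuming a small made-up data frame in place of `originaldf2` (the data and the names `sample_df` and `num_cols` are illustrative). Calling `as.character()` on a whole data frame, as in the question, appears to turn each column into one deparsed string, which would explain why the result was a vector of 165 `NA`s; applying the conversion per column avoids that:

```r
# Hypothetical stand-in for the imported data: numbers stored as factors
sample_df <- data.frame(
  Property_ID = c("A3FF8CD6", "A3FF8CD6", "A3FF8CD6"),
  VAR_1 = factor(c("468", "-", "468")),
  VAR_2 = factor(c("0.75", "", "0.8")),
  stringsAsFactors = FALSE
)

num_cols <- c("VAR_1", "VAR_2")   # columns to convert

sample_df[num_cols] <- lapply(sample_df[num_cols], function(col) {
  col <- as.character(col)        # use the factor labels, not the integer codes
  col[col %in% c("-", "")] <- NA  # mark placeholders as missing first
  as.numeric(col)                 # now converts without coercion warnings
})

str(sample_df)
```

For the data in the question, `num_cols` could be `4:168` or `sprintf("VAR_%i", 1:164)`, assuming the column layout described there.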

Definitely use stringsAsFactors = FALSE in the read.csv statement. It'll work.

  • Nope. It's definitely not working this way! With `stringsAsFactors=FALSE`, my columns are read as class character, which is a step closer, I am sure. If I set it to TRUE, they stay as class factor. – vagabond Mar 24 '14 at 00:53