
I have a dataset with nearly 30,000 rows and 1,935 variables (columns). Around 350 of these are character variables. I can change the data type of an individual column by calling `as.numeric` on it, but it is painful to search for the character columns and then apply this to each one individually. I tried writing a function that uses a loop, but because the data is so large, my laptop crashes. Please help.

Jaap
Sourav Sarkar
    A `dput(head(dataset))` would help us answer (are there factors or not, etc.). – Tensibai Sep 04 '15 at 12:35
    Well, 1,935 columns may be too many; something along the lines of `dput(head(dataset[,5:15]))` could be easier to follow. Adapt the range to include one or two of the columns you wish to coerce. – Tensibai Sep 04 '15 at 12:50
    Tired of waiting for feedback, so here is an example: `sapply(colnames(iris), function(x) { if (is.character(iris[1, x])) { as.numeric(iris[, x]) } else { iris[, x] } })` coerces to numeric every column whose first element is a character, and sets NA wherever a value is non-numeric. Maybe not the best way, but without a clue about the data, I think it is the safest one. – Tensibai Sep 04 '15 at 13:28

2 Answers


Something like

take <- sapply(data, is.numeric)
which(take == FALSE)

identifies which variables are not numeric, but I don't know how to extract them automatically, so

apply(data[, c(putcolumnsnumbershere)], 1, as.character)
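
One way to pick the columns automatically is to use a logical vector as the column index instead of typing column numbers by hand. This is a minimal sketch, not a tested solution for the asker's data: the toy data frame and its column names are made up for illustration, and it assumes that factor columns should be converted through their labels rather than their internal integer codes.

```r
# Toy stand-in for the real data; the column names are hypothetical.
data <- data.frame(
  a = c("1", "2", "3"),
  b = c(1.5, 2.5, 3.5),
  f = factor(c("10", "20", "30")),
  stringsAsFactors = FALSE
)

# TRUE for every column that is character or factor.
to_convert <- sapply(data, function(x) is.character(x) || is.factor(x))

# Convert only those columns. as.character() runs first so that a factor
# yields its labels ("10") rather than its internal codes (1).
data[to_convert] <- lapply(data[to_convert],
                           function(x) as.numeric(as.character(x)))

str(data)
```

Assigning the result of `lapply` back into `data[to_convert]` updates all selected columns in one statement, so no column numbers need to be listed.
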
PereG
  • Your first line can be a problem if a column is a factor. In the last line, I believe you're looking for `is.character` rather than `as.character`. – Cath Sep 04 '15 at 13:19

Use

sapply(your.data, typeof)

to create a vector of variable types, then use this vector to identify the character vector columns to be converted.
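
The second step might look like the sketch below. It assumes a data frame named `your.data`; the toy columns are invented for illustration. The type vector is compared with `"character"` to get the column names, and `lapply` converts those columns without an explicit loop over names. Note that factor columns report `typeof` as `"integer"`, so this test does not pick them up, and values that cannot be parsed become `NA`.

```r
# Toy stand-in for the real data; the column names are hypothetical.
your.data <- data.frame(
  x = c("1", "2", "3"),
  y = c("4.5", "oops", "6"),
  z = 1:3,
  stringsAsFactors = FALSE
)

# Named character vector holding the storage type of each column.
types <- sapply(your.data, typeof)

# Names of the columns stored as character.
char_cols <- names(types)[types == "character"]

# Convert those columns; suppressWarnings() hides the
# "NAs introduced by coercion" warning for unparseable values.
your.data[char_cols] <- lapply(your.data[char_cols],
                               function(col) suppressWarnings(as.numeric(col)))

# Number of NA values in each converted column, to spot bad input.
na_counts <- colSums(is.na(your.data[char_cols]))
```

Checking `na_counts` after the conversion shows which columns contained text that was not a number, which may be worth doing before trusting the result on 350 columns.
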

Wyldsoul