
I am importing CSV data into R using:

data <- read.csv(file="file_name.csv")

The data has 9 columns and 5000 rows, and the values are real numbers. I want to use this data as a data frame, but the first column is imported with levels (that is, as a factor). I don't want these levels.

Here is a sample of the data in .csv format:

[Image: screenshot of the sample data]

Could anyone please help me remove the levels from the first column after it is imported into R?

Here is my attempt:

data$col_1 = as.numeric(as.character(data$col_1))

But it shows a warning:

Warning message:
NAs introduced by coercion 
Janak
  • It is the same warning as the one I showed in my attempt. – Janak Dec 03 '14 at 05:51
  • You should include sample data to make this problem [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) otherwise it is very difficult to help you. – MrFlick Dec 03 '14 at 06:13
  • I added sample data. – Janak Dec 03 '14 at 09:53
  • Your screenshot does not paint the full picture. There are probably values *somewhere* which are not unambiguously numbers. Also, your columns in the screenshot are called `Var_1 Var_2 Var_3` but your code acts on `col_1`. – Hugh Dec 15 '14 at 10:50

1 Answer


read.csv is basically a wrapper around read.table; turning off stringsAsFactors will work.

data <- read.csv(file="filename", stringsAsFactors=FALSE)

That column will then be treated as character, and you can convert it to numeric like this:

data$col <- as.numeric(data$col)
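A minimal, self-contained sketch of those two steps, assuming a small inline CSV in place of the real file (the column names `col` and `other` and all values are made up for illustration). Passing `stringsAsFactors = FALSE` explicitly keeps the behaviour the same on R versions before 4.0.0, where the default was `TRUE`.

```r
# Hypothetical inline data standing in for the CSV file; "oops" is a stray text value.
csv_text <- "col,other\n1.5,a\n2.25,b\noops,c\n4,d"

data <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(data$col)   # "character", because of the "oops" entry

# Convert to numeric; the non-numeric entry becomes NA and R emits
# the "NAs introduced by coercion" warning.
data$col <- as.numeric(data$col)
data$col
```

The warning here is expected: it signals that at least one value could not be parsed as a number.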

Note: if a column is clean and contains only numbers, read.csv will read it in as numeric automatically. If it is read in as a factor, it means R detected something that is text or otherwise non-numeric. Pay attention to the warnings and check which records were converted to NA, and why.
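To illustrate the note, the sketch below reads two made-up inline CSVs: one with a clean numeric column and one with a single stray text value. The data and the column name `x` are assumptions for illustration only, and `stringsAsFactors = TRUE` is set explicitly to reproduce the factor behaviour described in the question.

```r
# Hypothetical inline data: one clean column, one with a stray text value.
clean <- read.csv(text = "x\n1\n2.5\n3", stringsAsFactors = TRUE)
dirty <- read.csv(text = "x\n1\nabc\n3", stringsAsFactors = TRUE)

is.numeric(clean$x)  # TRUE: every value parses as a number
is.factor(dirty$x)   # TRUE: "abc" forces the column to text, stored as a factor

# On a factor, convert via character first; as.numeric() alone
# returns the internal level codes rather than the original values.
as.numeric(dirty$x)
suppressWarnings(as.numeric(as.character(dirty$x)))
```

This is why `as.numeric(as.character(...))` is the usual idiom for a factor column, while plain `as.numeric(...)` is enough once the column is character.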

For example, I have a csv file.

[Image: screenshot of the example CSV file with columns name and id]

When I read it in, the id column is treated as character simply because one row contains ohyeah (if that value were empty or NA, R would still treat the column as numeric). I would recommend first subsetting the records that are contaminated, to see whether it is a big issue or not.

> subset(data, is.na(as.numeric(id)))
  name     id
4  dan ohyeah
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
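Since the screenshot is not available, here is a rough reproduction with inline data. Only the row `dan, ohyeah` comes from the output above; the other rows, including the empty id, are hypothetical. The filter also skips values that were already missing, so only genuinely non-numeric text is reported.

```r
# Hypothetical data: only row 4 ("dan", "ohyeah") is taken from the example above.
csv_text <- "name,id\namy,1\nbob,2\ncat,\ndan,ohyeah"
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Attempt the conversion once, silencing the expected coercion warning.
converted <- suppressWarnings(as.numeric(data$id))

# Rows whose id was non-empty text but failed to convert.
bad <- data[is.na(converted) & !is.na(data$id) & data$id != "", ]
bad
```

Inspecting `bad` before overwriting the column shows how many records are affected and what the offending values look like.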
B.Mr.W.