
I have a large time series dataset whose numeric results are stored in General format in MS Excel, and R picks up their data type as character. I tried using `gsub(",", "", dummy)`, but it did not work. The dataset does not contain any commas or any other visible special character apart from the decimal point. The values are positive or negative, have varying numbers of decimal places, and include one N/A.

How can I convert these values to numeric without ending up with NAs after the conversion? One thing to note: once converted to numeric, some of the values are displayed in scientific notation, like 12.1 e+03, and others with four decimal places.

```r
dummy = c("12.1", "42000", "1.2145", "12.25", "N/A", "323.369", "-1.235", "335", "0")

# Convert to numeric
dummy = gsub(",", "", dummy)
dummy = as.numeric(dummy)
```

Error:

```
Warning message:
NAs introduced by coercion
```
Ed_Gravy
This could probably be sorted out at an earlier stage, when you import your data, e.g. `read.table(..., na.strings=c("", "N/A"))`. The data is then likely to be imported as numeric. PS: the coercion message is a warning, not an error; running `as.numeric(dummy)` gives the correct answer in this case (after changing `N/A` in your example to `"N/A"`). PPS: your example strings contain no commas, so the `gsub` is not required. – user20650 Feb 04 '22 at 23:58
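The import-time approach from the comment can be sketched with `read.csv()`, which passes `na.strings` through to `read.table()`. This is a minimal sketch under assumptions: the inline `csv_text` is a hypothetical stand-in for a CSV export of the Excel sheet, and `"NA"` is listed explicitly because supplying `na.strings` replaces the default value `"NA"`.

```r
# Hypothetical stand-in for a CSV export of the spreadsheet;
# "N/A" marks the missing value, exactly as in the question.
csv_text <- "id,value
1,12.1
2,42000
3,N/A
4,-1.235"

# Treat empty cells, "NA" and "N/A" as missing while reading, so the
# value column can be detected as numeric instead of character.
data <- read.csv(text = csv_text, na.strings = c("", "NA", "N/A"))

str(data)
```

If the sheet is read directly with the readxl package instead, its `read_excel()` function has an analogous `na` argument for the strings to treat as missing.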

1 Answer


Changing `N/A` to R's missing value `NA` solves this issue:

```r
# N/A replaced by NA
dummy = c("12.1", "42000", "1.2145", "12.25", NA, "323.369", "-1.235", "335")

# Convert to numeric (there are no commas here, so the gsub is a no-op)
dummy = gsub(",", "", dummy)
dummy = as.numeric(dummy)
```
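The same fix, applied to the question's original vector, can be sketched as follows. It replaces the literal `"N/A"` string before converting, so no coercion warning is raised. It also touches on the scientific-notation point from the question: that is only a printing choice, and the stored doubles are the same either way. The name `x` is illustrative, and `format()` returns character strings meant for display, not for further arithmetic.

```r
dummy <- c("12.1", "42000", "1.2145", "12.25", "N/A", "323.369", "-1.235", "335", "0")

# Replace the literal "N/A" placeholder with a real missing value first,
# so as.numeric() sees only parseable strings (or NA) and raises no warning.
dummy[dummy %in% "N/A"] <- NA
x <- as.numeric(dummy)

# Scientific notation only affects printing; the stored doubles are unchanged.
# format() returns character strings for display, not numbers.
format(x, scientific = FALSE)

# Or discourage scientific notation temporarily, then restore the old setting.
old <- options(scipen = 999)
print(x)
options(old)
```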

To do so for your entire dataset, you can use:

```r
# Replace "N/A" with NA in every column (matrices)
data <- apply(data, 2, function(x) {
  ifelse(x == "N/A", NA, x)
})

# Then convert characters to numeric (matrices)
data <- apply(data, 2, as.numeric)

# Replace "N/A" with NA in every column (data frames);
# assigning to data[] keeps a data frame instead of returning a plain list
data[] <- lapply(data, function(x) {
  ifelse(x == "N/A", NA, x)
})

# Then convert characters to numeric (data frames)
data[] <- lapply(data, as.numeric)
```
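Here is a self-contained sketch of the data frame route on a small made-up data frame (the columns `a` and `b` are hypothetical). It folds both steps into one `lapply()` pass. It also guards against factor columns, since `as.numeric()` on a factor returns its level codes rather than the values shown.

```r
# Hypothetical example data: every column arrived as character,
# with "N/A" marking a missing value.
data <- data.frame(
  a = c("12.1", "N/A", "-1.235"),
  b = c("42000", "1.2145", "0"),
  stringsAsFactors = FALSE
)

# data[] <- ... keeps the data.frame class and column names;
# data <- lapply(...) would return a bare list instead.
data[] <- lapply(data, function(col) {
  col <- as.character(col)   # factors would otherwise convert to level codes
  col[col %in% "N/A"] <- NA  # placeholder -> real missing value
  as.numeric(col)            # now parses without coercion warnings
})

str(data)
```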

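Update: split the code into separate `apply` (matrix) and `lapply` (data frame) versions, because these functions behave differently on those object types in R. Thanks to user20650 for pointing this out.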

`apply` is not a great choice for a data.frame; this https://stackoverflow.com/questions/18503177/r-apply-function-on-specific-dataframe-columns gives reasons and alternatives. – user20650 Feb 04 '22 at 23:59