0

I am importing a large set of numerical data to R from excel. The data is chemical composition numbers, and I frequently have values which are "< 1", for example. Any time I have a "< 1" in the data frame, R is reading/outputting it as NA.

It seems this is being done as the excel file is read, so my attempts at using gsub to replace the "< " with "" are not working since the number is not being treated as a numeric from the start.

I'd like to pull this data and have anything that is "< 1" simply be "1". But like I said, it is being imported as NA and I don't know what to do.

I'm quite new to R so please bear with my limited ability to clearly define my problem.

Thanks in advance!

SRO
  • 1
  • 4
    `<1 ` is not numerical data, but of type character.. have you tried reading the excel-column as character? – Wimpel Jan 08 '19 at 15:17
  • How many columns does the dataset have, and how many of these should be numeric? One approach would be to read everything in as a string, do the replacement, and the coerce to numeric. Alternately, if the only missingness arises from values that are <1, you could just set all missing values in the numeric columns to 1. I'd be happy to explain either approach. – Joseph Clark McIntyre Jan 08 '19 at 15:18
  • 1
    What commands are you using to import the data? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It's hard to help if we don't even know what code you are running. – MrFlick Jan 08 '19 at 15:18
  • My spreadsheet contains 15 sheets, each of which is comprised of 3-25 columns of data. Each column may or may not contain "< " signs. I need the "< " removed and all #s to be stored as numerics so that the values can be graphed. – SRO Jan 08 '19 at 18:04

1 Answers1

4

read the column as character/text

sample data (./test.xlsx)

enter image description here

code

df <- readxl::read_excel( "./test.xlsx", sheet = 1, col_types = c("text") )
df
# number
#   <chr> 
# 1 1     
# 2 2     
# 3 3     
# 4 <1    
# 5 3 

#post-processing
library( dplyr )
df %>% mutate( number = gsub("<", "", number ) %>% as.numeric )
# # A tibble: 5 x 1
#   number
#    <dbl>
# 1      1
# 2      2
# 3      3
# 4      1
# 5      3
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • This solution gets me 1/2 way there, but I have many columns in my spreadsheet. Is there is broad way of telling R to go into each column and replace the "< ", without having to call the column by name? I have 15 sheets each containing columns with unique names and so I cannot name them in code each time... – SRO Jan 08 '19 at 18:05
  • 1
    @SRO yes, you can. but that s a different quenstion (which has also been answered, so use search) – Wimpel Jan 08 '19 at 18:07