I was researching which import function is better, read.csv or read_csv. Several threads compare import times, and most recommend read_csv (or fread) for larger files.
While importing data, I came across an unusual situation.
I used read.csv and read_csv to import the same CSV file:
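library(readr)  # provides read_csv

CSV1 <- read.csv("C:\\Users\\AH0168850\\Desktop\\Claims.csv")
CSV2 <- read_csv("C:\\Users\\AH0168850\\Desktop\\Claims.csv")

class(CSV1$claim_amount)  # "factor"
class(CSV2$claim_amount)  # "character"

# Runs without a warning
CSV1$claim_amount <- as.numeric(CSV1$claim_amount)
# Returns all NA with the warning "NAs introduced by coercion" (result not assigned)
as.numeric(CSV2$claim_amount)
# Works once the dollar sign is stripped
CSV2$claim_amount <- as.numeric(sub('\\$','',CSV2$claim_amount))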
The claim_amount column contains values with a $ sign. read.csv imports claim_amount as a factor, while read_csv imports it as character.
When I call as.numeric on the column, the data imported with read.csv goes through without any apparent issue. The data imported with read_csv, however, is converted entirely to NA, with the warning "NAs introduced by coercion".
To convert the read_csv data successfully, I had to strip the $ with sub before calling as.numeric. Several threads describe similar approaches, for example:
http://r.789695.n4.nabble.com/Converting-dollar-value-factors-to-numeric-td2130536.html
https://www.rforexcelusers.com/remove-currency-dollar-sign-r/
However, I couldn't find one that explains why this happens. I did read that read.csv converts character columns to factors, but I am not sure why that would make a difference to as.numeric.
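In case it helps anyone reproduce this without my file, here is a minimal sketch with made-up inline data. The values and the claim_id column are hypothetical. stringsAsFactors = TRUE is set explicitly because it is no longer the read.csv default as of R 4.0.0. The character column is produced with stringsAsFactors = FALSE as a stand-in for what read_csv returned, so readr is not needed to run it.

```r
# Hypothetical inline data standing in for Claims.csv
csv_text <- "claim_id,claim_amount\n1,$250.00\n2,$75.50\n3,$1200.00"

# Factor column, as read.csv gave me
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Character column, standing in for the read_csv result (assumption)
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$claim_amount)  # "factor"
class(as_char$claim_amount)    # "character"

# Factor: converts with no warning and no NA
from_factor <- as.numeric(as_factor$claim_amount)

# Character: every value becomes NA (warning suppressed here for quiet output)
from_char <- suppressWarnings(as.numeric(as_char$claim_amount))

# Character with the dollar sign stripped first: converts as expected
from_clean <- as.numeric(sub("\\$", "", as_char$claim_amount))

print(from_factor)
print(from_char)
print(from_clean)
```

Comparing the printed from_factor values against the original amounts might be a useful check when answering.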