I am working with a csv file which was originally formatted in excel. I want to convert the rate column into numeric and remove the "$" sign.
I read in the file with :
> NImp <- read.csv("National_TV_Spots 6_30_14 to 8_31_14.csv",
sep=",", header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE, na.strings=c("Not Monitored"))
The data frame looks like this:
HH.IMP..000. ISCI Creative Program Rate
1 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
2 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
3 141 IT14429 Rising Costs30 (Opportunity Scholar - No Nursing) BONUS $0.00
4 476 ITES15443H Matthew Traina (B. EECT/A. CEET) :60 (no loc) Law & Order: SVU $0.00
5 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
When I do do the conversion, I get an error message: > NImp$Rate <- as.numeric(gsub("$","", NImp$Rate))
Warning message:
NAs introduced by coercion
and all values get coerced to NAs.
I also tried, NImp$Rate <- as.numeric(sub("\\$","", NImp$Rate))
but again got the same warning message. However not all values became NAs - only specific ones. I opened the csv in excel to check and I realized that excel forces csv column width too narrow resulting in "####" cells. These cells are being coerced to "NA" by r
.
I tried the option of opening the file in notepad and read the notepad file into r
. But I get the same results. The values are correctly displayed in both Notepad and when I read the file into r
. But when I change to numeric, everything that shows as "####" in excel, becomes NA
.
What should I do?
Adding str(NImp)
'data.frame': 9859 obs. of 19 variables:
$ Spot.ID : int 13072903 13072904 13072898 13072793 13072905 13072899 13072397 13072476 13072398 13072681 ...
$ Date : chr "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Hour : int 0 0 0 0 0 0 1 1 1 2 ...
$ Time : chr "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Local.Date : chr "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Broadcast.Week : int 1 1 1 1 1 1 1 1 1 1 ...
$ Local.Hour : int 0 0 0 0 0 0 1 1 1 2 ...
$ Local.Time : chr "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Market : chr "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" ...
$ Vendor : chr "NUVO" "NUVO" "AFAM" "USA" ...
$ Station : chr "NUVO" "NUVO" "AFAM" "USA" ...
$ M18.34.IMP..000.: int NA NA 3 88 NA 3 NA 53 NA 37 ...
$ W18.34.IMP..000.: int NA NA 86 66 NA 86 NA 70 NA 60 ...
$ A18.34.IMP..000.: int NA NA 89 154 NA 89 NA 123 NA 97 ...
$ HH.IMP..000. : int NA NA 141 476 NA 141 NA 461 NA 434 ...
$ ISCI : chr "IT3896" "IT3896" "IT14429" "ITES15443H" ...
$ Creative : chr "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Matthew Traina (B. EECT/A. CEET) :60 (no loc)" ...
$ Program : chr "NUVO CINEMA" "NUVO CINEMA" "BONUS" "Law & Order: SVU" ...
$ Rate : chr "$0.00" "$0.00" "$0.00" "$0.00" ...