I have inherited a spreadsheet, read into R as a data frame, with about 10 columns and about 400 rows.
The example below is taken from one of those columns; its entries are a mix of percentage values and fractions.
Furthermore, the fractions may contain '*' and/or '0' in the numerator and/or the denominator, as shown under the OBSERVED column of the example cases below.
I am looking for R code that homogenizes all entries in such a column to decimal numbers, as shown under the EXPECTED column of the examples below, and that can then be repeated over all columns in the data frame.
For my analysis, it is quite OK to consider missing values (*) as zeroes (0).
EXAMPLE CASES:
OBSERVED    EXPECTED
"0.0%"      0.0
"9.5%"      0.095
"5 / 10"    0.5
"* / 16"    0.0
"0 / 12"    0.0
NA          0.0
"0 / *"     0.0
"* / *"     0.0
Here is what I have tried so far, in this order:
Step 1. Replace * (missing data) with 0 (zero) - works OK
CFP4_REPLACE_Asterisk_w_Zero <- gsub("\\*","0",play.df$CFP4)
Step 2. Convert % to decimals - works only on entries with a % symbol; the fractions are converted to NA
CFP4_ConvPerc2Dcml <- as.numeric(sub("%", "",CFP4_REPLACE_Asterisk_w_Zero,fixed=TRUE))/100
Step 3. Convert fractions into decimal values - I think the syntax shown below works, but in this sequence of steps the fractions have already been converted to NA, so running it here is pointless, right?
CFP4_ConvFrct2Dcml <- sapply(CFP4_ConvPerc2Dcml, function(x) eval(parse(text=x)))
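To make the problem reproducible, here is a minimal sketch of steps 1 and 2. The vector `x` is a hypothetical stand-in for `play.df$CFP4`, built from the example cases above; `step1` and `step2` are illustrative names only.

```r
# Hypothetical stand-in for play.df$CFP4, built from the example cases
x <- c("0.0%", "9.5%", "5 / 10", "* / 16", "0 / 12", NA, "0 / *", "* / *")

# Step 1: replace the missing-data marker "*" with "0"
step1 <- gsub("\\*", "0", x)

# Step 2: strip "%" and divide by 100. as.numeric() cannot parse a
# string such as "5 / 10", so every fraction becomes NA (the coercion
# warning is suppressed here only to keep the output short).
step2 <- suppressWarnings(
  as.numeric(sub("%", "", step1, fixed = TRUE)) / 100
)

print(step2)
```

Only the first two entries (the percentages) survive step 2; the fractions and the original NA all come out as NA, so step 3 has nothing left to evaluate.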
If I reverse the relative order of steps 2 and 3, that doesn't help either. I've taken a break from R, and would appreciate any (detailed) help. TIA!
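For reference, one possible direction is to handle percentages and fractions separately in a single pass instead of chaining the steps. The sketch below is not a definitive implementation. It assumes that `to_decimal` (a hypothetical helper name) may treat NA, '*', and any fraction with a zero denominator as 0, that any entry that is neither a percentage nor a fraction is already a plain number, and that every column of the data frame has this mixed format. The small `play.df` here is a made-up stand-in for the real data.

```r
# Hypothetical helper: convert a vector of mixed percentages ("9.5%")
# and fractions ("5 / 10") to decimal numbers.
to_decimal <- function(x) {
  x <- trimws(as.character(x))
  x <- gsub("*", "0", x, fixed = TRUE)  # assumption: "*" counts as 0
  out <- numeric(length(x))             # default 0 also covers NA entries

  is_pct  <- !is.na(x) & grepl("%", x, fixed = TRUE)
  is_frac <- !is.na(x) & grepl("/", x, fixed = TRUE)
  is_num  <- !is.na(x) & !is_pct & !is_frac  # assumed to be plain numbers

  # Percentages: drop the "%" sign and divide by 100
  out[is_pct] <- as.numeric(sub("%", "", x[is_pct], fixed = TRUE)) / 100

  # Fractions: split on "/" and divide numerator by denominator
  parts <- strsplit(x[is_frac], "/", fixed = TRUE)
  num <- as.numeric(trimws(vapply(parts, `[`, character(1), 1)))
  den <- as.numeric(trimws(vapply(parts, `[`, character(1), 2)))
  out[is_frac] <- ifelse(den == 0, 0, num / den)  # assumption: x / 0 counts as 0

  out[is_num] <- as.numeric(x[is_num])
  out
}

# Made-up data frame standing in for the real one
play.df <- data.frame(
  CFP4 = c("0.0%", "9.5%", "5 / 10", "* / 16", "0 / 12", NA, "0 / *", "* / *"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column, keeping the data frame structure
play.df[] <- lapply(play.df, to_decimal)
print(play.df)
```

Splitting the fraction with `strsplit()` avoids `eval(parse())`, and testing the denominator explicitly keeps "0 / 0" from turning into NaN.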