Try this, where my vec is your $Height column:
U <- gsub("[0-9._]", "", vec)
head(U)
# [1] "" "cm" "" "cm" "m" "cm"
U[!nzchar(U)] <- "ft"
U
# [1] "ft" "cm" "ft" "cm" "m" "cm" "m" "cm" "ft" "ft" "ft" "cm" "ft" "cm" "ft" "cm" "ft" "ft" "cm" "ft" "ft" "ft" "ft"
# [24] "ft" "ft" "cm" "ft" "ft" "cm" "ft" "cm" "cm" "ft" "cm" "ft" "cm" "cm" "ft" "ft" "ft" "cm" "ft" "ft" "ft" "ft" "ft"
# [47] "cm" "ft" "cm" "ft" "cm" "cm" "cm" "ft" "cm" "cm" "ft" "ft" "ft" "ft" "ft" "m" "ft" "ft" "ft" "ft" "ft" "ft" "ft"
# [70] "ft" "ft" "ft" "ft" "ft" "ft" "ft" "ft" "ft" "ft" "cm" "ft" NA "cm" "m" "ft" "ft" "cm" "ft" "cm" "cm" "ft" "cm"
# [93] "ft" "ft" "ft" "ft" "ft" "ft" "cm" "ft" "cm" "ft" "ft" "cm" "ft" "ft" NA NA "cm" "cm" "ft" "cm" "cm" "ft" "ft"
# [116] "ft" "cm" "ft" "ft" "m" "ft" "m" "ft" "cm" "ft" "ft" "ft" "cm" "cm" "ft" "cm" "ft" "ft" "cm"
You could also convert the NA values to "ft" if you wanted with U[is.na(U)] <- "ft", but I think that's unnecessary: an entry is NA because there is no number at that position, so setting the units for a missing number seems pointless.
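Before converting, it may be worth checking that only the expected unit strings came out of the extraction. A minimal sketch, assuming a small made-up input x (not the real data) and using table() with useNA = "ifany" so the missing entries are counted too:

```r
# Small illustrative input; x is a stand-in for the real column
x <- c("5.7", "157_cm", "1.65_m", NA, "5.2")

U <- gsub("[0-9._]", "", x)   # drop digits, dots and underscores, leaving the unit text
U[!nzchar(U)] <- "ft"         # entries with no unit text are treated as feet; NA stays NA

# Count each unit, including NA, to spot unexpected strings
counts <- table(U, useNA = "ifany")
counts
```

Any unit other than "cm", "m", "ft" or NA in this table would silently fall through to the feet default later, so it is cheaper to catch it here.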
The conversion of the numbers and units (U) can now be done with switch:
unname(as.numeric(gsub("[^0-9.]", "", vec)) *
sapply(U, switch, m = 1, cm = 1/100, 0.3048))
# [1] 1.737 1.570 1.558 1.670 1.650 1.870 1.710 1.880 1.585 1.676 1.737 1.550 1.646 1.630 1.951 1.700 1.737 1.768 1.860
# [20] 1.554 1.615 1.615 1.737 1.768 1.890 1.750 1.707 1.737 1.800 1.707 1.600 1.630 1.707 1.630 1.737 1.750 1.650 1.737
# [39] 1.707 1.558 1.880 1.707 1.615 1.676 1.646 1.707 1.800 1.798 1.650 1.707 1.800 1.650 1.750 1.646 1.670 1.750 1.737
# [58] 1.558 1.558 1.676 1.859 1.680 1.646 1.737 1.615 1.676 1.798 1.798 1.646 1.707 1.768 1.676 1.798 1.920 1.859 1.768
# [77] 1.585 1.585 1.829 1.660 1.615 NA 1.660 1.880 1.707 1.554 1.710 1.554 1.700 1.780 1.585 1.850 1.558 1.798 1.558
# [96] 1.737 1.829 1.859 1.760 1.737 1.890 1.615 1.737 1.640 1.707 1.768 NA NA 1.750 1.570 1.554 1.720 1.700 1.737
# [115] 1.768 1.707 1.690 1.890 1.951 1.710 1.554 1.670 1.585 1.600 1.768 1.890 1.676 1.800 1.750 1.524 1.950 1.676 1.829
# [134] 1.750
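If you want the result back in your data, the same steps can be wrapped in a function and assigned to a new column. This is only a sketch: the helper name to_meters, the data frame dat and the column Height_m are all hypothetical, since I don't know what your actual object is called.

```r
# Hypothetical helper wrapping the steps above; the name to_meters is an assumption
to_meters <- function(x) {
  units <- gsub("[0-9._]", "", x)              # unit text only
  units[!nzchar(units)] <- "ft"                # no unit text: treat as feet (NA stays NA)
  nums <- as.numeric(gsub("[^0-9.]", "", x))   # digits and dots only
  mult <- sapply(units, switch, m = 1, cm = 1/100, 0.3048)  # multiplier into meters
  unname(nums * mult)
}

# Hypothetical data frame standing in for the real one
dat <- data.frame(Height = c("5.7", "157_cm", "1.65_m", NA),
                  stringsAsFactors = FALSE)
dat$Height_m <- to_meters(dat$Height)
dat
```

Keeping the original Height column alongside Height_m makes it easy to spot-check a few rows by hand.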
Walk-through:
- as.numeric(gsub("[^0-9.]", "", vec)) extracts just the numeric component of each entry.
- U holds the units extracted from each entry; an empty string "" means no unit was given, and those were set to "ft" above.
- switch(U[1], m = 1, cm = 1/100, 0.3048) checks the first unit in U and returns the multiplier that converts it into meters; the trailing unnamed 0.3048 is the default used when U[1] is not one of the known strings "cm" and "m", which here means feet.
- Because switch is not vectorized, I use sapply(U, switch, ...) to apply it to every element; it returns a vector of multipliers to apply to the numbers extracted with as.numeric(.).
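As an alternative to sapply(U, switch, ...), a named vector can serve as a lookup table, since indexing by a character vector is already vectorized. This is a different technique from the switch approach above, shown as a sketch on a small made-up input x; the name mult is my own.

```r
# Small illustrative input; x is a stand-in for the real column
x <- c("5.7", "157_cm", "1.65_m", NA)

U <- gsub("[0-9._]", "", x)   # unit text only
U[!nzchar(U)] <- "ft"         # no unit text: treat as feet (NA stays NA)

# Named vector as a lookup table: unit -> multiplier into meters
mult <- c(m = 1, cm = 1/100, ft = 0.3048)

# Indexing by U picks one multiplier per element; unname() drops the unit names
meters <- as.numeric(gsub("[^0-9.]", "", x)) * unname(mult[U])
meters
```

One difference in behavior: a unit that is not in the lookup table gives NA here, whereas the switch version falls back to its feet default. Which is preferable depends on whether unknown units should be flagged or assumed.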
Data
vec <- c("5.7", "157_cm", "5.11", "167_cm", "1.65_m", "187_cm", "1.71_m", "188_cm", "5.2", "5.5", "5.7", "155_cm", "5.4", "163_cm", "6.4", "170_cm", "5.7", "5.8", "186_cm", "5.1", "5.3", "5.3", "5.7", "5.8", "6.2", "175_cm", "5.6", "5.7", "180_cm", "5.6", "160_cm", "163_cm", "5.6", "163_cm", "5.7", "175_cm", "165_cm", "5.7", "5.6", "5.11", "188_cm", "5.6", "5.3", "5.5", "5.4", "5.6", "180_cm", "5.9", "165_cm", "5.6", "180_cm", "165_cm", "175_cm", "5.4", "167_cm", "175_cm", "5.7", "5.11", "5.11", "5.5", "6.1", "1.68_m", "5.4", "5.7", "5.3", "5.5", "5.9", "5.9", "5.4", "5.6", "5.8", "5.5", "5.9", "6.3", "6.1", "5.8", "5.2", "5.2", "6.0", "166_cm", "5.3", NA, "166_cm", "1.88_m", "5.6", "5.10", "171_cm", "5.1", "170_cm", "178_cm", "5.2", "185_cm", "5.11", "5.9", "5.11", "5.7", "6.0", "6.1", "176_cm", "5.7", "189_cm", "5.3", "5.7", "164_cm", "5.6", "5.8", NA, NA, "175_cm", "157_cm", "5.10", "172_cm", "170_cm", "5.7", "5.8", "5.6", "169_cm", "6.2", "6.4", "1.71_m", "5.10", "1.67_m", "5.2", "160_cm", "5.8", "6.2", "5.5", "180_cm", "175_cm", "5.0", "195_cm", "5.5", "6.0", "175_cm")