I am working on an assignment where one of my columns contains measurements in feet, cm, and m, and I am trying to convert them all to metres. So far I have been able to convert individual cells to simple numeric values (e.g. 5_ft_7 to 5.7), but I cannot find a function that will convert them to metres, especially without modifying the values that are already in centimetres/metres.

My question is, is there a way of targeting JUST the cells that contain 'ft' without specifying each of them individually?

Dataset (hopefully this helps):

> Data_original3$Height
  [1] "5.7"    "157_cm" "5.11"   "167_cm" "1.65_m" "187_cm" "1.71_m" "188_cm" "5.2"   
 [10] "5.5"    "5.7"    "155_cm" "5.4"    "163_cm" "6.4"    "170_cm" "5.7"    "5.8"   
 [19] "186_cm" "5.1"    "5.3"    "5.3"    "5.7"    "5.8"    "6.2"    "175_cm" "5.6"   
 [28] "5.7"    "180_cm" "5.6"    "160_cm" "163_cm" "5.6"    "163_cm" "5.7"    "175_cm"
 [37] "165_cm" "5.7"    "5.6"    "5.11"   "188_cm" "5.6"    "5.3"    "5.5"    "5.4"   
 [46] "5.6"    "180_cm" "5.9"    "165_cm" "5.6"    "180_cm" "165_cm" "175_cm" "5.4"   
 [55] "167_cm" "175_cm" "5.7"    "5.11"   "5.11"   "5.5"    "6.1"    "1.68_m" "5.4"   
 [64] "5.7"    "5.3"    "5.5"    "5.9"    "5.9"    "5.4"    "5.6"    "5.8"    "5.5"   
 [73] "5.9"    "6.3"    "6.1"    "5.8"    "5.2"    "5.2"    "6.0"    "166_cm" "5.3"   
 [82] NA       "166_cm" "1.88_m" "5.6"    "5.10"   "171_cm" "5.1"    "170_cm" "178_cm"
 [91] "5.2"    "185_cm" "5.11"   "5.9"    "5.11"   "5.7"    "6.0"    "6.1"    "176_cm"
[100] "5.7"    "189_cm" "5.3"    "5.7"    "164_cm" "5.6"    "5.8"    NA       NA      
[109] "175_cm" "157_cm" "5.10"   "172_cm" "170_cm" "5.7"    "5.8"    "5.6"    "169_cm"
[118] "6.2"    "6.4"    "1.71_m" "5.10"   "1.67_m" "5.2"    "160_cm" "5.8"    "6.2"   
[127] "5.5"    "180_cm" "175_cm" "5.0"    "195_cm" "5.5"    "6.0"    "175_cm"

Thank you

Zoe
  • Probably something like `grepl("ft", vals)` - this returns a logical vector - but a slightly more complete description would be useful. – Ben Bolker Oct 13 '21 at 21:27
  • It's easier to help you if you include a simple reproducible example: with sample input and desired output that can be used to test and verify possible solutions. – TarJae Oct 13 '21 at 21:30

3 Answers

Try this, where my vec is your $Height column:

U <- gsub("[0-9._]", "", vec)
head(U)
# [1] ""   "cm" ""   "cm" "m"  "cm"
U[!nzchar(U)] <- "ft"
U
#   [1] "ft" "cm" "ft" "cm" "m"  "cm" "m"  "cm" "ft" "ft" "ft" "cm" "ft" "cm" "ft" "cm" "ft" "ft" "cm" "ft" "ft" "ft" "ft"
#  [24] "ft" "ft" "cm" "ft" "ft" "cm" "ft" "cm" "cm" "ft" "cm" "ft" "cm" "cm" "ft" "ft" "ft" "cm" "ft" "ft" "ft" "ft" "ft"
#  [47] "cm" "ft" "cm" "ft" "cm" "cm" "cm" "ft" "cm" "cm" "ft" "ft" "ft" "ft" "ft" "m"  "ft" "ft" "ft" "ft" "ft" "ft" "ft"
#  [70] "ft" "ft" "ft" "ft" "ft" "ft" "ft" "ft" "ft" "ft" "cm" "ft" NA   "cm" "m"  "ft" "ft" "cm" "ft" "cm" "cm" "ft" "cm"
#  [93] "ft" "ft" "ft" "ft" "ft" "ft" "cm" "ft" "cm" "ft" "ft" "cm" "ft" "ft" NA   NA   "cm" "cm" "ft" "cm" "cm" "ft" "ft"
# [116] "ft" "cm" "ft" "ft" "m"  "ft" "m"  "ft" "cm" "ft" "ft" "ft" "cm" "cm" "ft" "cm" "ft" "ft" "cm"

You could also convert the NA values to "ft" if you wanted with U[is.na(U)] <- "ft", but I think that's unnecessary: it's NA because there is no number associated with those positions, so setting the units for a missing number seems pointless.

The numbers and units can now be converted with switch:

unname(as.numeric(gsub("[^0-9.]", "", vec)) *
  sapply(U, switch, m = 1, cm = 1/100, 0.3048))
#   [1] 1.737 1.570 1.558 1.670 1.650 1.870 1.710 1.880 1.585 1.676 1.737 1.550 1.646 1.630 1.951 1.700 1.737 1.768 1.860
#  [20] 1.554 1.615 1.615 1.737 1.768 1.890 1.750 1.707 1.737 1.800 1.707 1.600 1.630 1.707 1.630 1.737 1.750 1.650 1.737
#  [39] 1.707 1.558 1.880 1.707 1.615 1.676 1.646 1.707 1.800 1.798 1.650 1.707 1.800 1.650 1.750 1.646 1.670 1.750 1.737
#  [58] 1.558 1.558 1.676 1.859 1.680 1.646 1.737 1.615 1.676 1.798 1.798 1.646 1.707 1.768 1.676 1.798 1.920 1.859 1.768
#  [77] 1.585 1.585 1.829 1.660 1.615    NA 1.660 1.880 1.707 1.554 1.710 1.554 1.700 1.780 1.585 1.850 1.558 1.798 1.558
#  [96] 1.737 1.829 1.859 1.760 1.737 1.890 1.615 1.737 1.640 1.707 1.768    NA    NA 1.750 1.570 1.554 1.720 1.700 1.737
# [115] 1.768 1.707 1.690 1.890 1.951 1.710 1.554 1.670 1.585 1.600 1.768 1.890 1.676 1.800 1.750 1.524 1.950 1.676 1.829
# [134] 1.750

Walk-through:

  • as.numeric(gsub("[^0-9.]", "", vec)) extracts just the number components
  • U holds the unit extracted from each value, where an empty string "" means no unit was given (relabelled "ft" above).
  • switch(U[1], m = 1, cm = 1/100, 0.3048) checks the first unit and returns the multiplier that converts it to meters; the trailing unnamed 0.3048 is the default returned when U[1] is neither "m" nor "cm", which here means feet (1 ft = 0.3048 m).
  • because switch is not vectorized, I use sapply(U, switch, ...) to vectorize its effect, and it returns a vector of multipliers to apply to the numbers extracted with as.numeric(.)
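
As an alternative to sapply() with switch, the same multipliers can be looked up in a named vector, which is already vectorized. This is a minimal sketch on a four-element sample rather than the full column; the names num, unit, to_m and metres are made up for illustration, and unitless values are assumed to be decimal feet, as in the code above.

```r
# Small sample standing in for the Height column (assumption: unitless = feet)
vec <- c("5.7", "157_cm", "1.65_m", NA)

num  <- as.numeric(gsub("[^0-9.]", "", vec))  # numeric part of each value
unit <- gsub("[0-9._]", "", vec)              # "", "cm", "m", or NA
unit[!is.na(unit) & !nzchar(unit)] <- "ft"    # label the unitless entries

# Named lookup vector: metres per one of each unit
to_m <- c(m = 1, cm = 1/100, ft = 0.3048)

# Indexing by name is vectorized; NA units give NA multipliers
metres <- unname(num * to_m[unit])
metres
```

Adding another unit then only means adding one more named entry to to_m.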

Data

vec <- c("5.7", "157_cm", "5.11", "167_cm", "1.65_m", "187_cm", "1.71_m", "188_cm", "5.2", "5.5", "5.7", "155_cm", "5.4", "163_cm", "6.4", "170_cm", "5.7", "5.8", "186_cm", "5.1", "5.3", "5.3", "5.7", "5.8", "6.2", "175_cm", "5.6", "5.7", "180_cm", "5.6", "160_cm", "163_cm", "5.6", "163_cm", "5.7", "175_cm", "165_cm", "5.7", "5.6", "5.11", "188_cm", "5.6", "5.3", "5.5", "5.4", "5.6", "180_cm", "5.9", "165_cm", "5.6", "180_cm", "165_cm", "175_cm", "5.4", "167_cm", "175_cm", "5.7", "5.11", "5.11", "5.5", "6.1", "1.68_m", "5.4", "5.7", "5.3", "5.5", "5.9", "5.9", "5.4", "5.6", "5.8", "5.5", "5.9", "6.3", "6.1", "5.8", "5.2", "5.2", "6.0", "166_cm", "5.3", NA, "166_cm", "1.88_m", "5.6", "5.10", "171_cm", "5.1", "170_cm", "178_cm", "5.2", "185_cm", "5.11", "5.9", "5.11", "5.7", "6.0", "6.1", "176_cm", "5.7", "189_cm", "5.3", "5.7", "164_cm", "5.6", "5.8", NA, NA, "175_cm", "157_cm", "5.10", "172_cm", "170_cm", "5.7", "5.8", "5.6", "169_cm", "6.2", "6.4", "1.71_m", "5.10", "1.67_m", "5.2", "160_cm", "5.8", "6.2", "5.5", "180_cm", "175_cm", "5.0", "195_cm", "5.5", "6.0", "175_cm")
r2evans
  • I fixed an omission, the calcs should be right. Namely, the first is `"5.7"`, which is 5.7 feet, which is around 1.737 meters. I'm confused, though: you said you want to convert the numbers to meters, but your output here in the comment is merely *units*. I've added an augmented `U`nits variable, but it doesn't change the process. – r2evans Oct 14 '21 at 12:25

You can select just the rows that contain ft or, in the Data_original3$Height data you showed, the rows that contain no _, and change only those rows.

# entries without a "_" have no unit suffix, so treat them as feet
is_ft <- !grepl("_", Data_original3$Height)
Data_original3$Height[is_ft] <- round(as.numeric(Data_original3$Height[is_ft]) * 0.3048, 2)

And if your data still has the 'ft' label:

# assumes values like "5.7_ft"; a value like "5_ft_7" would collapse to 57 here
has_ft <- grepl("ft", Data_original3$Height)
Data_original3$Height[has_ft] <- round(as.numeric(gsub("[^0-9.]", "", Data_original3$Height[has_ft])) * 0.3048, 2)
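
Overwriting Height in place leaves a character column that mixes converted numbers with strings such as "157_cm". A minimal sketch of the same logical-mask idea that writes into a separate numeric column instead; the data frame df and the column name Height_m are hypothetical, and unitless values are again assumed to be decimal feet.

```r
# Hypothetical sample data frame standing in for Data_original3
df <- data.frame(Height = c("5.7", "157_cm", "1.65_m", NA),
                 stringsAsFactors = FALSE)

num <- as.numeric(gsub("[^0-9.]", "", df$Height))  # numeric part only

# One logical mask per unit; grepl() returns FALSE for NA
is_cm <- grepl("_cm$", df$Height)
is_m  <- grepl("_m$", df$Height)
is_ft <- !is.na(df$Height) & !is_cm & !is_m

# Fill a new numeric column group by group, leaving the original untouched
df$Height_m <- NA_real_
df$Height_m[is_cm] <- num[is_cm] / 100
df$Height_m[is_m]  <- num[is_m]
df$Height_m[is_ft] <- num[is_ft] * 0.3048
df
```

Each mask targets only its own rows, so the values already in centimetres or metres are never multiplied by the feet factor.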
Tjn25

Here is the tidyverse way:

library(dplyr)
library(tidyr)

Data_original3 %>%
  separate(Height, c('feet', 'inches'), "_ft_", convert = TRUE, remove = FALSE) %>%
  mutate(Height = if_else(grepl("ft", Height), paste0(round((12*as.numeric(feet) + inches)*2.54/100, digits = 2), "_m"), paste0(Height))) %>%
  select(Height)

Using separate, I split feet and inches into two columns, then perform the conversion calculation back into the original "Height" column.

Note: This only works if you're still splitting by "_ft_" instead of ".".
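
If the values have already been rewritten from "5_ft_7" to "5.7", as described in the question, the part after the dot is inches rather than a decimal fraction (the data contains "5.10" and "5.11"). A minimal base R sketch for that case, assuming the column is still character (a numeric column would already have turned "5.10" into 5.1); the names ft_in, parts, feet, inches and metres are illustrative only.

```r
# Sample of unitless entries, read as feet.inches (assumption based on "5_ft_7" -> "5.7")
ft_in <- c("5.7", "5.11", "5.10", "6.0")

# Split on the literal dot so that "5.10" keeps its 10 inches
parts  <- strsplit(ft_in, ".", fixed = TRUE)
feet   <- as.numeric(vapply(parts, `[`, character(1), 1))
inches <- as.numeric(vapply(parts, `[`, character(1), 2))

# 1 inch = 2.54 cm, so total inches * 2.54 / 100 gives metres
metres <- round((12 * feet + inches) * 2.54 / 100, 2)
metres
```

Under this reading "5.11" is 5 ft 11 in, which is taller than "5.7", whereas multiplying 5.11 by 0.3048 would make it shorter.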

k3b