
I am a beginner, and I have a data frame with heights written as feet and inches (x' y"). I need to convert each value to a single number: the height in inches.

height_w_shoes height_wo_shoes
5'11" 5'10"
6'1" 6'0.25
6.5.25" 6'4"

The last row of column "height_w_shoes" contains a typo (a "." where there should be a "'"). I need to fix it, if the solution requires that, and then convert the measurements to inches, like this:

height_w_shoes height_wo_shoes
71 70
73 72.25
77.25 76

I am stuck because I am having a hard time converting these strings into numeric values. Please help. Thank you.

Mark Mamon

2 Answers


Here's a dplyr and purrr solution:

Test data UPDATED:

df <- data.frame(
  h1 = c("6.5.25", "5'11\"", "6'11\"", "6'0.25"),
  h2 = c("66.4.2", "7'10\"", "16'11\"", "7'2.50"),
  h3 = c("4'4.2", "7'10\"", "16'11\"", "7.7.77")
)

Solution UPDATED:

library(dplyr)
library(purrr)
df %>%
  # Step 1: correct the typo (a "." after the 1- or 2-digit feet value becomes "'"):
  mutate(across(everything(),
                ~ sub("(?<=^\\d{1}|^\\d{2})\\.", "'", ., perl = TRUE))) %>%
  # Step 2: remove the trailing `"`:
  mutate(across(everything(),
                ~ gsub('"$', "", .))) %>%
  # Step 3: split strings on `'`:
  mutate(across(everything(),
                ~ strsplit(., "'"))) %>%
  # Step 4: convert to numeric and compute feet * 12 + inches:
  mutate(across(everything(),
                ~ map_dbl(., function(x) as.numeric(x)[1] * 12 + as.numeric(x)[2])))
#>      h1    h2     h3
#> 1 77.25 796.2  52.20
#> 2 71.00  94.0  94.00
#> 3 83.00 203.0 203.00
#> 4 72.25  86.5  91.77

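
As a cross-check, the same four steps can be sketched in base R as a single function. This is only a sketch and not part of the answer above: the name `height_to_inches` is hypothetical, and it assumes every value has the form feet, then `'` (or a mistyped `.`), then inches, with an optional trailing `"`.

```r
# Hypothetical helper (not from the answer above): convert strings such as
# 5'11", 6'0.25 or the mistyped 6.5.25" to inches using base R only.
height_to_inches <- function(x) {
  x <- as.character(x)
  # Step 1: a "." right after a leading 1- or 2-digit feet value becomes "'"
  x <- sub("^([0-9]{1,2})\\.", "\\1'", x)
  # Step 2: drop a trailing double quote, if there is one
  x <- sub('"$', "", x)
  # Step 3: split each string into feet and inches on "'"
  parts <- strsplit(x, "'", fixed = TRUE)
  # Step 4: feet * 12 + inches, one number per input string
  vapply(parts, function(p) as.numeric(p[1]) * 12 + as.numeric(p[2]), numeric(1))
}

# The data from the question
df <- data.frame(
  height_w_shoes  = c("5'11\"", "6'1\"", "6.5.25\""),
  height_wo_shoes = c("5'10\"", "6'0.25", "6'4\"")
)

# Apply the helper to every column while keeping the data frame structure
df[] <- lapply(df, height_to_inches)
df
```

On the question's data this should give 71, 73, 77.25 for `height_w_shoes` and 70, 72.25, 76 for `height_wo_shoes`. Values that do not follow the assumed pattern come back as `NA` with a coercion warning, which makes remaining data entry errors easy to spot.
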
Chris Ruehlemann
  • I think there is an error in the last `mutate()` call. Everything runs except `mutate(across(everything(), ~ map_dbl(., function(x) as.numeric(x)[1] * 12 + as.numeric(x)[2])))`, which gives: Error: Problem with `mutate()` input `..1`. `..1 = across(...)`. NAs introduced by coercion – Mark Mamon Aug 06 '21 at 14:37
  • The only value this seemed to work on was in column 2, row 3. I think that is because that value has no trailing `"`, unlike the other values in the data frame. I need to remove the trailing `"` from every value where it is present, and then I think it will work. Can you show me how to do that? – Mark Mamon Aug 06 '21 at 15:59
  • Just to be sure: did the code work on the test data I used? – Chris Ruehlemann Aug 06 '21 at 16:35
  • See the updated answer. Does this work now? I've realised you may not have `' '` (i.e., two single quote marks) but instead `"` (one double quote mark). If there are still problems, why not post your data in a reproducible format? – Chris Ruehlemann Aug 06 '21 at 16:40
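
The `NAs introduced by coercion` message reported in the comments is what `as.numeric()` produces when the inches part still carries its trailing `"`. A small base R sketch shows the problem and the fix; the vector `inches` is made up for illustration and stands for the inches part left over after splitting on `'`.

```r
# Illustrative values only: the inches part after splitting on "'"
inches <- c('11"', "0.25", '4"')

# The trailing double quote makes the conversion fail (warning suppressed here)
suppressWarnings(as.numeric(inches))
#> [1]   NA 0.25   NA

# Removing a trailing double quote first lets every value convert
cleaned <- sub('"$', "", inches)
as.numeric(cleaned)
#> [1] 11.00  0.25  4.00
```
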

Some data entry errors make this a lot harder than it should be.

library(stringr)
library(dplyr)
df <- data.frame(
  x = c("5'11\"", "6'1\"", "6.5.25\""),
  y = c("5'10\"", "6'0.25", "6'4\"")
)

correction <- function(str) {
  # Correct the first typo: the "." that should be a "'" in 6.5.25"
  output <- str_replace(str, "6\\.5\\.25\"", "6'5.25")
  # Correct the second typo: append the missing trailing `"`
  output <- ifelse(str_detect(output, "\"$"), output, paste0(output, "\""))
  # Calculate inches: feet * 12 + inches
  as.numeric(str_extract(output, "^.+(?=')")) * 12 +
    as.numeric(str_extract(output, "(?<=').+(?=\"$)"))
}

df %>%
  mutate(across(c(x, y), ~ correction(.)))
#>       x     y
#> 1 71.00 70.00
#> 2 73.00 72.25
#> 3 77.25 76.00

Created on 2021-08-06 by the reprex package (v2.0.0)

Ran K