
I have some data with height strings, formatted like so:

"6'2\"" 

I'm capturing the first digit just fine, but I can't get rid of the `\"` at the end of the string.

I've tried several ways of getting at it, but nothing has worked yet. Here's where I'm currently at:

inches <- str_extract(htString,"(\\d{1,2})[\\\"]?$")

[1] "11"
[1] "3\""

If the inches part is two digits long, I capture the right characters; otherwise, I also capture the `\"`.
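A minimal sketch of why the quote comes along, assuming the stringr package is installed and using two made-up height strings (`ht`, `pattern`, `whole`, `digits` and `trimmed` are illustrative names). `str_extract()` returns the whole match, including the optional quote, while `str_match()` also returns each capture group as its own column.

```r
library(stringr)

# Hypothetical sample data: the strings hold the characters 6'2" and 5'3"
ht <- c("6'2\"", "5'3\"")
pattern <- "(\\d{1,2})[\"]?$"

# str_extract() returns the whole match, so the optional quote is included
whole <- str_extract(ht, pattern)
whole    # [1] "2\"" "3\""   (print() shows the quote escaped)

# str_match() returns a matrix: column 1 is the whole match,
# column 2 is capture group 1 (the digits only)
digits <- str_match(ht, pattern)[, 2]
digits   # [1] "2" "3"

# Alternatively, strip a trailing double quote from the whole match
trimmed <- str_remove(whole, "\"$")
```

Either `digits` or `trimmed` can then be passed to `as.numeric()`.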

Thanks for any help!

Edit: Thanks for the help. The code that ended up working for me is posted as my answer below.
zfisher
  • The output you see may not be literal. What I mean by this is that in `[1] "3\""` R is itself escaping the double quotes. In fact, there may not even be any backslashes in your data at all. To confirm this, just write the data frame to a text file and check. – Tim Biegeleisen Jul 12 '17 at 01:50
  • I imagine Tim is right. You literally have the text string `6'2"` – thelatemail Jul 12 '17 at 01:57
  • @TimBiegeleisen I don't think that is the case, because if I do `as.numeric(inches)` then I get back NAs. Edit: Oh, I see what you're saying. I'm not capturing the last `"` correctly, then. – zfisher Jul 12 '17 at 01:57
  • `as.numeric("text")` always gives `NA`; you don't have a number, so you get `NA`. – thelatemail Jul 12 '17 at 01:58
  • @ZachFisher No, this is expected. Would you expect `as.numeric` with a quote in the input to be castable to a number? The slash is irrelevant; it won't work. – Tim Biegeleisen Jul 12 '17 at 01:59
  • `str_extract_all(x, "\\d+")` is what you want I think. – thelatemail Jul 12 '17 at 02:01
  • @thelatemail that worked, thanks! – zfisher Jul 12 '17 at 02:09
  • @ZachFisher - feel free to answer and accept your own question. You get a couple of points of karma and the question gets closed off. Win-win :-) – thelatemail Jul 12 '17 at 02:10
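The comments' point about printed versus stored text can be checked directly. A minimal sketch, assuming stringr is installed and using a made-up value `x`: `print()` shows the quote escaped, `writeLines()` shows the characters actually stored, and `str_extract_all()` keeps only the digit runs so `as.numeric()` no longer returns `NA`.

```r
library(stringr)

x <- "6'2\""    # hypothetical value holding the four characters 6'2"

print(x)        # [1] "6'2\""   print() escapes the embedded double quote
writeLines(x)   # 6'2"          the characters that are actually stored
nchar(x)        # 4: the digit 6, an apostrophe, the digit 2, a double quote

# Any leftover quote makes the conversion fail (NA, with a warning)
suppressWarnings(as.numeric("2\""))

# Keep only the digit runs, then convert; [[1]] because str_extract_all()
# returns a list with one element per input string
nums <- as.numeric(str_extract_all(x, "\\d+")[[1]])
total_inches <- nums[1] * 12 + nums[2]   # 6 * 12 + 2 = 74
```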

2 Answers


I am not an expert in R, but if I try:

(\d+).(\d+)

on https://regex101.com/ with the test string "65'2\"", it seems that I can match both numbers:

Group 1 (positions 1-3): 65

Group 2 (positions 4-5): 2

This uses capture groups. In R, `str_match()` (from stringr) returns each capture group as a separate column, which should help here; take a look at: Regex group capture in R with multiple capture-groups
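A sketch of the same regex in R using stringr's `str_match()`, assuming two made-up height strings in `ht`. The backslashes have to be doubled inside an R string literal.

```r
library(stringr)

ht <- c("6'2\"", "5'11\"")   # hypothetical sample heights

# Same regex as above, backslashes doubled for the R string literal.
# Column 1 of the result is the whole match; columns 2 and 3 are the groups.
m <- str_match(ht, "(\\d+).(\\d+)")

feet   <- as.numeric(m[, 2])   # 6 5
inches <- as.numeric(m[, 3])   # 2 11
```

The `.` matches any single character, so it accepts separators other than `'`; writing a literal `'` in its place would be stricter.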

Fabien

Thanks for the help. The following code ended up working for me, using @thelatemail's suggestion from the comments (`str_extract_all(x, "\\d+")`). It could be cleaned up for sure.

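```r
library(stringr)

for (i in seq_len(nrow(hs))) {
  htString <- hs[i, ]$HtRec

  # Feet: leading digit(s), optionally followed by '
  ft <- str_extract(htString, "^(\\d{1,2})[\']?")
  ft <- substring(ft, 1, 1)  # keep only the first character (assumes single-digit feet)

  # Inches: trailing digit(s), optionally followed by "
  inches <- str_extract(htString, "(\\d{1,2})[\"]?$")
  # Keep only the digits; [[1]] because str_extract_all() returns a list
  inches <- str_extract_all(inches, "\\d+")[[1]]

  ft <- as.numeric(ft)
  inches <- as.numeric(inches)
  htInches <- (ft * 12) + inches
  hs[i, ]$HtRec <- htInches  # overwrite the original string with the height in inches
}
```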
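For comparison, a vectorized sketch that avoids the row-by-row loop above. It assumes, as in the question, a data frame `hs` with a character column `HtRec`; the sample rows and the new `HtInches` column are hypothetical.

```r
library(stringr)

# Hypothetical sample data shaped like the question's `hs`
hs <- data.frame(HtRec = c("6'2\"", "5'11\"", "6'0\""),
                 stringsAsFactors = FALSE)

# One regex for the whole column: group 1 = feet, group 2 = inches,
# followed by an optional trailing double quote
parts <- str_match(hs$HtRec, "^(\\d+)'(\\d+)\"?$")

# Strings that don't fit the pattern give NA rather than a wrong number
hs$HtInches <- as.numeric(parts[, 2]) * 12 + as.numeric(parts[, 3])
hs$HtInches   # 74 71 72
```

Writing to a new column keeps the original strings available for checking; assigning to `hs$HtRec` instead would mirror what the loop does.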
zfisher