I'm trying to convert, for example, '9¼"' to '9.25', but I cannot seem to read the fraction correctly.

Here's the data I'm working with:

library(XML)

url <- paste("http://mockdraftable.com/players/2014/", sep = "")  
combine <- readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F)

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                    "Cone3", "ShortShuttle20")

As an example, the Hands column in the first row is '9¼"'. How would I make combine$Hands become 9.25? The same goes for all of the other fractions, 1/8 through 7/8.

Any help would be appreciated.

Frank B.
  • possible duplicate of [Convert a character vector of mixed numbers, fractions, and integers to numeric](http://stackoverflow.com/questions/10674992/convert-a-character-vector-of-mixed-numbers-fractions-and-integers-to-numeric) – Metrics Feb 22 '15 at 21:42
  • @Metrics -- Doesn't seem to be a duplicate to me, as the fractions at the linked URL are apparently encoded as individual characters (possibly in Unicode like, e.g., [these](http://symbolcodes.tlt.psu.edu/bylanguage/mathchart.html#fractions)). – Josh O'Brien Feb 22 '15 at 21:48
  • Well, if they *are* unicode fractions, then a simple lookup table to map the unicode's integer value to the desired numeric value is trivial to produce. – Carl Witthoft Feb 22 '15 at 21:55

2 Answers

You can try to transliterate the Unicode characters to ASCII directly when reading the table, by passing a custom element function (`elFun`) to `readHTMLTable`:

library(stringi)
readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F,elFun=function(node) {
        val = xmlValue(node); stri_trans_general(val,"latin-ascii")})

You can then use @Metrics' suggestion to convert it to numbers.

For example, using @G. Grothendieck's `calc` function from this post, you could clean up the Arms data like this:

library(XML)
library(stringi)
library(gsubfn)
#the calc function is by @G. Grothendieck
calc <- function(s) {
        x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
        x[1] + x[2] / x[3]
}

url <- paste("http://mockdraftable.com/players/2014/", sep = "")  

combine<-readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F,elFun=function(node) {
        val = xmlValue(node); stri_trans_general(val,"latin-ascii")})

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                    "Cone3", "ShortShuttle20")

sapply(strapplyc(gsub('\"',"",combine$Arms), "\\d+"), calc)

#[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875
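
To see what `calc` does without hitting the website, here is a minimal sketch that runs it on a few made-up strings. The sample values in `arms` are hypothetical (I'm assuming the transliterated text looks roughly like `31 7/8"`), and `regmatches`/`gregexpr` are used as a base R stand-in for `strapplyc`, so this is an illustration rather than the exact pipeline above.

```r
# calc() exactly as in the answer (by @G. Grothendieck)
calc <- function(s) {
        # s holds the digit runs found in one string:
        #   1 element  -> whole number only      (e.g. "30")
        #   2 elements -> fraction only          (e.g. "3", "4"), so prepend a 0
        #   3 elements -> whole number + fraction (e.g. "31", "7", "8")
        # The trailing 0:1 supplies a default "0/1" fraction when none is present.
        x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
        x[1] + x[2] / x[3]
}

# Hypothetical sample values, assumed to resemble the transliterated column
arms <- c('30"', '31 1/2"', '31 7/8"', '3/4"')

# Base R stand-in for strapplyc(): pull every run of digits out of each string
digits <- regmatches(arms, gregexpr("[0-9]+", arms))

result <- sapply(digits, calc)
print(result)
```

Because only the digit runs are extracted, the `"` and the `/` never reach `calc`, which is why the explicit `gsub` on the quote is arguably optional in this variant.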

There might be some encoding issues depending on your machine (see the comments).
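
One way to check whether the transliteration worked on your machine is to look for any characters that are still outside the ASCII range. This is only a sketch: the strings in `x` are hypothetical stand-ins for a column such as `combine$Arms`, and it assumes the strings are valid UTF-8.

```r
# Hypothetical values: one already transliterated, two still holding
# Unicode fraction characters (U+215E is 7/8, U+00BC is 1/4)
x <- c('31 7/8"', '31\u215e"', '9\u00bc"')

# utf8ToInt() returns the code points of a single string;
# anything above 127 is a non-ASCII character
has_non_ascii <- vapply(x, function(s) any(utf8ToInt(s) > 127L),
                        logical(1), USE.NAMES = FALSE)

print(has_non_ascii)

# Inspect the code points of an offending value
print(utf8ToInt(x[2]))
```

Any element flagged `TRUE` still contains a fraction glyph (or some other non-ASCII character) and would need to be handled before converting to numeric.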

NicE
  • That's interesting, but (at least on my Windows 7 computer) doesn't read in all of the fractions correctly. Travis Carrie (for instance), the 5th player down, has arms that are 31 7/8", but that gets read in as `31a...z"`. Looks like maybe 1/4, 1/2, and 3/4 get correctly translated, but not fractions that are odd multiples of 1/8. – Josh O'Brien Feb 22 '15 at 22:07
  • Strange, I'm on MacOS and 1/8 gets converted fine, maybe there is another function of `stri` that could be of use here, thanks for adding the `library` – NicE Feb 22 '15 at 22:09
  • Thought it might be an OS issue. I've never had occasion (or reason) to really figure out encodings on my Windows machine. I just notice whenever I try something with them, that they don't seem to be handled particularly well... – Josh O'Brien Feb 22 '15 at 22:12
  • @NicE that worked. Obviously the issue is encoding. Thank you all. – Frank B. Feb 23 '15 at 00:09

I don't think this is clever or efficient compared to the alternatives, but it uses gsub to strip the " symbol and replace each fraction character with its decimal form, before converting the result to numeric:

#data (I've not downloaded XML for this, so maybe the encoding will make a difference?)
combine = data.frame(Hands = c('1"','1⅛"','1¼"','1⅜"','1½"','1⅝"','1¾"','1⅞"'))

#remove the "
combine$Hands = gsub('"', '', combine$Hands)

#replace each fraction with its decimal form
combine$Hands = gsub("⅛", ".125", combine$Hands)
combine$Hands = gsub("¼", ".25", combine$Hands)
combine$Hands = gsub("⅜", ".375", combine$Hands)
combine$Hands = gsub("½", ".5", combine$Hands)
combine$Hands = gsub("⅝", ".625", combine$Hands)
combine$Hands = gsub("¾", ".75", combine$Hands)
combine$Hands = gsub("⅞", ".875", combine$Hands)


combine$Hands <- as.numeric(combine$Hands)
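
The same idea can be written as a small lookup table, along the lines suggested in the comments on the question. The sketch below is one possible way to do it, not a tested drop-in for the scraped data: the helper name `to_decimal` and the sample vector `hands` are made up for illustration, and the fraction characters are written as `\u` escapes so the script does not depend on the encoding of the source file.

```r
# Lookup table: Unicode vulgar fractions and their decimal suffixes
fractions <- c("\u215b", "\u00bc", "\u215c", "\u00bd",
               "\u215d", "\u00be", "\u215e")   # 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8
decimals  <- c(".125", ".25", ".375", ".5", ".625", ".75", ".875")

# Hypothetical helper: strip the inch mark, swap each fraction character
# for its decimal suffix, then convert to numeric
to_decimal <- function(x) {
        x <- gsub('"', "", x, fixed = TRUE)
        for (i in seq_along(fractions)) {
                x <- gsub(fractions[i], decimals[i], x, fixed = TRUE)
        }
        as.numeric(x)
}

# Hypothetical sample values
hands <- c('9"', '9\u00bc"', '10\u215b"', '8\u215e"')
result <- to_decimal(hands)
print(result)
```

With the real data this would be called as something like `combine$Hands <- to_decimal(combine$Hands)`, assuming the column was read in with the fraction characters intact.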
ping