1

I have a data table whose one column consists of missing cells and character strings like 7 1/4 INCHES, 1/4 INC, 9/16 INCH, 1 3/4 INCHES, 1 13/16 INCHES, 20 INCHES. I want to get rid of whitespaces and INC/INCH/INCHES (string split) and evaluate the rest of the string by converting them to numeric like 7+1/4=7.25.

library(data.table)
data<-data.table(variable = c("", "", "7 1/4 INCHES", "1/4 INC", "9/16 INCH", "1 3/4 INCHES", "", "1 13/16 INCHES", "20 INCHES", "", ""))
#Assigning 0s to empty cells
data$variable[data$variable == "" ] = 0
#Getting rid of INCH, INCHES and INCH
data$variable<-gsub("[[:space:]][A-z]*$", "", data$variable)
#Adding "+" instead of whitespace  (for summation), like 7+1/4 instead of 7 1/4
data$variable<-gsub( " ", "+", data$variable)
data$variable<-eval(parse(text=data$variable))

However, I cannot make eval function to work. Could you please help me about it? Secondly, this particular code does not seem to be a very efficient way to do. I have a very big dataset, and it has 4 columns with a lot of observations, like the small example above. How can I fasten things up a bit?

EDIT:

data$variable<-sapply(data$variable, function(x) eval(parse(text=x)))

I make it work using the line above. However, it is still not an efficient way.

ARAT
  • 884
  • 1
  • 14
  • 35

1 Answers1

2

One way you could do this is to extract each part of the string into separate variables and then use these to calculate the result.

library(tidyverse)

data %>% 
  as_tibble() %>% 
  extract(variable, c("x"), "^(\\d+) ", remove = FALSE) %>%
  extract(variable, c("y", "z"), "(\\d+)/(\\d+)", remove = FALSE) %>%
  mutate_at(vars(x, y, z), as.numeric) %>%
  mutate(result = if_else(is.na(x), 0, x) + if_else(is.na(y / z), 0, y / z)) %>%
  select(variable, result)
#> # A tibble: 11 x 2
#>          variable  result
#>             <chr>   <dbl>
#>  1                 0.0000
#>  2                 0.0000
#>  3   7 1/4 INCHES  7.2500
#>  4        1/4 INC  0.2500
#>  5      9/16 INCH  0.5625
#>  6   1 3/4 INCHES  1.7500
#>  7                 0.0000
#>  8 1 13/16 INCHES  1.8125
#>  9      20 INCHES 20.0000
#> 10                 0.0000
#> 11                 0.0000

This answer also shows several ways to approach this sort of problem

markdly
  • 4,394
  • 2
  • 19
  • 27
  • Thank you very much! I am just focusing on using base packages so I did not check tidyverse. One quick question: how to save this result column? I guess that it is a tibble. I am not very well experienced with that kind of classes. – ARAT Sep 28 '17 at 04:43
  • 1
    @MustafaMuratARAT, change the `data %>%` line to `new_df <- data %>% ` and the results will be stored in `new_df`. To convert back to a standard dataframe you could then do `new_df <- as.data.frame(new_df)`. – markdly Sep 28 '17 at 04:49