
I have a dataset where each value is a single string wrapped in single quotes (') that holds both a parameter name and a number. I want to first separate the text from the number, and then clean up the result by removing the quotes.

df <- data.frame( test =  c("'+ test1           0.0553933412'", "'<All variables>  0.0553799779'", "'+ test3           0.0009441928'", 
                            "'<none>           0.0000000000'","'+ test2          -0.0012808645'"))

I tried the following. The problem is that the numbers in test1, test2, and test3 are also getting separated from their names. I also want to get rid of the + and ' in both columns.

EDIT: Thanks to @GregorThomas I was able to separate them into columns. I also want to get rid of the +, <, > and ' in both columns.

library(tidyr)
df <- df %>%
  separate(test,
           into = c("text", "num"),
           sep = " {2,}"
  )


  • In your example data, there are at least 2 spaces separating the real numbers from the rest, and one space everywhere else (like `"+ test1"`). So you could try `sep = " {2,}"` to separate at 2 or more spaces. – Gregor Thomas Mar 08 '23 at 16:39
  • @GregorThomas Thank you! It helped me solve the first part. Any recommendations for part 2? –  Mar 08 '23 at 16:50
  • [Remove all special characters from string in R](https://stackoverflow.com/q/10294284/903061) – Gregor Thomas Mar 08 '23 at 17:10
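
The two suggestions in the comments above (split at two or more spaces, then strip the special characters) can also be combined in base R. The following is only a minimal sketch, not code from the comments: the names `raw`, `cleaned`, `parts`, and `result` are made up for illustration, and it assumes every string contains exactly one run of two or more spaces between the label and the number.

```r
# Example strings copied from the question.
raw <- c("'+ test1           0.0553933412'",
         "'<All variables>  0.0553799779'",
         "'+ test3           0.0009441928'",
         "'<none>           0.0000000000'",
         "'+ test2          -0.0012808645'")

# Remove the quote, plus, and angle-bracket characters, then trim
# the whitespace left at either end. The minus sign is not in the
# character class, so negative numbers keep their sign.
cleaned <- trimws(gsub("['+<>]", "", raw))

# Split each string at the run of two or more spaces.
parts <- strsplit(cleaned, " {2,}")

# Assemble the pieces; the second piece is converted to numeric.
result <- data.frame(
  name  = vapply(parts, `[`, character(1), 1),
  value = as.numeric(vapply(parts, `[`, character(1), 2))
)

print(result)
```

Stripping the symbols before splitting means the leading space left behind by "+ " is trimmed in the same step, so no second clean-up pass is needed.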

2 Answers


Update: Using the fact that you can split the columns where there are at least two spaces, you can rewrite as:

library(tidyverse)

df <- data.frame( test =  c("'+ test1           0.0553933412'", "'<All variables>  0.0553799779'", "'+ test3           0.0009441928'", 
                            "'<none>           0.0000000000'","'+ test2          -0.0012808645'"))

df |>
  separate_wider_delim(
    test,
    delim = stringr::regex("\\s{2,}"),
    names = c("name", "value")
  ) |> 
  mutate(
    value = parse_number(value),
    name = str_trim(str_remove_all(name, "['+<>]"))
  )
#> # A tibble: 5 × 2
#>   name              value
#>   <chr>             <dbl>
#> 1 test1          0.0554  
#> 2 All variables  0.0554  
#> 3 test3          0.000944
#> 4 none           0       
#> 5 test2         -0.00128

Created on 2023-03-08 with reprex v2.0.2

dufei

Alternatively, you can use mutate_all to remove the symbols from, and trim the whitespace in, all of your columns at once. Note that both columns remain character vectors after this step.

In this case, your code would be:

library(tidyverse)
df <- data.frame( test =  c("'+ test1           0.0553933412'",
                            "'<All variables>  0.0553799779'",
                            "'+ test3           0.0009441928'", 
                            "'<none>           0.0000000000'",
                            "'+ test2          -0.0012808645'"))

df  <- df  %>%
  separate(test, 
           into = c("text", "num"), 
           sep =  c(" {2,}")
  ) %>%
  mutate_all(~ str_trim(str_remove_all(., "['+<>]")))
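
A variant of the code above, offered only as a sketch: since dplyr 1.0.0, `mutate_all()` is superseded in favour of `across()`, and the `num` column can be converted to numeric once the stray quote is gone. The object name `cleaned` is an illustrative choice, and the sketch assumes the dplyr, tidyr, and stringr packages are installed.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(test = c("'+ test1           0.0553933412'",
                          "'<All variables>  0.0553799779'",
                          "'+ test3           0.0009441928'",
                          "'<none>           0.0000000000'",
                          "'+ test2          -0.0012808645'"))

cleaned <- df %>%
  # Split at two or more consecutive spaces.
  separate(test, into = c("text", "num"), sep = " {2,}") %>%
  # Strip the symbols and trim whitespace in every column.
  mutate(across(everything(), ~ str_trim(str_remove_all(.x, "['+<>]")))) %>%
  # The pattern leaves "-" alone, so the conversion keeps the sign.
  mutate(num = as.numeric(num))

print(cleaned)
```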
Fernando Barbosa