0

I have a variable in a R data frame with this format, are are have a mixed format:

98 (56:78)
34 (45:67)
45
67 (34:98)

I want do divide that column into 3 values,and to put each value in a new column of the data frame.

98 56 78
34 45 67
45 NA NA
67 34 98

how can I do that? thank you very much!

3 Answers3

3

With your sample data

dd <- data.frame(a=c("98 (56:78)", "45"))

You can do this using the tidyverse (and the tidyr package in particular) using

library(tidyverse)
dd %>% separate(a, c("x","y","z"), extra="drop")
#    x    y    z
# 1 98   56   78
# 2 45 <NA> <NA>
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • Thank you! But there is a problem and I think is bc my variable is considered as a factor, and it gives me the following error: Error in Math.factor(var) : ‘abs’ not meaningful for factors, any idea how to fit it? –  Oct 25 '17 at 19:21
  • It would help if you provided a more complete [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your original question. I've tested the code I wrote and it seems to work just fine. – MrFlick Oct 25 '17 at 19:22
  • Well, I have realized that not all the values in my data are in the same format some are 'a(b:c)' and some are 'a'. –  Oct 25 '17 at 19:31
  • That still works with my code. I don't get the error you describe. You just get a warning about the incomplete split – MrFlick Oct 25 '17 at 19:36
0

Here is a longer base R method that returns a nice looking data.frame.

setNames(do.call(rbind.data.frame,
                 regmatches(temp, regexec("(\\d+) \\((\\d+):(\\d+)\\)$", temp)))[-1],
         paste0("V", 1:3))

Here, regexec selects the desired numbers using the regular expression "(\d+) \((\d+):(\d+)\)$" and regmatches returns a list of length equal to its inputs with each element a length of 4. This list is turned into a data.frame with do.call and rbind.data.frame and the first column, which contains the full string, is removed. Finally, setNames replaces the ugly default names with more manageable ones.

This returns

  V1 V2 V3
1 98 56 78
2 91 53 76
3 99 53 75

Note that the variables are factor variables in this context, which are usually not very useful. To convert these to numeric, you can use the dat[] <- lapply(dat, as.numeric) paradigm.

data

temp <- c("98 (56:78)", "91 (53:76)", "99 (53:75)")
lmo
  • 37,904
  • 9
  • 56
  • 69
0

A tidy solution using extract():

library(tidyverse)

df <- data_frame(a = c("98 (56:78)", "10"))

df %>%
  extract(a, into = c("x", "y", "z"), regex = "(\\d*) ?\\(?(\\d*):?(\\d*)") %>%
  mutate_at(vars(x, y, z), as.integer)

Which results in:

# A tibble: 2 x 3
      x     y     z
  <int> <int> <int>
1    98    56    78
2    10    NA    NA
Stijn
  • 96
  • 5