I have a data frame, df, that looks something like this:

    date      sample
1 29-Apr 1,000 (1/4)
2 29-Apr 1,000 (1/4)
3 28-Apr       1,970
4 27-Apr 1,000 (1/4)
5 25-Apr 1,000 (1/4)
...

How can I extract the value in parentheses and create a new column from it?

I can extract the values in parentheses:

matches <- regexpr("\\(.*?\\)", df$sample)
fractions_with_parens <- regmatches(df$sample, matches)
fractions <- gsub("[\\(\\)]", "", fractions_with_parens)

But this drops the non-matches, so the vector does not match the number of rows in the data frame. In this example, row 3 will be missing.

43Tesseracts

3 Answers

You can use dplyr together with stringr:

library(stringr)
library(dplyr)
df <- data.frame(date = c('29-Apr', '29-Apr', '28-Apr', '27-Apr', '25-Apr'),
                 sample = c('1,000 (1/4)', '1,000 (1/4)', '1,970', 
                            '1,000 (1/4)', '1,000 (1/4)'))

# str_extract() returns a character vector; str_match() would return a matrix
df %>% mutate(new = str_extract(sample, pattern = '\\d+/\\d+'))

Resulting in:

    date      sample  new
1 29-Apr 1,000 (1/4)  1/4
2 29-Apr 1,000 (1/4)  1/4
3 28-Apr       1,970 <NA>
4 27-Apr 1,000 (1/4)  1/4
5 25-Apr 1,000 (1/4)  1/4
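
For comparison, here is a minimal base R sketch of the same idea. It assumes the column is called `sample`, as in the example data, and it swaps the question's two-step approach for a lookaround pattern with `perl = TRUE`. The trick is to start from an all-NA vector and fill in only the rows that matched, so the result keeps one element per row. The `fraction` name is just illustrative.

```r
df <- data.frame(
  date = c("29-Apr", "29-Apr", "28-Apr", "27-Apr", "25-Apr"),
  sample = c("1,000 (1/4)", "1,000 (1/4)", "1,970",
             "1,000 (1/4)", "1,000 (1/4)"),
  stringsAsFactors = FALSE
)

# Start with NA for every row so that non-matching rows stay NA
fraction <- rep(NA_character_, nrow(df))

# regexpr() returns -1 for elements without a match
m <- regexpr("(?<=\\().*?(?=\\))", df$sample, perl = TRUE)

# regmatches() returns only the matched pieces; assign them to the matching rows
fraction[m != -1] <- regmatches(df$sample, m)

df$fraction <- fraction
```

Row 3 has no parentheses, so it should remain NA instead of being dropped.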
Martin Schmelzer
  • Base R is well suited to this problem. Still, moving beyond this simple use case to more complex applications, this might be a suitable alternative. – Martin Schmelzer Apr 29 '17 at 23:09

You could try stringr:

library(stringr)
df$extract <- str_extract(df$sample, "\\(.*?\\)")

df
#    date      sample extract
#1 29-Apr 1,000 (1/4)   (1/4)
#2 29-Apr 1,000 (1/4)   (1/4)
#3 28-Apr       1,970    <NA>
#4 27-Apr 1,000 (1/4)   (1/4)
#5 25-Apr 1,000 (1/4)   (1/4)

To extract the values within the parentheses you could do:

df$extract <- str_extract(df$sample, "(?<=\\().*(?=\\))")

Thanks to epi99 for the suggestion.
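
One thing to watch, sketched here under the assumption that a value could contain more than one parenthesised group (the example data does not): `.*` is greedy, so it runs from the first `(` to the last `)`. A lazy `.*?` stops at the first closing parenthesis. The string `x` below is hypothetical.

```r
library(stringr)

# Hypothetical value with two parenthesised groups
x <- "1,000 (1/4) of (2/3)"

# Greedy: matches from after the first "(" up to the last ")"
greedy <- str_extract(x, "(?<=\\().*(?=\\))")

# Lazy: matches from after the first "(" up to the first ")"
lazy <- str_extract(x, "(?<=\\().*?(?=\\))")
```

With a single group per value, as in the question, both patterns give the same result.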

Mike H.
  • To extract from within the parentheses, use a regex with lookahead and lookbehind, like `str_extract("ab(123)fg", "(?<=\\().*(?=\\))")` – Andrew Lavers Apr 29 '17 at 22:55
  • Thanks @epi99! I initially thought he wanted to keep the parentheses, but I'll update the answer to extract the values within. – Mike H. Apr 29 '17 at 22:59

We can do this with convenient functions from qdapRegex:

library(qdapRegex)
df$new <- unlist(ex_round(df$sample, include.markers = TRUE))
df$new
#[1] "(1/4)" "(1/4)" NA      "(1/4)" "(1/4)"

If we don't require the brackets, drop the include.markers argument:

df$new <- unlist(ex_round(df$sample))
df$new
#[1] "1/4" "1/4" NA    "1/4" "1/4"
akrun